DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on December 16, 2020. Claims 1, 3-4, 6-12, 14-15, and 17-19 are pending and have been examined.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments 
Applicant’s amendment filed on December 16, 2020 has been entered. 
The amendment of claims 1, 6, 12, and 19 and the cancellation of claims 2, 5, 13, and 16 are acknowledged.
In view of the cancellation of claims 2, 5, 13, and 16, the 35 U.S.C. §112(b) and 35 U.S.C. §103 rejections of claims 2, 5, 13, and 16 have been withdrawn. 
In view of the amendment to, and the Applicant’s explanations of, claims 1, 12, and 19, the 35 U.S.C. §112(b) rejections of those claims have been revised.
The Examiner, as of the Non-Final Office Action dated November 4, 2020, rejected claims 1, 12, and 19 as being indefinite due to an indefinite claim limitation. The amendment provided by the Applicant in the Response filed December 16, 2020, fails to address the indefiniteness noted by the Examiner. 
In light of the broadest reasonable interpretation of the claims based on the amendments presented, the Examiner has amended the rejections under both 35 U.S.C. §112(b) and 35 U.S.C. §103. Further explanation of the rejections is presented in the appropriate sections below.

Response to Arguments
The Applicant’s arguments with respect to enablement of claims 12-19, see pages 7-11 of the Response filed December 16, 2020, have been fully considered and are not persuasive.  Examiner reminds Applicant that the rejection of claims 12-19 in the Non-Final Office Action dated November 4, 2020 was, in part, on the basis of 35 U.S.C. §112(b) as being indefinite, not on the basis of 35 U.S.C. §112(a) as being non-enabled. As such, the Applicant’s arguments regarding enablement are moot.
However, in light of the explanations from the Applicant regarding corresponding structure, the 112(b) rejection of claims 12-19 as being indefinite has been withdrawn. 
Regarding the corresponding structure, the Applicant explains, on page 8 of the Response, that the corresponding structure for the “amplitude correction module” of claims 12 and 19, is “the computer 800 including processor and memory” to “perform at least part of the processing described herein, such as the processing of FIGs. 5, 6, and/or 7.” The Applicant further applies the above explanation to “similar rejections under §112 of other claim features” on pages 8 and 11 of the Response. 
Therefore, for the purposes of 35 U.S.C. §112(f) interpretation, the Examiner understands the corresponding structure for the “time alignment module”, the “amplitude correction module”, and the “frequency-based processing module” of claims 12 and 19 to be “the computer 800 including processor and memory” and the algorithm described as the function for each of said modules.
The Applicant's arguments regarding the prior art rejections, see pages 11-14 of the Response filed December 16, 2020, have been fully considered but they are not persuasive. 
Regarding claim 1, the Applicant asserts that “the proposed combination of frequency band signal amplitude in Sudo cannot be combined with Canniff to arrive at the invention, as claimed, since the proposed combination cannot perform ‘adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted 
In response to the Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
As explained in the Non-Final Office Action dated November 4, 2020, Canniff discloses the time alignment of the first and second audio files. (Non-Final Office Action, page 15). Canniff is modified by the teachings of Sudo to include adjusting the amplitude of the time-aligned first and second audio files. (Non-Final Office Action, pages 15-16). The Applicant cannot show nonobviousness based on lack of teaching of time-alignment in Sudo, as the rejection is based in part on the combination of time alignment from Canniff and adjusting the amplitude in Sudo.
The Applicant further argues that “the amplitude of signals in a frequency band is totally and completely different than adjusting the amplitudes of time-aligned first and second audio files.” Though not completely clear, the Examiner understands that the Applicant disagrees with the combination of Sudo and Canniff to achieve time-based processing of the amplitude adjusted output. However, the Applicant’s arguments are moot in light of newly cited references. 
Regarding the newly cited references, the Applicant’s amendment of “one of both” to “one or both” changed the breadth of the claim interpretation from requiring “adjustment of the amplitude of a single chunk selected from the group consisting of the chunks of the first audio file and chunks of the second audio file” (a single chunk from one of the two audio files) to requiring “adjustment of the amplitude of at least one chunk from each of the first audio file and the second audio file” (at least one chunk from each audio file). Sudo fails to expressly disclose adjusting the 
Regarding claim 5, the Applicant asserts that, because cross correlation can be used as a technique to align two signals in time, it cannot be used to teach “distance processing between the amplitude adjusted output of the first and second audio files.” Response, Page 14. The Examiner respectfully disagrees.
Though the specification of the instant application recites that “cross correlation” is a technique which can be used for time alignment of two audio signals, the claim as presented includes no such limitation. The BRI of the phrase “distance processing,” as recited in claim 5, includes all available distance processing techniques, including distance processing techniques which can be used for time alignment, such as cross correlation.
Further, Canniff describes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function” to “determine [signal] quality, including temporal clipping.” (Canniff, ¶ [0009]). Thus, Canniff discloses the use of cross correlation for both time-based processing and distance processing.
The Applicant further argues that, in light of the previous arguments, claim 1 and all pending claims are patentable over the cited references. The Examiner respectfully disagrees with this assertion for the reasons set forth above. The Applicant has not provided any further statement and therefore, the Examiner directs the Applicant to the below rationale.
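For illustration only, the dual role of cross correlation discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Canniff or from the specification of the instant application: the function names, the sample values, and the choice of a normalized correlation are hypothetical. The location of the correlation peak gives a time offset (alignment), and the height of the peak gives a similarity measure between the two signals (a form of distance).

```python
import math


def cross_correlation(ref, test, max_lag):
    """Return {lag: normalized correlation} for lags in [-max_lag, max_lag].

    A positive lag means `test` is delayed relative to `ref`.
    """
    norm = math.sqrt(sum(x * x for x in ref) * sum(y * y for y in test))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(test):
                total += x * test[j]
        scores[lag] = total / norm if norm else 0.0
    return scores


def align_and_score(ref, test, max_lag):
    """Use one correlation pass for both purposes.

    Returns (best_lag, peak): best_lag is the time offset (alignment),
    peak is the similarity at that offset (1.0 means identical shape).
    """
    scores = cross_correlation(ref, test, max_lag)
    best_lag = max(scores, key=scores.get)
    return best_lag, scores[best_lag]


# Hypothetical data: the second signal is the first delayed by two samples.
reference = [0.0, 1.0, 0.0, -1.0, 0.0, 0.5, 0.0, 0.0]
delayed = [0.0, 0.0] + reference[:-2]
lag, peak = align_and_score(reference, delayed, max_lag=3)
```

In this sketch a peak noticeably below 1.0 at the best lag would suggest that the two signals differ by more than a time shift.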

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

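The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.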

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a time alignment module to align in time first and second audio files;” (claim 12)
“an extraction module to divide the first audio file into chunks and to divide the second audio files into chunks that correspond to the chunks of the first audio file;” (claim 12) 
“an amplitude correction module to adjust an amplitude of one or both of the chunks of the first audio file and the second audio file and generate an amplitude adjusted output of the first and second audio files;” (claim 12) 
“a time-based processing module to perform time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file;” (claim 12) and 
“a frequency-based processing module to perform frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file.” (claim 12)

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 3-4, 6-12, 14-15, and 17-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1, the claim limitation "adjusting the amplitude of one or both of the chunks of the first audio file and the second audio file" at pg. 13, line 7 is unclear. In light of the fact that the claim further recites various processing of “the amplitude adjusted output of the first and second audio files,” the phrase “one or both” creates unclear interpretations. In a first example, selecting “one” of the “one or both of the chunks…” creates an outcome where only one chunk, selected from either the first audio file or the second audio file, is amplitude adjusted. In a second example, selecting “one” of the “one or both of the chunks…” creates an outcome where all chunks of either the first audio file or the second audio file are selected and amplitude adjusted. In either the first example or the second example, if only one chunk or only one group of chunks is selected, then it is unclear how the first audio file AND the second audio file are “amplitude adjusted” such that one can perform “distance processing between” them, as recited in the claim. Therefore, the scope of the claim is indefinite. See MPEP 2173.05(p).
For examination purposes, the claim limitation "adjusting the amplitude of one or both of the chunks of the first audio file and the second audio file" in claim 1 will be read as requiring the adjustment of the amplitude of at least one chunk from each of the first audio file and the second audio file. To overcome this rejection, the Examiner recommends that the Applicant describe the results of the method and system occurring at each of the first audio file and the second audio file, individually. One example is presented below: 
Current claim 1: “…adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted output of the first and second audio files…” 
Possible amended claim 1: “…adjusting an amplitude of the chunks of the first audio file and the chunks of the second audio file and generating an amplitude adjusted first audio file and an amplitude adjusted second audio file…”
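For illustration only, the reading adopted for examination purposes (amplitude adjustment applied to chunks of each of the two audio files, yielding two amplitude adjusted outputs) can be sketched as follows. This is a minimal sketch under stated assumptions: neither the claims nor the cited references are relied upon for this particular technique, and RMS normalization, the target level, and all function names are hypothetical choices.

```python
import math


def rms(chunk):
    """Root-mean-square level of one chunk of samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def normalize_chunk(chunk, target_rms=0.1):
    """Scale a chunk so its RMS level equals target_rms (silence is left as is)."""
    level = rms(chunk)
    if level == 0.0:
        return list(chunk)
    gain = target_rms / level
    return [s * gain for s in chunk]


def adjust_both(first_chunks, second_chunks, target_rms=0.1):
    """Adjust every chunk of BOTH files, returning two separate outputs.

    Returning an adjusted first file and an adjusted second file mirrors
    the "possible amended claim" wording, so that later comparison steps
    have a well-defined pair of inputs.
    """
    adjusted_first = [normalize_chunk(c, target_rms) for c in first_chunks]
    adjusted_second = [normalize_chunk(c, target_rms) for c in second_chunks]
    return adjusted_first, adjusted_second
```

Under the narrower readings discussed above, only one of the two returned lists would differ from its input, which is the situation in which a later comparison of "the amplitude adjusted output of the first and second audio files" becomes unclear.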

Claims 3-4 and 6-11 are dependents of claim 1 and contain all the features of their independent claim, but fail to resolve the deficiencies of claim 1. Therefore, they are rejected for the same reasons as above.

Regarding claim 12, the claim limitation "adjust an amplitude of one or both of the chunks of the first audio file and the second audio file" at pg. 14, lines 23-24 is unclear. In light of the fact that the claim further recites various processing of “the amplitude adjusted output of the first and second audio files,” the phrase “one or both” creates unclear interpretations. In a first example, selecting “one” of the “one or both of the chunks…” creates an outcome where only one chunk, selected from either the first audio file or the second audio file, is amplitude adjusted. In a second example, selecting “one” of the “one or both of the chunks…” creates an outcome where all chunks of either the first audio file or the second audio file are selected and amplitude adjusted. In either the first example or the second example, if only one chunk or only one group of chunks is selected, then it is unclear how the first audio file AND the second audio file are “amplitude adjusted” such that one can perform “distance processing between” them, as recited in the claim. Therefore, the scope of the claim is indefinite. See MPEP 2173.05(p).
For examination purposes, the claim limitation "adjust an amplitude of one or both of the chunks of the first audio file and the second audio file" in claim 12 will be read as requiring the adjustment of the amplitude of at least one chunk from each of the first audio file and the second audio file. To overcome this rejection, the Examiner recommends that the Applicant describe the results of the method and system occurring at each of the first audio file and the second audio file, individually. One example is presented below: 
Current claim 12: “…adjust an amplitude of one or both of the chunks of the first audio file and the second audio file and generate an amplitude adjusted output of the first and second audio files…”
Possible amended claim 12: “…adjust an amplitude of the chunks of the first audio file and the chunks of the second audio file and generate an amplitude adjusted first audio file and an amplitude adjusted second audio file…”
 
Claims 14-15 and 17-18 are dependents of claim 12 and contain all the features of their independent claim, but fail to resolve the deficiencies of claim 12. Therefore, they are rejected for the same reasons as above.

Regarding claim 19, the claim limitation "adjusting the amplitude of one or both of the chunks of the first audio file and the second audio file" at pg. 15, lines 29-30 is indefinite. In light of the fact that the claim further recites various processing of “the amplitude adjusted output of the first and second audio files,” the phrase “one or both” creates unclear interpretations. In a first example, selecting “one” of the “one or both of the chunks…” creates an outcome where only one chunk, selected from either the first audio file or the second audio file, is amplitude adjusted. In a second example, selecting “one” of the “one or both of the chunks…” creates an outcome where all chunks of either the first audio file or the second audio file are selected and amplitude adjusted. In either the first example or the second example, if only one chunk or only one group of chunks is selected, then it is unclear how the first audio file AND the second audio file are “amplitude adjusted” such that one can perform “distance processing between” them, as recited in the claim. Therefore, the scope of the claim is indefinite. See MPEP 2173.05(p).
For examination purposes, the claim limitation "adjusting the amplitude of one or both of the chunks of the first audio file and the second audio file" in claim 19 will be read as requiring the adjustment of the amplitude of at least one chunk from each of the first audio file and the second audio file. To overcome this rejection, the Examiner recommends that the Applicant describe the results of the method and system occurring at each of the first audio file and the second audio file, individually. One example is presented below: 
Current claim 19: “…adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generate an amplitude adjusted output of the first and second audio files…”


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 6, 9, 10, 12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Canniff et al. (U.S. Pat. App. Pub. No. 2006/0265211, hereinafter Canniff) in view of Eppolito et al. (U.S. Pat. App. Pub. No. 2012/0195433, hereinafter Eppolito) and Baker et al. (U.S. Pat. App. Pub. No. 2014/0032973, hereinafter Baker).

Regarding claim 1, Canniff discloses a method, comprising: aligning in time first and second audio files (“Using the duration pattern of the silence periods in the reference signal, the speech burst are approximately aligned with the corresponding speech samples in the reference signal;” Canniff, ¶ [0035]); dividing the first audio file into chunks (“For this additional search the speech burst is subdivided into sub-frames of a predetermined size;” Canniff, ¶ [0040]); dividing the second audio files into chunks that correspond to the chunks of the first audio file (“And, the most probable match speech sample is also subdivided into sub-frames of the same predetermined size;” Canniff, ¶ [0040]); …[and] performing time-based processing of the … output of the first and second audio files to identify audio anomalies in the second audio file (though not expressly disclosed as “amplitude adjusted”, the method includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function,” where cross correlation is described in the specification of the instant application as a “time-alignment technique”; Canniff, ¶ [0009]), wherein the time-based processing comprises distance processing between the … output of the first and second audio files (though not expressly disclosed as “amplitude adjusted,” the method includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function,” where cross correlation is an audio distortion processing technique; Canniff, ¶ [0009]).
However, Canniff fails to expressly recite, wherein the chunks of the first and second audio files each comprise extracted words, adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted output of the first and second audio files, performing frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, and performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files.

Eppolito discloses systems and methods for determining the relationship between audio channels in a multi-channel audio file. (Eppolito, ¶ [0005]). Regarding claim 1, Eppolito discloses adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file (The system discloses filtering the two channels for “particular frequency components… that are Eppolito, ¶ [0117]) and generating an amplitude adjusted output of the first and second audio files (the noise cancellation process generates “filtered channel data 1040.”; Eppolito, ¶ [0118]); performing time-based processing (The system can also compare the filtered audio signals using “cross correlation of channel X and channel Y in the time domain” to “determine pairing of audio channels.” Eppolito, ¶ [0153]; [0151]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to time based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files (The system uses cross correlation, a type of distance processing, to “measure the similarity between two waveforms as a function of a timing offset applied to one of the two waveforms,” where similarity includes detection of differences or anomalies; Eppolito, ¶¶ [0147], [0150]); and performing frequency-based processing (the system can perform a “Frequency domain correlation” which “is sometimes referred to as ‘phase correlation,’ thus performing frequency based processing of the amplitude adjusted output, Eppolito, ¶ [0150]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to frequency based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file (phase correlation compares two channels “by quantifying the degree of similarity between audio channels,” thus detecting anomalies; Eppolito, ¶ [0109]; [0103]).
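For illustration only, a frequency-domain ("phase") correlation of two channels can be sketched as follows. This is a minimal sketch of the textbook formulation and is not asserted to be the implementation described in Eppolito: the function names are hypothetical, the shift is treated as circular, and a naive discrete Fourier transform is used so that the sketch needs only the Python standard library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation(a, b):
    """Return (shift, peak) for two equal-length channels.

    The cross-power spectrum is normalized to unit magnitude so that only
    phase differences remain; its inverse transform peaks at the circular
    shift of `b` relative to `a`. A peak near 1.0 indicates the channels
    differ only by that shift; a low peak indicates dissimilarity.
    """
    fa, fb = dft(a), dft(b)
    cross = []
    for x, y in zip(fa, fb):
        c = x.conjugate() * y
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)
    surface = [v.real for v in dft(cross, inverse=True)]
    shift = max(range(len(surface)), key=surface.__getitem__)
    return shift, surface[shift]


# Hypothetical data: channel_y is channel_x circularly delayed by 3 samples.
channel_x = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
channel_y = channel_x[-3:] + channel_x[:-3]
shift, peak = phase_correlation(channel_x, channel_y)
```

Because the magnitude of each frequency bin is discarded, this comparison is insensitive to an overall gain difference between the two channels, which is one reason it is usually described as frequency-based rather than amplitude-based processing.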

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality to incorporate the teachings of Eppolito to include adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted output of the first and second audio files; performing frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file; and performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files. As recognized by Eppolito, there is a need in the art for automatically determining the relationship between audio channels. (Eppolito, ¶ [0004]). However, Canniff and Eppolito fail to expressly disclose wherein the chunks of the first and second audio files comprise extracted words.

Baker teaches systems and methods for analyzing an audio sample, including comparison and analysis of structural blocks to a known reference sample. (Baker, ¶¶ [0134] and [0135]). Regarding claim 1, Baker teaches wherein the chunks of the first and second audio files comprise extracted words (“the system of pattern analysis may further be configured to operate with one or more of said sequences of sound units is …a word;” Baker, ¶ [0016]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the chunks of the first and second audio files each comprise extracted words. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 3, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff and Eppolito fail to expressly disclose the method according to claim 1, wherein the chunks of the first audio file comprise extracted sentences.

The relevance of Baker is described above with relation to claim 1. Regarding claim 3, Baker teaches the method according to claim 1, wherein the chunks of the first audio file comprise extracted sentences (Baker, ¶ [0016]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the chunks of the first audio file comprise extracted sentences. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 4, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff and Eppolito fail to expressly disclose the method according to claim 1, wherein the chunks of the first audio file comprise extracted syllables.

The relevance of Baker is described above with relation to claim 1. Regarding claim 4, Baker teaches the method according to claim 1, wherein the chunks of the first audio file comprise extracted syllables (“the system of pattern analysis may further be configured to operate with one or more of said sequences of sound units is …a syllable;” Baker, ¶ [0016]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the chunks of the first audio file comprise extracted syllables. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 6, the rejection of claim 5 is incorporated. Canniff further discloses the method according to claim 5, further including generating a time-based processing score (Canniff discloses the determination of a cross correlation result compared to a threshold for determining speech quality as a time-based processing score; Canniff, ¶ [0036]).

Regarding claim 9, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff and Eppolito fail to expressly disclose the method according to claim 1, wherein the identified audio anomalies comprise missed words in the second audio file.

The relevance of Baker is described above with relation to claim 1. Regarding claim 9, Baker further discloses the method according to claim 1, wherein the identified audio anomalies comprise missed words in the second audio file (“error detection and error correction capabilities Baker, ¶ [0127])

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the identified audio anomalies comprise missed words in the second audio file. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 10, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff and Eppolito fail to expressly disclose the method according to claim 1, wherein the identified audio anomalies comprise distorted words.

The relevance of Baker is described above with relation to claim 1. Regarding claim 10, Baker further discloses the method according to claim 1, wherein the identified audio anomalies comprise distorted words (“Block 415 checks for missed detections and for situations in which a detected target does not match as well as it should” where the target for detection is “a few words,” thus missed detections can be distorted words; Baker, ¶¶ [0127] and [0250]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the identified audio anomalies comprise distorted words. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 12, Canniff discloses a system (e.g., speech transmission system 100, Canniff, ¶ [0018]) comprising: a time alignment module to align in time first and second audio files (“Using the duration pattern of the silence periods in the reference signal, the speech burst are approximately aligned with the corresponding speech samples in the reference signal.”; Canniff ¶ [0035]); an extraction module to divide the first audio file into chunks (“For this additional search the speech burst is subdivided into sub-frames of a predetermined size;” Canniff, ¶ [0040]) and to divide the second audio files into chunks that correspond to the chunks of the first audio file (“And, the most probable match speech sample is also subdivided into sub-frames of the same predetermined size;” Canniff, ¶ [0040]); … [and] a time-based processing module to perform time-based processing of the … output of the first and second audio files to identify audio anomalies in the second audio file (though not expressly disclosed as “amplitude adjusted”, the system includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function,” where cross correlation is a time-based process; Canniff, ¶ [0009]), wherein the time-based processing comprises distance processing between the … output of the first and second audio files (though not expressly disclosed as “amplitude adjusted,” the method includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function”; Canniff, ¶ [0009]).
However, Canniff fails to expressly recite wherein the chunks of the first and second audio files each comprise extracted words, an amplitude correction module to adjust an amplitude of one or both of the chunks of the first audio file and the second audio file and generate an amplitude adjusted output of the first and second audio files and a frequency-based processing module to perform frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, and a time-based processing module to perform time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files.

The relevance of Eppolito is described above with relation to claim 1. Regarding claim 12, Eppolito discloses an amplitude correction module (noise filtering modules 840 and 845, Eppolito, ¶ [0104]) to adjust an amplitude of one or both of the chunks of the first audio file and the second audio file (The system discloses filtering the two channels for “particular frequency components… that are likely to contain noise” using a band-pass filter, where targeted band pass filtering, and all passive filters, are well known in the art to reduce the amplitude of the original signals; Eppolito, ¶ [0117]) and generating an amplitude adjusted output of the first and second audio files (the noise cancellation process generates “filtered channel data 1040.”; Eppolito, ¶ [0118]); a time-based processing module to perform time-based processing (The system can also compare the filtered audio signals using “cross correlation of channel X and channel Y in the time domain” to “determine pairing of audio channels.” Eppolito, ¶ [0153]; [0151]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to time based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file (cross correlation compares two channels “by quantifying the degree of similarity between audio channels,” thus detecting anomalies; Eppolito, ¶ [0109]; [0103]), wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files (The system uses cross correlation, a type of distance processing, to “measure the similarity between two waveforms as a function of …”; Eppolito, ¶¶ [0147], [0150]); a frequency based processing module to perform frequency-based processing (the system can perform a “Frequency domain correlation” which “is sometimes referred to as ‘phase correlation,’” thus performing frequency based processing of the amplitude adjusted output, Eppolito, ¶ [0150]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to frequency based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file (phase correlation compares two channels “by quantifying the degree of similarity between audio channels,” thus detecting anomalies; Eppolito, ¶ [0109]; [0103]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality to incorporate the teachings of Eppolito to include adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted output of the first and second audio files; performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file; performing frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file; and performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files. As recognized by Eppolito, there is a need in the art for automatically determining the relationship between audio channels. (Eppolito, ¶ [0004]). However, Canniff and Eppolito fail to expressly disclose wherein the chunks of the first and second audio file comprise extracted words.

The relevance of Baker is described above with relation to claim 1. Regarding claim 12, Baker teaches wherein the chunks of the first and second audio file comprise extracted words (“the system of pattern analysis may further be configured to operate with one or more of said sequences of sound units is …a word;” Baker, ¶ [0016]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, to incorporate the teachings of Baker to include wherein the chunks of the first audio file comprise extracted words. As recognized by Baker, “by performing multiple, redundant overlapping analyses with different operating characteristics” the pattern analysis can be made more “robust against errors, misalignments and failures of process”. (Baker, Abstract).

Regarding claim 14, the rejection of claim 12 is incorporated. Claim 14 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 12 is incorporated. Claim 15 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Claims 7, 8, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Canniff in view of Eppolito and Baker, as applied to claims 1 and 12 above, and in further view of Deshpande et al. (U.S. Pat. App. Pub. No. 2018/0038954, hereinafter Deshpande).

Regarding claim 7, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff, Eppolito, and Baker fail to expressly disclose wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files.

Deshpande discloses methods and systems for anomaly detection in digital signals. (Deshpande, ¶ [0037]). Regarding claim 7, Deshpande discloses wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files (“In an embodiment, one or more features are extracted from the set of digital signals in at least one of a time domain and a frequency domain. The one or more extracted features comprises… power spectral density”; Deshpande, ¶ [0175]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality, the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, and the teachings of Baker regarding the contents of the processed audio files, to incorporate the teachings of Deshpande to include wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files. Spectral information can be used in conjunction with energy profiles of time domain and frequency domain features to detect an anomaly better than spectral information alone, as recognized by Deshpande. (Deshpande, ¶ [0042]).

Regarding claim 8, the rejection of claim 7 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. However, Canniff, Eppolito, and Baker fail to expressly disclose further including generating a frequency based processing score.

Regarding claim 8, Deshpande further discloses generating a frequency based processing score (Deshpande discloses that time and frequency domain processing can further include scoring as part of the fusion of the time and frequency domain features; Deshpande, ¶ [0042]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, and the teachings of Baker regarding the contents of the processed audio files, to incorporate the teachings of Deshpande to further include generating a frequency based processing score. Spectral information can be used in conjunction with energy profiles of time domain and frequency domain features to detect an anomaly better than spectral information alone, as recognized by Deshpande. (Deshpande, ¶ [0042]).

Regarding claim 11, the rejection of claim 1 is incorporated. Canniff, Eppolito, and Baker disclose all of the elements of the current invention as stated above. Canniff further discloses using the time-based processing score and/or the frequency based processing score to classify ones of the identified audio anomalies (though not expressly disclosing “time-based” or “frequency-based,” Canniff discloses quantification of a frame and classification based on the quantification as either silence or speech; Canniff ¶ [0034]). Canniff further discloses the method according to claim 1, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files (the method includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function,” where cross correlation is an audio distortion processing technique; Canniff, ¶ [0009]) and generating a time-based processing score (Canniff discloses the determination of a cross correlation result compared to a threshold for determining speech quality as a time-based processing score; Canniff, ¶ [0036]). However, Canniff, Eppolito, and Baker fail to expressly disclose wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files and generating a frequency based processing score.

The relevance of Deshpande is described above with relation to claim 7. Regarding claim 11, Deshpande further discloses wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files (“In an embodiment, one or more features are extracted from the set of digital signals in at least one of a time domain and a frequency domain. The one or more extracted features comprises… power spectral density”; Deshpande, ¶ [0175]) and generating a frequency based processing score (Deshpande discloses that time and frequency domain processing can further include scoring as part of the fusion of the time and frequency domain features; Deshpande, ¶ [0042]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality and the teachings of Eppolito for adjusting amplitude in audio files and performing frequency-based and time-based processing of the amplitude adjusted output, and the teachings of Baker regarding the contents of the processed audio files, to incorporate the teachings of Deshpande to further include wherein the frequency-based processing comprises spectral power processing of the amplitude adjusted output of the first and second audio files. Spectral information can be used in conjunction with energy profiles of time domain and frequency domain features to detect an anomaly better than spectral information alone, as recognized by Deshpande. (Deshpande, ¶ [0042]).

Regarding claim 17, the rejection of claim 12 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Regarding claim 18, the rejection of claim 12 is incorporated. Claim 18 is substantially the same as claim 11 and is therefore rejected under the same rationale as above.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Canniff in view of Eppolito.

Regarding claim 19, Canniff discloses a system (e.g., speech transmission system 100, Canniff, ¶ [0018]) comprising: a time alignment means to align in time first and second audio files (“Using the duration pattern of the silence periods in the reference signal, the speech burst are approximately aligned with the corresponding speech samples in the reference signal.”; Canniff ¶ [0035]); an extraction means to divide the first audio file into chunks (“For this additional search the speech burst is subdivided into sub-frames of a predetermined size;” Canniff, ¶ [0040]) and to divide the second audio files into chunks that correspond to the chunks of the first audio file (“And, the most probable match speech sample is also subdivided into sub-frames of the same predetermined size;” Canniff, ¶ [0040]); … [and] a time-based processing means to perform time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file (though not expressly disclosed as “amplitude adjusted”, the method includes “comparing the output signal with a reference signal derived from the test signal using a cross correlation function,” where cross correlation is described in the specification Canniff, ¶ [0009]). However, Canniff fails to expressly recite an amplitude correction means to adjust an amplitude of one or both of the chunks of the first audio file and the second audio file and generate an amplitude adjusted output of the first and second audio files and a frequency-based processing means to perform frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file.

The relevance of Eppolito is described above with relation to claim 1. Regarding claim 19, Eppolito discloses an amplitude correction means (noise filtering modules 840 and 845, Eppolito, ¶ [0104]) for adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file (The system discloses filtering the two channels for “particular frequency components… that are likely to contain noise” using a band-pass filter, where targeted band pass filtering, and all passive filters, are well known in the art to reduce the amplitude of the original signals; Eppolito, ¶ [0117]) and generating an amplitude adjusted output of the first and second audio files (the noise cancellation process generates “filtered channel data 1040.”; Eppolito, ¶ [0118]); a time-based processing means for performing time-based processing (The system can also compare the filtered audio signals using “cross correlation of channel X and channel Y in the time domain” to “determine pairing of audio channels.” Eppolito, ¶ [0153]; [0151]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to time based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file (cross correlation compares two channels “by quantifying the degree of similarity between audio channels,” thus detecting anomalies; Eppolito, ¶ [0109]; [0103]), wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files (The system uses cross correlation, a type of distance processing, to “measure the similarity between two waveforms as a function of …”; Eppolito, ¶¶ [0147], [0150]); a frequency based processing means for performing frequency-based processing (the system can perform a “Frequency domain correlation” which “is sometimes referred to as ‘phase correlation,’” thus performing frequency based processing of the amplitude adjusted output, Eppolito, ¶ [0150]) of the amplitude adjusted output of the first and second audio files (“After performing noise filtering … on the selected audio channels, the process compares (at 930) the two audio channels…”, thus the signals are amplitude adjusted by noise filtering prior to frequency based processing; Eppolito, ¶ [0109]) to identify audio anomalies in the second audio file (phase correlation compares two channels “by quantifying the degree of similarity between audio channels,” thus detecting anomalies; Eppolito, ¶ [0109]; [0103]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech transmission system of Canniff for analyzing speech transmission quality to incorporate the teachings of Eppolito to include adjusting an amplitude of one or both of the chunks of the first audio file and the second audio file and generating an amplitude adjusted output of the first and second audio files; performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file; performing frequency-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file; and performing time-based processing of the amplitude adjusted output of the first and second audio files to identify audio anomalies in the second audio file, wherein the time-based processing comprises distance processing between the amplitude adjusted output of the first and second audio files. As recognized by Eppolito, there is a need in the art for automatically determining the relationship between audio channels. (Eppolito, ¶ [0004]).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627.  The examiner can normally be reached on 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SES/Patent Examiner, Art Unit 2657
/Paras D Shah/Primary Examiner, Art Unit 2659
01/15/2021