DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1 – 4, 6, 11 – 14, 16, and 21 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Bharitkar (WIPO Patent Application Publication WO2019/160556).

Regarding Claim 1, Bharitkar discloses:
A computer-implemented method, executed on a computing device (para 0015, 0064, Figs. 8, 10: Encoded features and rate-based augmentation based speech authentication apparatuses, methods for encoded features and rate-based augmentation based speech authentication, and non-transitory computer readable media having stored thereon machine readable instructions to provide encoded features and rate-based augmentation based speech authentication are disclosed. Figures 8-10 respectively illustrate an example block diagram 800, an example flowchart of a method 900, and a further example block diagram 1000 for encoded features and rate-based augmentation based speech authentication. The block diagram 800, the method 900, and the block diagram 1000 may be implemented on the apparatus 100 [computing device] described above with reference to Figure 1. Figure 8 shows hardware of the apparatus 100 that may execute the instructions of the block diagram 800. The hardware may include a processor 802, and a memory 804 i.e., a non-transitory computer readable medium, storing machine readable instructions that when executed by the processor 802 cause the processor to perform the instructions of the block diagram 800.), comprising:
receiving feature-based voice data associated with a first acoustic domain (Figs. 1, 2, 3; para 0023-25, 0037-38, 0018, 0039: Referring to Figure 1, the apparatus 100 may include a registration module 102 that utilizes a feature extraction module 104 to extract a plurality of features of a registration speech signal 106 for a user 108, received by apparatus 100, that is to be registered. The feature extraction module 104 may extract the plurality of features of speech signal 106, thus received speech signal 106 [voice data] is feature-based and includes a fundamental frequency, formants and gradients, for utilization by apparatus 100; the acoustic domain would be the associated environment [domain] in which the registration speech signal 106, the speech of the user, is generated and is acoustically affected by the user, and the components of apparatus 100 including the microphone used to capture the voice of the user 108 in the environment); and
performing one or more rate-based augmentations on at least a portion of the feature-based voice data, thus defining rate-based augmented feature-based voice data (Figs. 1, 2, 3; para 0015, 0022, 0026: apparatus 100 provides for speech authentication based on the use of features extracted at different speech rates, where the speech may be synthesized artificially, at different rates [one or more rate-based augmentations], to form the basis for speech augmentation to a machine learning model. The original speech and the rate-adjusted speech that may be designated augmented speech [defining rate-based augmented feature-based voice data] may be encoded, prior to training of the machine learning model. Figures 1, 2, 3 illustrate a layout of an encoded features and rate-based augmentation based speech authentication apparatus 100 using speech rate modification module 114 to modify a speech rate of the registration speech signal 106 to generate a rate-adjusted speech signal 116.).

Regarding Claim 2, in addition to the elements stated above regarding claim 1, Bharitkar further discloses:
receiving a selection of a target acoustic domain (para 0001: An example of such factors includes ambient noise when the speech based authentication is being utilized. Another example of such factors includes differences in the condition of the user during a registration phase during which the user enrolls with the device for speech authentication, and an authentication (e.g., verification) phase during which the user utilizes the speech authentication feature to gain access to the device. With respect to differences in the condition of the speaker, examples of such differences include how the user speaks, for example slower or faster, health of the user, etc.; para 0020: Further, the machine learning model may be trained to accommodate speech rate variations for registered users to build robustness against speech-rate variations.; para 0043: Additional features may be designed for the same captured speech signal during registration, but by changing the speech rate in order to create robustness against speech rate variations which may occur during the subsequent authentication (or verification) stage. The registration speech signal 106 may be rate adjusted by p percent (where p = 0 percent is speech at a normal spoken rate, p < 0 percent represents speech at a slower rate than the spoken speech, and p > 0 percent represents speech at a faster rate than the spoken speech), thus the target acoustic domain may be based upon [selection] the speech rate, for example, a slower speech rate may be a target domain for a person in poor health).

Regarding Claim 3, in addition to the elements stated above regarding claim 2, Bharitkar further discloses:
wherein performing the one or more rate-based augmentations to the at least a portion of the feature-based voice data includes performing the one or more rate-based augmentations to the at least a portion of the feature-based voice data based upon, at least in part, the target acoustic domain (Fig. 1, para 0001: An example of such factors includes ambient noise when the speech based authentication is being utilized. Another example of such factors includes differences in the condition of the user during a registration phase during which the user enrolls with the device for speech authentication, and an authentication (e.g., verification) phase during which the user utilizes the speech authentication feature to gain access to the device. With respect to differences in the condition of the speaker, examples of such differences include how the user speaks, health of the user, etc.; para 0015: The apparatuses, methods, and non-transitory computer readable media disclosed herein provide for speech authentication based on the use of features extracted at different speech rates, where the speech may be synthesized artificially, at different rates, to form the basis for speech augmentation to a machine learning model. The original speech and the rate-adjusted speech that may be designated augmented speech may be encoded, prior to training of the machine learning model.; para 0022: Figure 1 illustrates an example layout of an encoded features and rate based augmentation based speech authentication apparatus; para 0026: A speech rate modification module 114 may modify a speech rate of the registration speech signal 106 to generate a rate-adjusted speech signal 116.; para 0043: Additional features may be designed for the same captured speech signal during registration, but by changing the speech rate in order to create robustness against speech rate variations which may occur during the subsequent authentication (or verification) stage. 
The registration speech signal 106 may be rate adjusted by p percent (where p = 0 percent is speech at a normal spoken rate, p < 0 percent represents speech at a slower rate than the spoken speech, and p > 0 percent represents speech at a faster rate than the spoken speech), thus the target acoustic domain may be based upon the speech rate and used to perform rate-based augmentation of the speech accordingly, for example, a slower speech rate may be a target domain for a person in poor health, and a slower rate can be used for any augmentation).

Regarding Claim 4, in addition to the elements stated above regarding claim 1, Bharitkar further discloses:
wherein performing the one or more rate-based augmentations to the at least a portion of the feature-based voice data includes decreasing a phoneme-rate of at least a portion of the feature-based voice data (Fig. 1, para 0015: The apparatuses, methods, and non-transitory computer readable media disclosed herein provide for speech authentication based on the use of features extracted at different speech rates, where the speech may be synthesized artificially, at different rates, to form the basis for speech augmentation to a machine learning model. The original speech and the rate-adjusted speech that may be designated augmented speech may be encoded, prior to training of the machine learning model.; para 0026: A speech rate modification module 114 may modify a speech rate of the registration speech signal 106 to generate a rate-adjusted speech signal 116. In this regard, the speech rate modification module 114 may modify the speech rate of the registration speech signal 106 by p < 0 percent to perform time dilation on the registration speech signal 106 and p > 0 percent to perform time compression on the registration speech signal 106, where p represents a percentage; para 0043: That is, the registration speech signal 106 may be rate adjusted by p percent for time dilation when p < 0 (e.g., the rate of the registration speech signal 106 may be slowed down by p percent) and time compression when p > 0 (e.g., the rate of the registration speech signal 106 may be made faster by p percent) for slowing or increasing the speech rate without perceptibly changing the "color" (e.g., any artifacts such as clicks, metallic sounds, or any sounds that make the speech sound unnatural) of the speech signal.
[It is noted that Applicant’s disclosure, para 00103-00104, describes the phoneme rate as the rate of speaking, where how fast or slow the speaker is speaking determines the phoneme rate; therefore, the applicant equates modifications that augment the speech rate through an increase or decrease with increasing or decreasing the phoneme-rate]).

Regarding Claim 6, in addition to the elements stated above regarding claim 1, Bharitkar further discloses:
wherein performing the one or more rate-based augmentations to the at least a portion of the feature-based voice data includes increasing a phoneme-rate of at least a portion of the feature-based voice data (Fig. 1, para 0015: The apparatuses, methods, and non-transitory computer readable media disclosed herein provide for speech authentication based on the use of features extracted at different speech rates, where the speech may be synthesized artificially, at different rates, to form the basis for speech augmentation to a machine learning model. The original speech and the rate-adjusted speech that may be designated augmented speech may be encoded, prior to training of the machine learning model.; para 0026: A speech rate modification module 114 may modify a speech rate of the registration speech signal 106 to generate a rate-adjusted speech signal 116. In this regard, the speech rate modification module 114 may modify the speech rate of the registration speech signal 106 by p < 0 percent to perform time dilation on the registration speech signal 106 and p > 0 percent to perform time compression on the registration speech signal 106, where p represents a percentage.; para 0043: That is, the registration speech signal 106 may be rate adjusted by p percent for time dilation when p < 0 (e.g., the rate of the registration speech signal 106 may be slowed down by p percent) and time compression when p > 0 (e.g., the rate of the registration speech signal 106 may be made faster by p percent) for slowing or increasing the speech rate without perceptibly changing the "color" (e.g., any artifacts such as clicks, metallic sounds, or any sounds that make the speech sound unnatural) of the speech signal.
[It is noted that Applicant’s disclosure, para 00103-00104, describes the phoneme rate as the rate of speaking, where how fast or slow the speaker is speaking determines the phoneme rate; therefore, the applicant equates modifications that augment the speech rate through an increase or decrease with increasing or decreasing the phoneme-rate]).
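
For illustration only, the sign convention quoted from Bharitkar para 0026 and 0043 (p < 0 for time dilation, p = 0 for the normal spoken rate, p > 0 for time compression) can be sketched as follows. The mapping of p percent to a speed multiplier of 1 + p/100, and the function names rate_factor, adjusted_duration, and classify, are assumptions made for this sketch; the cited paragraphs as quoted above do not state a formula, and this is not a characterization of the reference's actual implementation.

```python
def rate_factor(p_percent: float) -> float:
    """Map a signed percentage p to a speed multiplier.

    ASSUMPTION: multiplier = 1 + p/100, so p = 0 leaves the rate unchanged,
    p < 0 slows the speech, and p > 0 speeds it up.
    """
    if p_percent <= -100.0:
        raise ValueError("p must be greater than -100 percent")
    return 1.0 + p_percent / 100.0


def adjusted_duration(duration_s: float, p_percent: float) -> float:
    """Duration after rate adjustment: faster speech is shorter, slower is longer."""
    return duration_s / rate_factor(p_percent)


def classify(p_percent: float) -> str:
    """Name the operation using the terms quoted in the rejection."""
    if p_percent < 0:
        return "time dilation"
    if p_percent > 0:
        return "time compression"
    return "unchanged"


if __name__ == "__main__":
    for p in (-20.0, 0.0, 50.0):
        print(p, classify(p), adjusted_duration(3.0, p))
```

Under this assumed mapping, a negative p lengthens the signal (a decreased phoneme-rate, as addressed for Claim 4) and a positive p shortens it (an increased phoneme-rate, as addressed for Claim 6).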

Claim 11 is rejected under the same grounds stated above for Claim 1.

Claim 12 is rejected under the same grounds stated above for Claim 2.

Claim 13 is rejected under the same grounds stated above for Claim 3.

Claim 14 is rejected under the same grounds stated above for Claim 4.

Claim 16 is rejected under the same grounds stated above for Claim 6.

Claim 21 is rejected under the same grounds stated above for Claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 5, 7 – 10, 15, and 17 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bharitkar in view of Abiko et al. (U.S. Patent Application Publication 2001/0047267), hereinafter Abiko.

Regarding Claim 5, in addition to the elements stated above regarding claim 4, Bharitkar does not explicitly disclose:
wherein decreasing a phoneme-rate of at least a portion of the feature-based voice data includes adding one or more frames to the feature-based voice data.
However, in a related field of endeavor (i.e., decreasing speech speed by adding frames), Abiko teaches (para 0037, 0061) that when speech speed [rate] is decreased, frames are inserted [adding] into the audio data. In addition, Bharitkar’s apparatus 100 is enabled (para 0040) to manipulate the speech data at a frame level. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Abiko to Bharitkar’s apparatus 100 to allow it to add a frame when decreasing the rate of the speech, thus providing an enhanced user experience by allowing the user to listen to outputted data as if the data were reproduced at a desired speed, as well as providing a simpler implementation by use of the well-known and proven technique of frame insertion (Abiko para 0061).

Regarding Claim 7, in addition to the elements stated above regarding claim 6, Bharitkar does not explicitly disclose:
wherein increasing a phoneme-rate of at least a portion of the feature-based voice data includes dropping one or more frames from the feature-based voice data.
However, in a related field of endeavor (i.e., increasing speech speed by thinning out frames), Abiko teaches (para 0037, 0061) that when speech speed [rate] is increased, frames are thinned out [dropping frames] from the audio data. In addition, Bharitkar’s apparatus 100 is enabled (para 0040) to manipulate the speech data at a frame level. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Abiko to Bharitkar’s apparatus 100 to allow it to drop a frame when increasing the rate of the speech, thus providing an enhanced user experience by allowing the user to listen to outputted data as if the data were reproduced at a desired speed, as well as providing a simpler implementation by use of the well-known and proven technique of thinning frames (Abiko para 0061).
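
For illustration only, the frame insertion and frame thinning discussed above for Claims 5 and 7 can be sketched on a sequence of feature frames. The helper names insert_frames and thin_frames, the choice to insert a frame by duplicating its predecessor, and the fixed "every n-th frame" schedule are all assumptions made for this sketch; neither reference is asserted to operate in exactly this way.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per frame (hypothetical layout)


def insert_frames(frames: Sequence[Frame], every: int) -> List[Frame]:
    """Slow speech down by duplicating every `every`-th frame.

    ASSUMPTION: an inserted frame is a copy of the frame just before it.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    out: List[Frame] = []
    for position, frame in enumerate(frames, start=1):
        out.append(frame)
        if position % every == 0:
            out.append(frame)  # the added frame
    return out


def thin_frames(frames: Sequence[Frame], every: int) -> List[Frame]:
    """Speed speech up by dropping every `every`-th frame."""
    if every < 2:
        raise ValueError("every must be at least 2")
    return [
        frame
        for position, frame in enumerate(frames, start=1)
        if position % every != 0
    ]


if __name__ == "__main__":
    feats = [[0.1], [0.2], [0.3], [0.4]]
    print(insert_frames(feats, 2))  # six frames: longer, i.e. slower speech
    print(thin_frames(feats, 2))    # two frames: shorter, i.e. faster speech
```

In this sketch, more frames at a fixed frame period correspond to a decreased phoneme-rate, and fewer frames correspond to an increased phoneme-rate.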

Regarding Claim 8, in addition to the elements stated above regarding claim 2, Bharitkar further discloses:
training a machine learning model based upon, at least in part, the target acoustic domain (Fig. 1; para 0001: factors includes differences in the condition of the user during a registration phase during which the user enrolls with the device for speech authentication, and an authentication (e.g., verification) phase during which the user utilizes the speech authentication feature to gain access to the device. With respect to differences in the condition of the speaker, examples of such differences include how the user speaks, for example slower or faster, health of the user, etc.; para 0020: Further, the machine learning model may be trained to accommodate speech rate variations for registered users to build robustness against speech-rate variations.; para 0043: Additional features may be designed for the same captured speech signal during registration, but by changing the speech rate in order to create robustness against speech rate variations which may occur during the subsequent authentication (or verification) stage. The registration speech signal 106 may be rate adjusted by p percent (where p = 0 percent is speech at a normal spoken rate, p < 0 percent represents speech at a slower rate than the spoken speech, and p > 0 percent represents speech at a faster rate than the spoken speech).; para 0045: At block 312, with respect to dynamic time warping performed by the dynamic time warping module 118, dynamic time warping may be applied between the features derived from the registration speech signal 106 as well as the features obtained from the rate-adjusted speech signal 116. The warped features may serve as augmented data for training the machine learning model 122.
Thus, the use of rate-change by the speech rate modification module 114, and the dynamic time warping may ensure that the machine learning model input is made substantially invariant to rate changes of speech so that during authentication, if the user 108 changes the speech rate (e.g., time-dilating or time-compressing certain words), the machine learning model 122 may capture these variances; para 0046: At block 314, the speech signal from 304 (i.e., the registration speech signal 106) may be rate adjusted using the speech rate modification module 114, which implements a speech rate adjustment model. The resulting signal may be denoted the rate-adjusted speech signal 116. As discussed above, the registration speech signal 106 may be rate adjusted by p percent for time dilation when p < 0 and time compression when p > 0 for slowing or increasing the speech rate without perceptibly changing the "color" of the speech signal. Thus, the target acoustic domain may be based upon the speech rate and used as training data for training the machine learning model [training a machine learning model based upon, at least in part, the target acoustic domain], for example, a slower speech rate may be a target domain for a person in poor health).
Bharitkar does not explicitly disclose training the machine learning model to one or more of [examiner notes that the following limitations “add” and “remove” are claimed in the alternative] add at least one frame to the feature-based voice data and remove at least one frame from the feature-based voice data.
However, in a related field of endeavor (i.e., adding frames), Abiko teaches (para 0037, 0061) that when speech speed [rate] is decreased or increased, frames are inserted [add] into the audio data or thinned [remove] from the audio data. In addition, Bharitkar’s apparatus 100 is enabled (para 0040) to manipulate the speech data at a frame level. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Abiko to Bharitkar’s apparatus 100 to train the learning model to add or thin a frame based on the target acoustic domain of decreasing or increasing the rate of the speech, thus providing an enhanced model by allowing the user to listen to outputted data as if the data were reproduced at a desired speed, as well as providing a simpler implementation by use of the well-known and proven technique of frame insertion (Abiko para 0061).

Claim 9 is rejected under the same grounds stated above for Claim 8.

Regarding Claim 10, in addition to the elements stated above regarding claim 9, the combination does not explicitly disclose:
wherein the trained machine learning model is configured to perform smoothing of the feature-based voice data when one or more of adding at least one frame to the feature-based voice data and removing at least one frame from the feature-based voice data.
However, in a related field of endeavor (i.e. scale factor conversion [smoothing]) Abiko teaches (para 0037, 0054-55, 0082-83) mitigating the discontinuous jump [smoothing] of the acoustic pressure that can occur at a joint between frames, for example after thinning out frames, by using scale factor conversion at the joint of frames. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Abiko to the combination’s trained machine learning model to allow it to mitigate the discontinuous jump [smoothing] of the acoustic pressure that can occur at a joint between frames, for example after thinning out frames, thus providing an enhanced user listening experience, by reducing annoying noise for the user who listens to the reproduction sound, by mitigating the audio discontinuity (Abiko para 0082-83).
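
For illustration only, mitigating a discontinuous jump at a joint between frames can be sketched with a simple linear blend of the two frames on either side of the joint. This blend is a stand-in chosen for brevity and is not Abiko's scale factor conversion; the function name smooth_joint, the weight parameter, and the feature-vector layout are assumptions made for this sketch.

```python
from typing import List, Sequence


def smooth_joint(
    frames: Sequence[Sequence[float]], joint: int, weight: float = 0.25
) -> List[List[float]]:
    """Blend the two frames that meet at `joint` to shrink the jump between them.

    `joint` is the index of the first frame after the joint, so the joint lies
    between frames[joint - 1] and frames[joint].
    ASSUMPTION: a linear blend is used here instead of scale factor conversion.
    """
    if not 0 < joint < len(frames):
        raise ValueError("joint must lie between two existing frames")
    if not 0.0 <= weight <= 0.5:
        raise ValueError("weight must be in [0, 0.5]")
    left, right = frames[joint - 1], frames[joint]
    out = [list(frame) for frame in frames]  # leave the input untouched
    out[joint - 1] = [(1 - weight) * a + weight * b for a, b in zip(left, right)]
    out[joint] = [weight * a + (1 - weight) * b for a, b in zip(left, right)]
    return out


if __name__ == "__main__":
    # A frame was removed between index 1 and index 2, leaving a jump of 8.0.
    thinned = [[0.0], [0.0], [8.0], [8.0]]
    print(smooth_joint(thinned, joint=2))
```

With the default weight, the single jump of 8.0 at the joint is spread across neighboring frames so that no adjacent pair differs by more than 4.0.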

Claim 15 is rejected under the same grounds stated above for Claim 5.

Claim 17 is rejected under the same grounds stated above for Claim 7.

Claim 18 is rejected under the same grounds stated above for Claim 8.

Claim 19 is rejected under the same grounds stated above for Claim 8.

Claim 20 is rejected under the same grounds stated above for Claim 10.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID F SIEGEL whose telephone number is (571)272-5715. The examiner can normally be reached M-W 6:30am - 3pm, Th-F 7am-3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at 571-272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID SIEGEL/Examiner, Art Unit 2653
/FAN S TSANG/Supervisory Patent Examiner, Art Unit 2653