DETAILED ACTION

Introduction

1.	This office action is in response to Applicant's submission filed on 05/20/2019. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The Applicants made a preliminary amendment to the claims in which claims 3-10, 13-17 and 22 were amended while claims 18-20 and 23-25 were cancelled. Thus, claims 1-17 and 21-22 are currently pending and examined below.

Drawings

2.	The drawings filed on 05/20/2019 have been accepted and considered by the Examiner. 

Information Disclosure Statement

3.	The Information Disclosure Statement (IDS) filed on 05/20/2019 has been accepted and considered in this office action and is in compliance with the provisions of 37 CFR 1.97.


Priority

4.	The Applicants' claim of priority to UK Application # GB 1808118.2, filed on May 18, 2018, has been accepted and considered in this office action.

Claim Objections

5.	Claims 2-10 and 12-17 are objected to because of the following informalities: these dependent claims begin with “A system” or “A method”. This seems to be a typographical error. The claims should begin with “The system” or “The method” as they depend on claim 1, 11 or 13. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1, 5, 8, 11 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Senior (U.S. Patent Application Publication # 2017/0011738 A1) in view of Moreno (U.S. Patent Application Publication # 2016/0217367 A1). Both Senior and Moreno are of record as they have been disclosed by the Applicants in their IDS filed on 05/20/2019.

With regards to claim 1, Senior teaches a speaker recognition system comprising at least one microphone operable to output data representing speech of a speaker and a controller operable to receive the data output from the at least one microphone (Figure 1, shows a speaker inputting audio data into a cellphone which inherently includes a microphone);

Figures 1-2 and para 54, teach that the input audio data is processed for speech recognition. Figures 1-2 and paragraphs 38-40, teach that the training state of the first neural network 120 serves as the source of output target information for training the second neural network 130);

wherein the first artificial neural network comprises fewer layers and/or fewer parameters than the second artificial neural network (Figures 1-2 and paragraphs 38-40, teach that the second neural network 130 may be smaller than the first neural network 120, having many fewer parameters, e.g., weights for neurons, or connections between neurons);

and wherein the first artificial neural network is configured to emulate a result derivable using an output of the second artificial neural network (Figures 1-2 and paragraphs 38-40, teach that the second neural network 130 can be trained directly to match or approximate the CTC output distributions produced by the first neural network 120);

Although Senior teaches that the input audio data is processed for speech recognition (Figures 1-2 and para 54), it may not explicitly detail that the second artificial neural network is trained to perform speaker recognition and identify the speaker. However, Moreno teaches this (Para 19 and figure 1, teach a speech recognition system 100 that provides the same subset of parameters to each of the nodes during training and runtime. The output of the neural network may be used to determine whether a particular keyword or key phrase was spoken, whether an utterance was spoken by a particular speaker, or both).

Senior and Moreno can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Moreno (Use of artificial neural networks for speaker recognition) with those of Senior (Use of artificial neural networks for speech recognition) so as to provide a system with small memory footprint, low latency, low battery consumption, and/or low computational cost (Moreno, para 8). 

With regards to claims 11 and 21, these are method claims for the corresponding apparatus claim 1. These two sets of claims are related as method and apparatus of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claims 11 and 21 are similarly rejected under the same rationale as applied above with respect to apparatus claim 1.

With regards to claim 22, this is a computer readable medium (CRM) claim for the corresponding apparatus claim 1. These two claims are related as CRM and apparatus of using the same, with each claimed system element's function corresponding to the claimed CRM step. Accordingly, claim 22 is similarly rejected under the same rationale as applied above with respect to apparatus claim 1.

With regards to claim 5, Senior may not explicitly detail the limitation wherein the controller is configured to transmit a control signal to control operation of a device in the environment around the speaker recognition system dependent on the identifying of the speaker. However, Moreno teaches this (Figure 7 and para 101, teach that the processors 712 may be coupled to a speaker 702 and a microphone 704 that allow output and input of audio signals to and from an area that physically surrounds the computing device 700, while para 106, teaches the use of the signal to activate a device using a high speech controller as outlined in para 121).

Senior and Moreno can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Moreno (Use of speaker recognition for control operations) with those of Senior (Use of artificial neural networks for speech Moreno, para 8). 

With regards to claim 8, Senior teaches the speaker recognition system according to claim 1, wherein the first artificial neural network has been trained based on embedding vectors generated by an intermediate layer of the second artificial neural network (Paragraphs 44-46 and figure 1, teach that for the same particular training utterance, the computing system 110 also generates a sequence of input vectors that are provided to the second neural network 130. The second neural network 130 may include a different sized input layer compared to the first neural network 120, and so may accept a different number of input values).

7.	Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Li (U.S. Patent Application Publication # 2016/0078339 A1). Li is also of record as it has been disclosed by the Applicants in their IDS filed on 05/20/2019.

With regards to claim 2, Senior and Moreno may not explicitly detail the limitation wherein the first artificial neural network comprises a student artificial neural network, and the second artificial neural network comprises a teacher artificial neural network. However, Li teaches this (Figure 5, discloses the teaching of the student neural network based on the outputs of the teacher model. Paras 52 and 56, disclose the training of the student network to approximate the output of the teacher network).

Senior, Moreno and Li can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Li (Use of teacher and student artificial neural networks in speech processing) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so as to provide a more accurate student deep neural network (DNN) by learning the DNN model from an even larger "giant" teacher DNN (Li, para 6). 
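For illustration only, the teacher/student relationship discussed above (a smaller network trained to approximate the output of a larger one) can be sketched as a soft-target loss. This is a generic sketch and is not taken from Senior, Moreno or Li; the function names and the temperature parameter are assumptions introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's
    output distribution (hypothetical helper, used only for illustration)."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher = [2.0, 0.5, -1.0]
matched = distillation_loss([2.0, 0.5, -1.0], teacher)     # student agrees with teacher
mismatched = distillation_loss([-1.0, 0.5, 2.0], teacher)  # student disagrees
```

Under these assumptions, the loss is lowest when the student reproduces the teacher's output distribution, which is the sense in which a student network is trained to approximate a teacher network.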

With regards to claim 3, Senior and Moreno may not explicitly detail the limitation wherein the speaker recognition system comprises a memory, and wherein the speaker recognition system is configured to store the first artificial neural network in the memory. However, Li teaches this (Figure 1 and para 5, disclose storage capabilities and the deployment of a reduced size i.e. student network, on devices such as smart phones).

Senior, Moreno and Li can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously Li, para 6). 

8.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Gomez (U.S. Patent Application Publication # 2016/0098646 A1). 

With regards to claim 4, Senior and Moreno may not explicitly detail the limitation wherein the second artificial neural network is not stored in the speaker recognition system. However, Gomez teaches this (Figures 1 and 6, show a speech recognition process performed in a distributed system wherein a DNN is located remotely from the user device).

Senior, Moreno and Gomez can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Gomez (Storage of a DNN located remotely from the user device) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so that the system may perform more Gomez, para 12). 

9.	Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Khoury (U.S. Patent Application Publication # 2018/0082691 A1). 

With regards to claim 6, Senior and Moreno may not explicitly detail the limitation wherein the speaker recognition system is configured to perform text-independent speaker recognition. However, Khoury teaches this (Figures 4-5 and paragraphs 52-53, teach a DNN based speaker recognition system that improves conventional text independent speaker recognition systems by addressing the duration variability of both the enrollment and test speech utterances via employing a statistical normalizer).

Senior, Moreno and Khoury can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Khoury (Use of a DNN based text independent speaker recognition system) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so that the system removes undesired variability in the distribution of the i-vectors from the same speaker (Khoury, para 52). 

With regards to claim 16, Senior and Moreno may not explicitly detail the limitation wherein the processing the received data comprises deriving a set of mel-frequency cepstral coefficients from the received data, and inputting the derived set of mel-frequency cepstral coefficients to the first artificial neural network. However, Khoury teaches this (Para 6, teaches a speaker recognition system including a feature extractor, where features such as Mel-frequency cepstral coefficients or MFCCs are extracted and normalized. Para 57 and figure 4, teach that a non-limiting configuration for the DNN includes an input layer of dimension 61440, which is equal to 1024 Gaussians by 60-dimensional MFCCs).

Senior, Moreno and Khoury can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Khoury (Use of a DNN based text independent speaker recognition system with MFCCs as input) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so that the system removes undesired variability in the distribution of the i-vectors from the same speaker (Khoury, para 52). 
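As a brief check of the input layer dimension cited from Khoury with respect to claim 16 (1024 Gaussians by 60-dimensional MFCCs), the arithmetic can be sketched as follows. Treating the input as a flattening of one 60-dimensional row per Gaussian is an assumption made here for illustration; the constant and function names are hypothetical and do not come from the reference.

```python
NUM_GAUSSIANS = 1024  # number of Gaussians, per the cited configuration
MFCC_DIM = 60         # dimensionality of each MFCC vector, per the cited configuration


def flatten_statistics(stats):
    """Flatten one row per Gaussian into a single input vector
    (assumed layout, for illustration only)."""
    return [value for row in stats for value in row]


# Placeholder statistics: zeros stand in for real per-Gaussian values.
stats = [[0.0] * MFCC_DIM for _ in range(NUM_GAUSSIANS)]
input_vector = flatten_statistics(stats)
input_dim = len(input_vector)
```

Here `input_dim` equals 1024 x 60 = 61440, which agrees with the input layer dimension stated above.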

10.	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Bak (U.S. Patent Application Publication # 2012/0313849 A1). 

With regards to claim 7, Senior and Moreno may not explicitly detail the limitation wherein the speech is between 2 and 10 seconds in duration. However, Bak teaches this (Para 135, teaches a voice recognition system wherein the input voice of a user has a duration of 2 seconds).

Senior, Moreno and Bak can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Bak (speech input is between 2 and 10 seconds in duration) with those of Senior and Moreno (Use of artificial neural networks for speech processing) as this is notoriously well known as the input duration of a microphone used in a speech processing system. It could also be an obvious matter of design choice since the applicant has not disclosed that said input duration of speech solves any stated problem or is for any particular purpose.

11.	Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Huang (U.S. Patent Application Publication # 2018/0293988 A1). 


With regards to claim 9, Senior and Moreno may not explicitly detail the limitation wherein the at least one microphone comprises an array of microphones, and wherein the controller is configured to process the received data from the array of microphones using a beamforming function. However, Huang teaches this (Paragraphs 22-23, 33, 35, 49-51 and figures 1-2, teach a speaker recognition method including a microphone array featuring a beam-former that can also use a neural network such as a DNN).

Senior, Moreno and Huang can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Huang (Use of a microphone array featuring a beam-former) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so as to reduce background noise and room reverberation in the input speech (Huang, para 33).
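For illustration only, one common form of beamforming function for a microphone array is delay-and-sum, sketched below. This is a generic technique and is not asserted to be the beam-former of Huang; the function name, the integer-sample delays and the example signals are assumptions introduced here.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its arrival delay and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   non-negative integer arrival delay (in samples) per microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay is required per channel")
    length = len(channels[0])
    output = []
    for n in range(length):
        total = 0.0
        for samples, delay in zip(channels, delays):
            index = n + delay  # read ahead to undo the arrival delay
            if index < length:
                total += samples[index]
        output.append(total / len(channels))
    return output


# The same short pulse reaches the second microphone two samples later.
mic_a = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 1.0, 0.0, -1.0]
beamformed = delay_and_sum([mic_a, mic_b], [0, 2])
```

Under these assumptions, the aligned channels add coherently for the steered direction, while sounds arriving with other delays tend to be attenuated by the averaging.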

With regards to claim 10, Senior and Moreno may not explicitly detail the limitation wherein the speaker recognition system is comprised in a robot. However, Huang teaches this (Paragraphs 22-23, 33, 35, 49-51 and figures 1-2, teach a speaker recognition method including a microphone array featuring a beam-former that can also use a neural network such as a DNN. Para 24 and figure 1, further teach that the speaker recognition system may be comprised in a robot).

Senior, Moreno and Huang can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Huang (Speaker recognition system comprised in a robot) with those of Senior and Moreno (Use of artificial neural networks for speech processing) so as to allow low-power, low-storage interactive devices having a relatively small form factor, a small physical size and capable of moving around to be provided with speaker recognition functionality (Huang, paragraphs 24 and 38).

12.	Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Kim (U.S. Patent Application Publication # 2007/0288236 A1). 

With regards to claim 17, Senior and Moreno may not explicitly detail the limitation wherein said processing comprises using only the first artificial neural network of the first and the second artificial neural networks. However, Kim teaches this (Paragraphs 79-82 and figure 1, teach a speech processing system wherein only one neural network out of two neural networks is used as decided/needed by the system).

Senior, Moreno and Kim can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Kim (Use of only one neural network out of two neural networks in a speech processing system) with those of Senior and Moreno (Use of artificial neural networks for speech processing) if an increase in recognition accuracy of the speech signal frame is not needed (Kim, para 52).

13.	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Senior in view of Moreno and further in view of Sak (U.S. Patent Application Publication # 2019/0043508 A1). 

With regards to claim 18, Senior and Moreno may not explicitly detail the limitation comprising receiving enrolment data from the at least one microphone, the enrolment data representing enrolment speech of the speaker. However, Sak teaches this (Para 45 and figure 1, teach the process to train the speaker verification neural network. In each training cycle, the neural network processes a batch of training utterances. The batch of training utterances includes multiple groups of training utterances, where the utterances in each group were all spoken by the same speaker, but the utterances from different groups were all spoken by different speakers. Para 47, further teaches that upon receiving the enrollment utterances, the device processes the utterances with the speaker 

and training the first artificial neural network to identify the speaker using the received enrolment data, wherein the content of the speech is different from the content of the enrolment speech (Para 48, teaches that after enrollment, the user may use voice-based inputs to identify himself or herself with the client device during the "verification" phase. Para 59, further teaches that by training the neural network on training samples characterizing utterances corresponding to a wide range of words or phrases, the network may be more robustly trained for text-independent speaker verification in which the network can determine speaker representations characterizing speaking characteristics of a user independent of the underlying content of a verification utterance, even previously unseen utterances e.g., verification utterances corresponding to a phrase that was not spoken in any of the training utterances).

Senior, Moreno and Sak can be considered as analogous art as they belong to a similar field of endeavor in audio processing by artificial neural networks. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Sak (Use of enrollment process for text independent speaker recognition) with those of Senior and Moreno (Use of artificial neural Sak, para 27).

Allowable Subject Matter

14.	Claims 13-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The current prior art of record does not disclose or suggest the use of a third and fourth neural network as outlined in claims 13-14. Claim 15 is a narrower version of claim 13 and is allowable for at least the preceding reasons.

Conclusion

15.	The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Husain (U.S. Patent Application Publication # 2019/0122096 A1), Ge (U.S. Patent Application Publication # 2018/0158463 A1). These references are also included in the PTO-892 form attached with this office action.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA whose contact information is given below. The examiner can normally be reached Monday to Friday, 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir, can be reached on 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)