DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 5/2/2022 has been entered.
Response to Amendment
In response to the office action of 12/2/2021, the applicant submitted a request for continued examination, filed 5/2/2022, amending claims 1, 11, 13, and 14 and traversing the prior art rejections. Applicant’s arguments have been fully considered; however, the amendments did not overcome the prior art of record. In the alternative, the examiner identified a novel feature and proposed it as an examiner’s amendment. Therefore, claims 1-16, with the examiner’s amendment below, are allowable over the prior art of record for the reasons for allowance provided below.
EXAMINER’S AMENDMENT
The examiner has changed the title of the invention to “SYSTEM FOR CREATING SPEAKER MODEL BASED ON VOCAL SOUNDS FOR A SPEAKER RECOGNITION SYSTEM, COMPUTER PROGRAM PRODUCT, AND CONTROLLER, USING TWO NEURAL NETWORKS”, so as to be more descriptive of the invention.

An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with the attorney on file, Ms. Ma Yuefei on 7/8/2022.

Amend claims 1, 11, 13, and 14:

As Per Claim 1:

1. (Currently Amended) A system for creating a speaker model, the system comprising:
one or more processors configured to:
generate second neural networks, each of the second neural networks being generated by changing a part of network parameters from an input layer to a predetermined intermediate layer of a first neural network based on one of a plurality of patterns, the first neural network being a neural network for detecting one or more words without recognizing a speaker; 
input a piece of speech into the each of the second neural networks so as to obtain a plurality of outputs from the intermediate layer; and
create a speaker model that receives a speaker feature comprising vocal sounds as input and outputs a recognized speaker by inputting each of the outputs as the speaker feature.

As Per Claim 11:

11. (Currently Amended) A recognition system comprising:
one or more processors configured to:
receive speech and converts the speech into a feature;
input the feature to a first neural network and calculate a score that represents likelihood indicating whether the feature corresponds to one or more predetermined words;
detect the one or more words from the speech using the score;
generate second neural networks, each of the second neural networks being generated by changing a part of network parameters from an input layer to a predetermined intermediate layer of the first neural network based on one of a plurality of patterns, the first neural network being a neural network for detecting one or more words without recognizing a speaker, 
input a piece of the speech into the each of the second neural networks so as to obtain a plurality of outputs from the intermediate layer;
create a speaker model that receives a speaker feature comprising vocal sounds as input and outputs a recognized speaker by inputting each of the outputs as the speaker feature; and
recognize the speaker using the speaker model.

As Per Claim 13:

13. (Currently Amended) A computer program product having a non-transitory computer readable medium including programmed instructions stored therein, wherein the instructions, when executed by a computer, cause the computer to perform:
generating second neural networks, each of the second neural networks being generated by changing a part of network parameters from an input layer to a predetermined intermediate layer of a first neural network based on one of a plurality of patterns, the first neural network being a neural network for detecting one or more words without recognizing a speaker, 
inputting a piece of speech into the each of the second neural networks so as to obtain a plurality of outputs from the intermediate layer; and
creating a speaker model that receives a speaker feature comprising vocal sounds as input and outputs a recognized speaker by inputting each of the outputs as the speaker feature.

As Per Claim 14:

14. (Withdrawn – Currently Amended) A controller comprising:
one or more processors configured to:
generate second neural networks, each of the second neural networks being generated by changing a part of network parameters from an input layer to a predetermined intermediate layer of a first neural network based on one of a plurality of patterns, the first neural network being a neural network for detecting one or more words without recognizing a speaker,
input a piece of speech into the each of the second neural networks so as to obtain a plurality of outputs from the intermediate layer; and
create a speaker model that receives a speaker feature comprising vocal sounds as input and outputs a recognized speaker by inputting each of the outputs as the speaker feature;
acquire speech of a user and detect one or more predetermined words; 
determine whether the user is a predetermined user using the speaker model; and
output, when the user is the predetermined user, a control instruction that is defined to the one or more words.

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance: Independent claims 1, 11, 13 and 14 concern a system, a recognition system, a computer program product having a non-transitory computer readable medium, and a controller, each using two “neural networks” in order to “recogniz[e] a speaker” (specification ¶ 0017 lines 8+). The process begins by receiving speech (“for example” “256 samples of speech” (spec. ¶ 0043 sentence 2)). Here a “first neural network” is specifically tasked “for detecting one or more words without recognizing a speaker”; i.e., at this stage words are detected in the “samples of speech”.
Following this step, a “second neural network” is “generat[ed]” “by changing a part of network parameters from an input layer to a predetermined intermediate layer of the first neural network based on one of a plurality of patterns”, where the “parameters” are, e.g., “weight, bias and the like” (spec. ¶ 0031 last sentence), and the “patterns” are illustrated as follows: “For example, when three sets of random patterns are generated, 3 pieces of intermediate layer (speaker feature) can be obtained” (spec. ¶ 0058 sentence 1). The “second neural network” then receives “pieces of speech” (e.g., the “one or more words” “detect[ed]” by the “first neural network”) and generates “features”, e.g., “12-dimensional mel frequency cepstrum coefficient (MFCC) feature.” (spec. ¶ 0043 sentence 2), and/or “vocal sound and phoneme forming a keyword” (spec. ¶ 0039 last sentence). Using these “features”, a “recognized speaker” is then “output[ted]”.
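For illustration only, the flow described above can be sketched in Python. This is a minimal sketch under stated assumptions and does not represent applicant's implementation: the toy network sizes, the tanh activation, the use of a random seed to stand in for each "pattern", the fraction and scale of the parameter changes, and the mean-vector speaker model are all hypothetical choices that do not appear in the specification. The 12-element input merely mirrors the dimensionality of the MFCC example quoted above and is not real audio.

```python
import math
import random


def make_network(layer_sizes, seed):
    """Build a toy fully connected network as a list of (weights, biases) layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def make_second_network(first_network, upto_layer, pattern_seed, fraction=0.3, scale=0.5):
    """Copy the first network, changing part of the weights and biases in the
    layers between the input and the chosen intermediate layer."""
    rng = random.Random(pattern_seed)  # assumption: one seed stands in for one "pattern"

    def change(value):
        # Only a randomly chosen fraction of the parameters is modified.
        return value + rng.gauss(0.0, scale) if rng.random() < fraction else value

    second = []
    for index, (weights, biases) in enumerate(first_network):
        if index < upto_layer:
            weights = [[change(w) for w in row] for row in weights]
            biases = [change(b) for b in biases]
        else:
            # Layers past the intermediate layer are copied unchanged.
            weights = [row[:] for row in weights]
            biases = biases[:]
        second.append((weights, biases))
    return second


def intermediate_output(network, frame, upto_layer):
    """Run the input through the first `upto_layer` layers and return the activations."""
    activation = list(frame)
    for weights, biases in network[:upto_layer]:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def speaker_features(first_network, frame, upto_layer, pattern_seeds):
    """Obtain one intermediate-layer output per pattern, each from its own second network."""
    return [
        intermediate_output(make_second_network(first_network, upto_layer, seed), frame, upto_layer)
        for seed in pattern_seeds
    ]


def mean_vector(vectors):
    """Hypothetical speaker model: the element-wise mean of the speaker features."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy usage: three patterns yield three speaker features from one input frame.
first_network = make_network([12, 8, 8, 4], seed=0)
frame = [0.1 * i for i in range(12)]  # placeholder 12-dimensional feature frame
features = speaker_features(first_network, frame, upto_layer=2, pattern_seeds=[1, 2, 3])
speaker_model = mean_vector(features)
```

In this sketch, three pattern seeds produce three distinct intermediate-layer outputs from a single piece of input, which parallels the quoted example of three random patterns yielding three speaker features; how such features are actually combined into a speaker model is left open here.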
Prior art of record Chen et al. (US 2016/0293167) does teach two different neural network models as well; i.e., “The neural network 640” (¶ 0109 line 1) as a first neural network and “deep neural networks (DNNs)” (¶ 0085 line 2) as a second neural network. Chen et al. do not specifically teach assigning one of these neural networks to one task (e.g., word recognition) and the other to speaker identification. However, ¶ 0110 sentence 1 teaches: “The neural network” “generates the desired set of speaker features may be designated as the trained neural network 642” (i.e., a set of speaker features is generated); and ¶ 0111 lines 7+ further teaches: “speaker verification model 644” (responsible for recognizing speakers) “include” “a subset of, the trained neural network 642” (which corresponds to the “speaker features”). In sum, Chen et al. also use a “speaker feature” and two neural networks to recognize a speaker. Chen et al., though, are silent on what specific “speaker feature” is used in achieving this goal.
Flanagan et al. (US Patent 5,737,587), on the other hand, teach in Col. 10 lines 61+: “The trained neural network 4 is then used to transform cepstrum coefficients of the array input to those corresponding to the close-talking microphone 8 input. The transformed cepstrum coefficients are then input to the speaker identification system” (i.e., the cepstrum speech features are used for recognizing a speaker). Flanagan et al. also use one neural network (i.e., “neural network 4” or “NN4”) for “speaker identification” (Col. 10 line 63+) and use “a separate neural network” (i.e., a first neural network) for “cepstral coefficients used” (Col. 5 lines 59-60). However, Flanagan et al. do not teach using “vocal sound” and/or “phonemes” as speech features in their speaker recognition.
Shastry et al. (US Patent 11,114,088), in Col. 24 lines 50+, teach using “machine learning” “to access pre-learned phonemes” “to identify the first speaker in the audio stream based on the pre-learned phonemes”. However, this process involves, as a first step, “convert[ing]” an “audio stream” of the “speaker” to be “identif[ied]” into “text”, and therefore it cannot accommodate “vocal sounds” that cannot be transcribed, such as laughing and/or humming. The instant application, by contrast, is not restricted to “phonemes”; it can use any detected “vocal sound” of the speaker in its speaker recognition, and its speaker recognition does not involve any speech transcription.
Further search did not produce any reference teaching this combination of features. Therefore, these independent claims are allowable. Claims 2-10 (dependent on claim 1), claim 12 (dependent on claim 11), and claims 15-16 (dependent on claim 14) further limit the scope of their allowed parent claims and are thus allowable under similar rationale.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farzad Kazeminezhad/
Art Unit 2657
July 9th 2022.