DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d).
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/8/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 8 is objected to because of the following informalities: “one or more processors is configured” appears to be a typographical error for “one or more processors are configured”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


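The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
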
Claims 9-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 9-10, these claims recite “the neural network used for calculating the score …”, which is in turn compared to the “neural network” in which the one or more processors “change network parameters”. Claim 8 does recite a single “neural network” as being responsible for determining the “score”. However, claim 1 recites “input a piece of speech into each of neural networks” for “chang[ing]” “network parameters”; i.e., this task requires a plurality of “neural networks”. Claims 9 and 10 therefore compare the one neural network responsible for the “score” calculation with the plurality of “neural networks” responsible for “change network parameters”. It is therefore not clear which of the latter plurality of “neural networks” is compared against the single “neural network” responsible for the “score”. The examiner has therefore interpreted the claims to mean that at least one of the latter plurality of “neural networks” responsible for the “parameter” “changes” is the same as the single “neural network” responsible for the “score” in claim 9, and at least one of the plurality of “neural networks” responsible


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-5, and 8-13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al. (US 2016/0293167).
Regarding claim 1, Chen et al. do teach a system for creating a speaker model (Title, Abstract),
the system comprising:
one or more processors (¶ 0173 line 1: “The computing device 1400 includes a processor 1402”)
configured to:
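change a part of network parameters from an input layer to a predetermined intermediate layer based on a plurality of patterns (¶ 0109 lines 1-8: “the neural network” “may include an input layer” (emerging from an input layer); “The weights or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)))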
and input a piece of speech into each of neural networks so as to obtain a plurality of outputs from the intermediate layer (¶ 0109 lines 1-8: “the neural network” “may include an input layer” (input) “for inputting information about the training utterances 622” (a piece of speech into each of neural networks) “several hidden layers for processing the training” (sent into the intermediate layers) “utterances 622, and an output layer for providing output” (to obtain a plurality of outputs, e.g., see Fig. 1B outputs “Spk1” … “SpkN”); ¶ 0110 sentence 1: “The neural network that generates the desired set of speaker features may be designated as the trained neural network” (this proves that a plurality of neural networks is used to receive the input, from which the one “that” “generates the desired set of” “features” is “designated”)),
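the part of network parameters of the each of the neural networks is changed based on one of the plurality of patterns (¶ 0109 lines 5-8: “The weight or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)));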
and
create a speaker model with respect to one or more words detected from the speech based on the outputs (¶ 0110 sentence 1: “The neural network that generates the desired set of speaker features” (based on the outputs) “may be designated as the trained neural network” (a speaker model is created; e.g., ¶ 0060 sentence 2: “For enrollment, a speaker may provide a few utterances of the global password” (i.e., the “training utterances” correspond to the “password” (one or more words detected from the speech)) “the d-vector from each of these utterances” (the plurality of outputs generated from the neural networks) “is averaged together to form a speaker model” (to create the speaker model); this is so because, according to ¶ 0043 lines 9-11, a “Deep neural network” is used “to extract” a “speaker-discriminative feature, or” “d-vector” (i.e., the “d-vector” results from applying the “neural” “network” and corresponds to the “speaker features” in ¶ 0110 quoted above))).
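For illustration only, the claim 1 mapping above can be sketched as a toy model. This sketch is hypothetical and is not taken from Chen et al. or from the application: the single tanh hidden layer, the binary masks used as the “patterns”, zeroing as the way a part of the input-to-intermediate parameters is “changed”, and the names `apply_pattern`, `intermediate_output`, and `create_speaker_model` are all assumptions. It shows one piece of speech (a feature vector) producing one intermediate-layer output per changed network, with the outputs averaged into a speaker model in the manner of the d-vector averaging quoted from ¶ 0060.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def intermediate_output(weights: Matrix, bias: List[float], features: List[float]) -> List[float]:
    """Forward pass from the input layer to one intermediate (hidden) layer with tanh units."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_pattern(weights: Matrix, pattern: Matrix) -> Matrix:
    """Return a copy of the weights in which only the entries selected by the
    pattern (mask value 1) are changed; here they are zeroed, dropout-style."""
    return [
        [0.0 if m else w for w, m in zip(row, mask_row)]
        for row, mask_row in zip(weights, pattern)
    ]


def create_speaker_model(weights: Matrix, bias: List[float],
                         patterns: List[Matrix], features: List[float]) -> List[float]:
    """Feed the same features through one changed network per pattern and
    average the intermediate-layer outputs element-wise."""
    outputs = [intermediate_output(apply_pattern(weights, p), bias, features) for p in patterns]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_hidden, n_patterns = 4, 3, 5
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    bias = [0.0] * n_hidden
    patterns = [
        [[1 if rng.random() < 0.2 else 0 for _ in range(n_in)] for _ in range(n_hidden)]
        for _ in range(n_patterns)
    ]
    features = [0.5, -0.2, 0.1, 0.9]  # hypothetical features for one utterance
    print(create_speaker_model(weights, bias, patterns, features))
```

Zeroing is only one way to change a part of the parameters; adding noise to the selected weights would fit the same structure.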



Regarding claim 4, Chen et al. do teach the system for creating a speaker model according to claim 1, wherein the one or more processors create the speaker model for each partial section included in the one or more words (¶ 0043 last sentence: the utterance’s “d-vector”, obtained by applying the “DNN” to the “utterance”, is “incrementally computed frame by frame” (i.e., it comprises a plurality of partial sections)).

Regarding claim 5, Chen et al. do teach the system for creating a speaker model according to claim 1, wherein the one or more processors change, out of the network parameters from the input layer to the intermediate layer, weight of a part of the network parameters (¶ 0109 sentence 2: “weights” (weight of a neural network) “or 

Regarding claim 8, Chen et al. do teach the system for creating a speaker model according to claim 1, wherein the one or more processors is further configured to:
receive speech and converts the speech into a feature (“FIG. 12” step “1202”: “INPUT SPEECH FEATURES TO A NEURAL NETWORK”; ¶ 0009 lines 1+: “the speech data” (receive speech) “provided to the input layer of the neural network is a set of feature values extracted from audio” (converted into a feature)); 
input the feature to a neural network (“FIG. 12” step “1202”: “INPUT SPEECH FEATURES TO A NEURAL NETWORK”; ¶ 0009 lines 1+: “provided to the input layer of the neural network is a set of feature values extracted from audio” (input the feature to a neural network))
and calculate a score that represents likelihood indicating whether the feature corresponds to one or more predetermined words (¶ 0157 lines 6+: “the system determines whether the particular utterance was spoken by the particular speaker” (determining a likelihood whether the feature corresponds to a “particular utterance” (e.g. a “password” (¶ 0042 line 10) or one or more predetermined words)) “by determining whether the distance” (based on a score calculation) “between the evaluation vector and reference vector satisfies a threshold”); 
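¶ 0157 of Chen et al., as quoted above, decides whether an utterance was spoken by a particular speaker by testing whether the distance between an evaluation vector and a reference vector satisfies a threshold. A minimal sketch of such a score-and-threshold test follows; cosine similarity as the score, the threshold value of 0.7, and the function names are assumptions made for illustration only.

```python
import math
from typing import Sequence


def cosine_score(evaluation: Sequence[float], reference: Sequence[float]) -> float:
    """Cosine similarity between an evaluation vector and a reference vector."""
    dot = sum(a * b for a, b in zip(evaluation, reference))
    norm_eval = math.sqrt(sum(a * a for a in evaluation))
    norm_ref = math.sqrt(sum(b * b for b in reference))
    if norm_eval == 0.0 or norm_ref == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_eval * norm_ref)


def spoken_by_enrolled_speaker(evaluation: Sequence[float], reference: Sequence[float],
                               threshold: float = 0.7) -> bool:
    """Accept the utterance when the score satisfies the threshold.
    The default of 0.7 is a hypothetical value, not one taken from Chen et al."""
    return cosine_score(evaluation, reference) >= threshold


print(spoken_by_enrolled_speaker([1.0, 0.0, 1.0], [1.0, 0.0, 0.9]))  # True
print(spoken_by_enrolled_speaker([1.0, 0.0], [0.0, 1.0]))            # False
```

A higher score here means the vectors are closer; a system that thresholds a distance instead would accept when the distance falls below its threshold.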


Regarding claim 9, Chen et al. do teach the system for creating a speaker model according to claim 8, wherein the neural network used for calculating the score is the same as neural network in which the one or more processors change network parameters (¶ 0085 sentence 1 and line 13, respectively: a “deep neural network” is identified as being used for obtaining the “distance” (score); ¶ 0004 lines 7-8 teach considering “parameters” (e.g., weights) “of a” “deep neural network” (the same neural network technique is used for the “parameter” manipulations as well as for the score)).

Regarding claim 10, Chen et al. do teach the system for creating a speaker model according to claim 8, wherein the neural network used for calculating the score is different from neural network in which the one or more processors change network parameters (¶ 0085 sentence 1 and line 13, respectively: “deep neural networks” are identified as being used for obtaining the “distance” (score); ¶ 0004 sentences 1-2 teach: “in some implementations deep locally-connected networks (“CNN”)” as being used for

Regarding claim 11, Chen et al. do teach a recognition system (Title, Abstract), 
comprising:
one or more processors (¶ 0173 line 1: “The computing device 1400 includes a processor 1402”)
configured to:
receive speech and converts the speech into a feature (“FIG. 12” step “1202”: “INPUT SPEECH FEATURES TO A NEURAL NETWORK”; ¶ 0009 lines 1+: “the speech data” (receive speech) “provided to the input layer of the neural network is a set of feature values extracted from audio” (converted into a feature)); 
input the feature to a neural network (“FIG. 12” step “1202”: “INPUT SPEECH FEATURES TO A NEURAL NETWORK”; ¶ 0009 lines 1+: “provided to the input layer of the neural network is a set of feature values extracted from audio” (input the feature to a neural network))
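and calculate a score that represents likelihood indicating whether the feature corresponds to one or more predetermined words (¶ 0157 lines 6+: “the system determines whether the particular utterance was spoken by the particular speaker” (determining a likelihood whether the feature corresponds to a “particular utterance” (e.g., a “password” (¶ 0042 line 10) or one or more predetermined words)) “by determining whether the distance” (based on a score calculation) “between the evaluation vector and reference vector satisfies a threshold”);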
detect the one or more words from the speech using the score (¶ 0157 lines 6+: “the system determines” (detect) “whether the particular utterance” (e.g. the “password” (the one or more words)) “was spoken by the particular speaker” “by determining the distance” (using the score));
change a part of network parameters from an input layer to a predetermined intermediate layer based on a plurality of patterns (¶ 0109 lines 1-8: “the neural network” “may include an input layer” (emerging from an input layer); “The weights or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)))
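and input a piece of speech to each of neural networks so as to obtain a plurality of outputs from the intermediate layer (¶ 0109 lines 1-8: “the neural network” “may include an input layer” (input) “for inputting information about the training utterances 622” (a piece of speech into each of neural networks) “several hidden layers for processing the training” (sent into the intermediate layers) “utterances 622, and an output layer for providing output” (to obtain a plurality of outputs, e.g., see Fig. 1B outputs “Spk1” … “SpkN”); ¶ 0110 sentence 1: “The neural network that generates the desired set of speaker features may be designated as the trained neural network” (this proves that a plurality of neural networks is used to receive the input, from which the one “that” “generates the desired set of” “features” is “designated”)),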
the part of network parameters of the each of the neural networks is changed based on one of the plurality of patterns (¶ 0109 lines 5-8: “The weight or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)));
and
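create a speaker model with respect to the detected one or more words based on the outputs (¶ 0110 sentence 1: “The neural network that generates the desired set of speaker features” (based on the outputs) “may be designated as the trained neural network” (a speaker model is created; e.g., ¶ 0060 sentence 2: “For enrollment, a speaker may provide a few utterances of the global password” (i.e., the “training utterances” correspond to the “password” (one or more words detected from the speech)) “the d-vector from each of these utterances” (the plurality of outputs generated from the neural networks) “is averaged together to form a speaker model” (to create the speaker model); this is so because, according to ¶ 0043 lines 9-11, a “Deep neural network” is used “to extract” a “speaker-discriminative feature, or” “d-vector” (i.e., the “d-vector” results from applying the “neural” “network” and corresponds to the “speaker features” in ¶ 0110 quoted above)));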
and recognize a speaker using the speaker model (¶ 0060 last 3 lines: “speaker model” (using the speaker model) “is used for speaker verification” (is used to recognize a speaker)).

Regarding claim 12, Chen et al. do teach the recognition system according to claim 11, wherein the one or more processors input the outputs from the intermediate layer with respect to speech input for recognition to the speaker model so as to recognize a speaker (¶ 0060 sentence 2: “For enrollment, a speaker may provide a few utterances of the global password” (with respect to speech input) “the d-vector from each of these utterances” (the plurality of outputs generated following move from

Regarding claim 13, Chen et al. do teach a computer program product having a non-transitory computer readable medium including programmed instructions, wherein the instructions, when executed by a computer (Abstract sentence 1: “Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker verification”),
cause the computer to perform:
changing a part of network parameters from an input layer to a predetermined intermediate layer based on a plurality of patterns (¶ 0109 lines 1-8: “the neural network” “may include an input layer” (emerging from an input layer); “The weights or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)))

the part of network parameters of the each of the neural networks is changed based on one of the plurality of patterns (¶ 0109 lines 5-8: “The weight or other parameters” (network parameters) “of the one or more hidden layers” (of e.g. a predetermined intermediate layer) “may be adjusted” (are changed) “so that the trained neural network produces the desired target vector” (based on a plurality of patterns because) “target vectors may be a set of feature vectors” (e.g., ¶ 0021: “each node of the first hidden layer may be connected to between 5% and 50% of the inputs from the input layer” (a plurality of network patterns between the “input” and “hidden” (intermediate) “layers”)));
and
.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., and further in view of Khoury et al. (US Patent 9,824,692).

Regarding claim 2, Khoury et al. do teach the system for creating a speaker model according to claim 1, wherein the one or more processors create Gaussian distribution represented by a mean and variance of the outputs as the speaker model (Col. 1 lines 51-55: “a system that utilizes a deep neural network” “to perform” “verification of a speaker’s identity” (a speaker model), where the “deep neural network may include” a “feed-forward neural network” (Col. 2 lines 2-4 (using neural network)), which depends on “comput[ing]” a “loss function” (Col. 2 lines 20-22), which, according to Col. 3 lines 4+, is the “loss function” “defined as Loss=e^(–(μ+ – μ–)/(σ+ – σ–)√2) where μ+ and σ+” “μ- and σ-” “are the mean and standard deviation” (mean and variance) “of” “recognition scores based on a Gaussian distribution” (of a created Gaussian distribution)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “DNN” techniques of Khoury et al. into the “DNN” techniques of Chen et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Chen et al. to benefit from the “Loss function” “used to modify” “weight [parameters]” as disclosed in Khoury et al. Col. 2 lines 26-27.
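Claim 2 recites a Gaussian distribution represented by a mean and variance of the outputs as the speaker model. A minimal sketch of that claim limitation follows, assuming a diagonal covariance (one mean and one population variance per output dimension); the function names and the variance floor are hypothetical. It illustrates the claim language only and does not reproduce Khoury et al.'s loss function, which is computed over recognition scores.

```python
import math
from typing import List, Tuple


def fit_gaussian(outputs: List[List[float]], floor: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance of the intermediate-layer outputs.
    The variance floor (a hypothetical safeguard) keeps the density well defined."""
    n = len(outputs)
    columns = list(zip(*outputs))
    means = [sum(col) / n for col in columns]
    variances = [max(sum((x - m) ** 2 for x in col) / n, floor) for col, m in zip(columns, means)]
    return means, variances


def log_likelihood(x: List[float], means: List[float], variances: List[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


means, variances = fit_gaussian([[1.0, 2.0], [3.0, 4.0]])
print(means, variances)  # [2.0, 3.0] [1.0, 1.0]
```

A test vector can then be scored against the model with `log_likelihood`, with higher values indicating a closer match to the enrolled outputs.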

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., and further in view of Lee et al. (US 2005/0149462).

Regarding claim 6, Chen et al. do not specifically disclose the system for creating a speaker model according to claim 1, wherein the one or more processors add, out of the network parameters from the input layer to the intermediate layer, a random value to bias of a part of the network parameters.
Lee et al. do teach the system for creating a speaker model according to claim 1, wherein the one or more processors add, out of the network parameters from the input layer to the intermediate layer, a random value to bias of a part of the network parameters (¶ 0054 last sentence: “some small random values” (a random value) “may be added” (is added) “to each elements of the bias vectors” (to each bias component of the bias vector, where the “bias”, according to the Abstract lines 8-9, corresponds to “class parameter” (network parameters); ¶ 0134 sentence 2: “class parameters” are used for “speaker identification”)).
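It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the methods and algorithms involving “mixing matrices” and “bias vectors” (¶ 0050) of Lee et al. into the speaker identification and verification of Chen et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would improve Chen et al.’s speaker identification, since using “class parameters” helps “distinguish between the speech of person and the speech of another” as disclosed in Lee et al. ¶ 0134 sentence 1.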

Regarding claim 7, Chen et al. do not specifically disclose the system for creating a speaker model according to claim 1, wherein the network parameters include bias term parameters with respect to an input value to each layer from the input layer to the intermediate layer, and
the one or more processors add a random value to a part of the bias term parameters.
Lee et al. do teach the system for creating a speaker model according to claim 1, wherein the network parameters include bias term parameters with respect to an input value to each layer from the input layer to the intermediate layer (¶ 0050 last sentence: “There are a total of K mixing matrices (A1, …, AK) and K bias vectors (b1, …, bK)” (bias term parameters) “that are learned as described herein”; according to the Abstract lines 8-9, the “bias” corresponds to “class parameters” (network parameters)); ¶ 0045 lines 10-12: “scaling weights and bias weights are repeatedly adjusted to generate scaling and bias terms that are used to separate the sources” (the “bias” (bias term parameters) are
the one or more processors add a random value to a part of the bias term parameters (¶ 0054 last sentence: “some small random values” (a random value) “may be added” (is added) “to each elements of the bias vector” (to each bias component of the bias vector, where the “bias”, according to the Abstract lines 8-9, corresponds to “class parameter” (network parameters); ¶ 0134 sentence 2: “class parameters” are used for “speaker identification”)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the methods and algorithms involving “mixing matrices” and “bias vectors” (¶ 0050) of Lee et al. into the speaker identification and verification of Chen et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would improve Chen et al.’s speaker identification, since using “class parameters” helps “distinguish between the speech of person and the speech of another” as disclosed in Lee et al. ¶ 0134 sentence 1.
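Lee et al. ¶ 0054 is quoted as adding “some small random values” to each element of the bias vectors, whereas claims 6-7 recite adding a random value to only a part of the bias terms. A minimal sketch of the claimed operation follows; the uniform distribution, the scale of 0.01, the chosen indices, and the name `perturb_bias` are hypothetical and are not drawn from either reference.

```python
import random
from typing import List, Sequence


def perturb_bias(bias: Sequence[float], indices: Sequence[int],
                 scale: float, rng: random.Random) -> List[float]:
    """Copy the bias vector, adding a uniform random value in [-scale, scale]
    only to the entries listed in `indices` (a part of the bias terms)."""
    chosen = set(indices)
    return [
        b + rng.uniform(-scale, scale) if i in chosen else b
        for i, b in enumerate(bias)
    ]


rng = random.Random(42)
bias = [0.1, -0.3, 0.0, 0.25]
# One perturbed copy per pattern; here each pattern touches entries 0 and 2 only.
variants = [perturb_bias(bias, [0, 2], 0.01, rng) for _ in range(3)]
for v in variants:
    print(v)
```

Passing every index instead of a subset would correspond to the “each elements of the bias vectors” wording quoted from Lee et al.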


Conclusion

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the 






/Farzad Kazeminezhad/
Art Unit 2657
May 7th 2021.