DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/23/2020 has been entered.
 
Amendments
This Office action is in response to amendments filed 06/17/2020. As per applicant's request, claims 1, 10, and 12 have been amended. No claims have been added or cancelled. Claims 1-10 and 12-21 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/02/2016 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
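The claim limitation in this application that is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is: “training module configured to train the neural network” in claim 12. Paragraph [0053] and Fig. 5 disclose that the training module is executed by hardware processor 502.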
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 5 and 16 both recite the following: “…randomly initialize weights associated with neurons in the final hidden neuron layer that do not match a state of any of the output neurons”. This limitation is inconsistent with the independent claims, and the examiner is unable to ascertain the metes and bounds of the claim. Claims 1 and 12 already recite that weights with state matches are initialized to a predetermined non-zero value while all other weights are initialized to zero. Since every weight is already initialized to either a predetermined non-zero value or zero in claim 1, it is unclear which weights remain to be randomly initialized in claim 5. In the specification as filed, paragraph [0020] discloses that neurons are grouped based on their CD-HMM and CI-HMM states. Furthermore, it appears that paragraphs [0032]-[0034] disclose that the system initializes the weights to three different types of values: a constant for state matches; zero for weights that have associated states but no state match (i.e., the neurons and associated weights within the group are initialized to either zero or a constant); and a random value for weights with no associated states (i.e., neurons and associated weights outside the group are initialized to a random value). The examiner suggests 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5, 6, 8, 10, 12, 16-17, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Salvador et al. (US 9153231 B1), herein referenced as “Salvador”, in view of Saxena et al. (US 2011/0321164 A1), herein referenced as “Saxena”.

Regarding Claim 1,

	Salvador teaches a method for training a neural network, comprising: identifying weights in a neural network that each connect an identified neuron of a final hidden layer to a respective neuron of an output neuron layer, wherein each identified neuron of the final hidden layer has a state that matches a hidden Markov model state of a respective neuron of the output neuron layer (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”). This passage teaches identifying weights (in order to set weights, they must be identified) between input and output layers (the final hidden layer is between the input and output layers, as shown in Figure 8) that correspond to correct connections. The examiner interprets a correct connection as being equivalent to a state match. Figure 5 shows a state diagram for phonemes (sounds). Language models using neural networks use output nodes representing potential next words. If a connection is correct, then an output neuron has a match with a previous neuron, and the connection is therefore considered a state match. See Col. 9, lines 13-29 and 53-64. Furthermore, the final hidden layer has a state that matches a hidden Markov model state of the output layer: Col. 6, line 47 through Col. 7, line 48 discloses that the speech recognition engine 218 uses a hidden Markov model technique that has a number of states, which together represent phonemes. Sounds that are received may be represented as paths between states of the HMM, and multiple paths may represent multiple possible text matches for the same sound. During input processing, the feature vectors are matched with state phonemes. Therefore, a correct connection between the input layers and output layers must also represent a state match between hidden Markov model states.

Initializing the identified weights to a predetermined non-zero value … (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”, which teaches initializing the identified connection weights to 1) … the other weights (Col. 10, lines 3-9 discloses that only correct connections are initialized to 1; therefore, the other weights are considered to be the connections that are not correct and are not initialized to 1).

Training the neural network based on a training corpus after initialization. (Col 5, lines 55-59, “The speech storage 220 may also include a training corpus that may include recorded speech and/or corresponding transcription that may be used to train and improve the models used by the ASR module 214 in speech recognition” teaches training the neural network based on a training corpus after initialization.)
Although Salvador teaches other weights, it does not explicitly teach initializing the other weights between the final hidden neuron layer and the output neuron layer to zero.
However, Saxena teaches initializing … other weights between the final hidden neuron layer and the output neuron layer to zero ([0032]: “At the initial stage, the neural network weights are initialized either to zero or any random number”, which teaches that the weights of the neural network can be initialized to zero or to any random number. Furthermore, these weights are between the final hidden layer and the output neuron layer, as shown in Figure 3A, which shows the neural network configuration (input, hidden, output)).

Salvador and Saxena are analogous art because they are both methods of training neural networks. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network training method of Salvador, with the initialization method of Saxena. One of ordinary skill in the art would have been motivated to make this modification in order to have an output matching desired results. (Saxena, [0030])
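
For illustration only, the initialization scheme that the above combination is read as meeting can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Salvador, Saxena, or the application: the function name `init_output_weights`, the string state labels, and the use of label equality as the "state match" test are all hypothetical.

```python
# Illustrative sketch only; names and the label-equality "match" test are assumptions.

def init_output_weights(hidden_states, output_states, match_value=1.0):
    """Return weights[j][i] for the connection from hidden neuron i to output neuron j.

    Weights on connections whose hidden-neuron state equals the output-neuron
    state get the predetermined non-zero value; every other weight gets zero.
    """
    if match_value == 0:
        raise ValueError("match_value must be non-zero")
    return [
        [match_value if h == o else 0.0 for h in hidden_states]
        for o in output_states
    ]


# Hypothetical HMM state labels for three final-hidden-layer neurons
# and two output-layer neurons.
hidden_states = ["a_1", "a_2", "b_1"]
output_states = ["a_1", "b_1"]
weights = init_output_weights(hidden_states, output_states)
```

In this sketch, training on a corpus would begin only after `weights` has been fully populated, which corresponds to the "after initialization" ordering recited in the claim.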

Regarding Claim 5, 
Salvador in view of Saxena teaches the method of claim 1.
Saxena further teaches randomly initializing weights associated with neurons in the final hidden neuron layer that do not match a state of any of the output neurons (Saxena, [0032]: “At the initial stage, the neural network weights are initialized either to zero or any random number”, which teaches that the weights of the neural network can be initialized to zero or to any random number. Furthermore, these weights are associated with the final hidden layer and the output neuron layer, as shown in Figure 3A, which shows the neural network configuration (input, hidden, output)).
Salvador and Saxena are analogous art because they are both methods of training neural networks.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network training method of Salvador, with the initialization method of Saxena.
	One of ordinary skill in the art would have been motivated to make this modification in order to have an output matching desired results. (Saxena, [0030])
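
For illustration only, the three-value reading of paragraphs [0032]-[0034] discussed in the 35 U.S.C. 112(b) rejection (a constant for state matches, zero for weights with states but no match, and a random value for neurons with no matching state) can be sketched as follows. This is a sketch under stated assumptions, not the applicant's or Saxena's implementation: the function name `init_three_way`, the use of `None` for a neuron with no associated state, and the uniform range `scale` are hypothetical.

```python
import random

# Illustrative sketch only; names, the None convention, and the random range are assumptions.

def init_three_way(hidden_states, output_states, match_value=1.0, scale=0.01, seed=0):
    """Return weights[j][i] for the connection from hidden neuron i to output neuron j."""
    rng = random.Random(seed)
    output_set = set(output_states)
    weights = []
    for o in output_states:
        row = []
        for h in hidden_states:
            if h not in output_set:
                # Hidden neuron matches no output state (or has no state): random value.
                row.append(rng.uniform(-scale, scale))
            elif h == o:
                # State match: predetermined non-zero constant.
                row.append(match_value)
            else:
                # Neuron matches some other output state: zero.
                row.append(0.0)
        weights.append(row)
    return weights


hidden_states = ["a_1", None, "b_1"]   # None marks a neuron with no associated state
output_states = ["a_1", "b_1"]
weights = init_three_way(hidden_states, output_states)
```

Under this reading, the random branch applies only to neurons outside the matched group, so it does not overlap with the constant and zero branches.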

Regarding Claim 6, 
Salvador in view of Saxena teaches the method of claim 1.
Salvador further teaches performing speech recognition using the trained neural network to convert audio voice information into textual information (Col. 1, lines 15-18: “Speech recognition may also convert a user's speech into text data which may then be provided to various textual based programs and applications.” This teaches that the neural network speech recognition models can be used to convert audio voice information into text data).

Regarding Claim 8, 
Salvador in view of Saxena teaches the method of claim 1.
Salvador further teaches wherein identifying weights comprises identifying multiple weights corresponding to state matches between multiple dedicated neurons in the final hidden neuron layer and multiple neurons of the output neuron layer (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”). This passage teaches identifying multiple weights (in order to set weights, they must be identified) between input and output layers (the final hidden layer is between the input and output layers) that correspond to multiple correct connections. The examiner interprets a correct connection as being equivalent to a state match. Figure 5 shows a state diagram for phonemes (sounds). Language models using neural networks use output nodes representing potential next words. If a connection is correct, then an output neuron has a match with a previous neuron, and the connection is therefore considered a state match. See Col. 9, lines 13-29 and 53-64.



Regarding Claim 10, 
Salvador teaches a non-transitory computer readable storage medium comprising a computer readable program for training a neural network, wherein the computer readable program when executed on a computer causes the computer to perform the following steps (Col. 13, lines 55-60: “Aspects of the present disclosure may be implemented as a computer implemented method, a system, or as an article of manufacture such as a memory device or non-transitory computer readable storage medium.” This teaches a non-transitory computer readable storage medium comprising computer readable code capable of performing the steps of claim 1):
Identifying weights in a neural network that each connect an identified neuron of a final hidden layer to a respective neuron of an output neuron layer, wherein each identified neuron of the final hidden layer has a state that matches a hidden Markov model state of a respective neuron of the output neuron layer (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”). This passage teaches identifying weights (in order to set weights, they must be identified) between input and output layers (the final hidden layer is between the input and output layers, as shown in Figure 8) that correspond to correct connections. The examiner interprets a correct connection as being equivalent to a state match. Figure 5 shows a state diagram for phonemes (sounds). Language models using neural networks use output nodes representing potential next words. If a connection is correct, then an output neuron has a match with a previous neuron, and the connection is therefore considered a state match. See Col. 9, lines 13-29 and 53-64. Furthermore, the final hidden layer has a state that matches a hidden Markov model state of the output layer: Col. 6, line 47 through Col. 7, line 48 discloses that the speech recognition engine 218 uses a hidden Markov model technique that has a number of states, which together represent phonemes. Sounds that are received may be represented as paths between states of the HMM, and multiple paths may represent multiple possible text matches for the same sound. During input processing, the feature vectors are matched with state phonemes. Therefore, a correct connection between the input layers and output layers must also represent a state match between hidden Markov model states.

Initializing the identified weights to a predetermined non-zero value … and (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”, which teaches initializing the identified connection weights to 1) … the other weights (Col. 10, lines 3-9 discloses that only correct connections are initialized to 1; therefore, the other weights are considered to be the connections that are not correct and are not initialized to 1).

Training the neural network based on a training corpus after initialization. (Col 5, lines 55-59, “The speech storage 220 may also include a training corpus that may include recorded speech and/or corresponding transcription that may be used to train and improve the models used by the ASR module 214 in speech recognition” teaches training the neural network based on a training corpus after initialization.)
Salvador does not explicitly teach initializing other weights between the final hidden neuron layer and the output neuron layer to zero.
However, Saxena teaches initializing … weights between the final hidden neuron layer and the output neuron layer to zero ([0032]: “At the initial stage, the neural network weights are initialized either to zero or any random number”, which teaches that the weights of the neural network can be initialized to zero or to any random number. Furthermore, these weights are between the final hidden layer and the output neuron layer, as shown in Figure 3A, which shows the neural network configuration (input, hidden, output)).

Salvador and Saxena are analogous art because they are both methods of training neural networks.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network training method of Salvador, with the initialization method of Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have an output matching desired results. (Saxena, [0030])


Regarding Claim 12,
Salvador teaches a system for training a neural network, comprising:
(Fig. 2 ASR device containing neural network updater module and processor) configured to identify weights in a neural network that each connect an identified neuron of a final hidden layer to a respective neuron of an output neuron layer, wherein each identified neuron of the final hidden layer has a state that matches a hidden Markov model state of a respective neuron of the output neuron layer (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”). This passage teaches identifying weights (in order to set weights, they must be identified) between input and output layers (the final hidden layer is between the input and output layers, as shown in Figure 8) that correspond to correct connections. The examiner interprets a correct connection as being equivalent to a state match. Figure 5 shows a state diagram for phonemes (sounds). Language models using neural networks use output nodes representing potential next words. If a connection is correct, then an output neuron has a match with a previous neuron, and the connection is therefore considered a state match. See Col. 9, lines 13-29 and 53-64. Furthermore, the final hidden layer has a state that matches a hidden Markov model state of the output layer: paragraph (34) discloses that the speech recognition engine 218 uses a hidden Markov model technique that has a number of states, which together represent phonemes. Sounds that are received may be represented as paths between states of the HMM, and multiple paths may represent multiple possible text matches for the same sound. During input processing, the feature vectors are matched with state phonemes. Therefore, a correct connection between the input layers and output layers must also represent a state match between hidden Markov model states.

Initialize the identified weights to a predetermined non-zero value … and (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”, which teaches initializing the identified connection weights to 1) … the other weights (Col. 10, lines 3-9 discloses that only correct connections are initialized to 1; therefore, the other weights are considered to be the connections that are not correct and are not initialized to 1).

A training module (Fig. 2 ASR device containing neural network updater module and speech storage) configured to train the neural network based on a training corpus after initialization. (Col 5, lines 55-59, “The speech storage 220 may also include a training corpus that may include recorded speech and/or corresponding transcription that may be used to train and improve the models used by the ASR module 214 in speech recognition” teaches training the neural network based on a training corpus after initialization.)
Salvador does not explicitly teach initializing other weights between the final hidden neuron layer and the output neuron layer to zero.
However, Saxena teaches initializing … weights between the final hidden neuron layer and the output neuron layer to zero ([0032]: “At the initial stage, the neural network weights are initialized either to zero or any random number”, which teaches that the weights of the neural network can be initialized to zero or to any random number. Furthermore, these weights are between the final hidden layer and the output neuron layer, as shown in Figure 3A, which shows the neural network configuration (input, hidden, output)).

Salvador and Saxena are analogous art because they are both methods of training neural networks.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network training method of Salvador, with the initialization method of Saxena.
	One of ordinary skill in the art would have been motivated to make this modification in order to have an output matching desired results. (Saxena, [0030])

Regarding Claim 16,
Salvador in view of Saxena teaches the system of claim 12.
Saxena further teaches … randomly initializing weights associated with neurons in the final hidden neuron layer that do not match a state of any of the output neurons (Saxena, [0032]: “At the initial stage, the neural network weights are initialized either to zero or any random number”, which teaches that the weights of the neural network can be initialized to zero or to any random number. Furthermore, these weights are associated with the final hidden layer and the output neuron layer, as shown in Figure 3A, which shows the neural network configuration (input, hidden, output)).
Salvador and Saxena are analogous art because they are both methods of training neural networks.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network training method of Salvador, with the initialization method of Saxena.
	One of ordinary skill in the art would have been motivated to make this modification in order to have an output matching desired results. (Saxena, [0030])


Regarding Claim 17,
Salvador in view of Saxena teaches the system of claim 12.
Salvador further teaches a speech recognition module (Fig. 2 ASR device containing neural network updater module and speech recognition engine) configured to perform speech recognition using the trained neural network to convert audio voice information into textual information (Col. 1, lines 15-18: “Speech recognition may also convert a user's speech into text data which may then be provided to various textual based programs and applications.” This teaches that the neural network speech recognition models can be used to convert audio voice information into text data).

Regarding Claim 19,
Salvador in view of Saxena teaches the system of claim 12.
Salvador further teaches wherein the initializing module is further configured to identify multiple weights corresponding to state matches between multiple dedicated neurons in the final hidden neuron layer and multiple neurons of the output neuron layer (Fig. 8; Col. 10, lines 3-9: “Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. Each example typically sets the weights of the correct connections from input to output to 1”). This passage teaches identifying multiple weights (in order to set weights, they must be identified) between input and output layers (the final hidden layer is between the input and output layers) that correspond to multiple correct connections. The examiner interprets a correct connection as being equivalent to a state match. Figure 5 shows a state diagram for phonemes (sounds). Language models using neural networks use output nodes representing potential next words. If a connection is correct, then an output neuron has a match with a previous neuron, and the connection is therefore considered a state match. See Col. 9, lines 13-29 and 53-64.

Regarding Claim 21,
Salvador in view of Saxena teaches the method of claim 1.
	Salvador further teaches wherein the initializing of the identified weights is completed before training begins (Col. 11, lines 40-45 discloses that weights in the neural network may be configured to an initial training value, and thus must have been configured before training began in order for the weights to have already been set to an initial value. This is functionally the same as having two distinct steps of initializing and training).

Claims 2-4 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Salvador et al. (US 9153231 B1), herein referenced as “Salvador”, in view of Saxena et al. (US 2011/0321164 A1), herein referenced as “Saxena”, and further in view of Zhang et al., “Standalone training of context-dependent deep neural network acoustic models,” herein referenced as “Zhang”.

Regarding Claim 2,
Salvador in view of Saxena teaches the method of claim 1.
Salvador in view of Saxena does not explicitly teach identifying neurons on the final hidden neuron layer that have a context independent hidden Markov model state that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer.
However, Zhang teaches identifying neurons on the final hidden neuron layer that have a context independent hidden Markov model state (Fig. 1 teaches hidden layers having CI-DNN-HMM states; the Conclusion also specifies the “last hidden layer of the CI-DNN”) that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer (Conclusion: “Afterwards, a Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are then converted to the output layer of a CD-DNN”, which teaches neurons in the output layer having a context dependent hidden Markov model state that matches a context independent state of the hidden layer, since the tied states of the hidden layer are converted to the output layer of a context dependent state. See also the Abstract, Fig. 1, and Table 1, which detail that hidden Markov models are used).
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the context dependent and context independent hidden Markov model states of Zhang with the neural network that uses hidden Markov model states and parameter initialization, as taught by Salvador/Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Regarding Claim 3,
Salvador in view of Saxena teaches the method of claim 1.
Salvador in view of Saxena does not explicitly teach identifying neurons on the final hidden neuron layer that have a phone state that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer.
However, Zhang teaches identifying neurons on the final hidden neuron layer that have a phone state (Section 3.1, paragraph 1, “To train a CI-DNN-HMM, the CI state-level transcriptions are generated from the word transcriptions. This is done by expanding every word to CI phones according to its first pronunciation in the dictionary [14], and then replacing every CI phone with its HMM states.” teaches that neurons on the final hidden layer have a phone state (phone states are replaced with their context independent hidden Markov model states located in the final hidden layer as stated in figure 1)) that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer. (Conclusion, “Afterwards, a Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are the converted to the output layer of a CD-DNN” teaches neurons in the output layer having a context dependent hidden Markov model state that matches a phone state of the hidden layer, since the tied states of the hidden layer are converted to the output layer of a context dependent state. (Also see abstract, fig. 1, and table 1, which detail that hidden Markov models are used))
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
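It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the phone states of Zhang with the neural network using hidden Markov model states and a training method with parameter initialization of Salvador/Saxena.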
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Regarding Claim 4, 
Salvador in view of Saxena teaches the method of claim 1.
Salvador in view of Saxena does not explicitly teach wherein identifying weights comprises identifying neurons on the final hidden neuron layer that correspond to a branch of a phonetic decision tree that includes a context dependent hidden Markov model state of a respective neuron of the output neuron layer.
However, Zhang teaches wherein identifying weights comprises identifying neurons on the final hidden neuron layer (the Conclusion specifies that the final hidden layer is clustered by decision tree state tying) that correspond to a branch of a phonetic decision tree (Section 2.2, Par 2, “The decision tree is a binary tree built upon a set of pre-defined binary phonetic questions. At each non-leaf node, the states are classified into the node's children” teaches decision trees branching off) that includes a context dependent hidden Markov model state of a respective neuron of the output neuron layer. (Conclusion, “Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are the converted to the output layer of a CD-DNN.” teaches context dependent hidden Markov model states in the output neuron layer (the Abstract specifies that the outputs are CD-DNN-HMM states))
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the decision tree of Zhang with the neural network training methods of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Regarding Claim 13,
Salvador in view of Saxena teaches the system of claim 12.
Salvador in view of Saxena does not explicitly teach wherein the initializing module is further configured to identify neurons on the final hidden neuron layer that have a context independent hidden Markov model state that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer.
However, Zhang teaches ... identify[ing] neurons on the final hidden neuron layer that have a context independent hidden Markov model state (fig. 1 teaches hidden layers having CI-DNN-HMM states; the Conclusion also specifies “last hidden layer of the CI-DNN”) that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer. (Conclusion, “Afterwards, a Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are the converted to the output layer of a CD-DNN” teaches neurons in the output layer having a context dependent hidden Markov model state that matches a context independent state of the hidden layer, since the tied states of the hidden layer are converted to the output layer of a context dependent state. (Also see abstract, fig. 1, and table 1, which detail that hidden Markov models are used))
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the context dependent and context independent hidden Markov model states of Zhang with the neural network using hidden Markov models and a training method with parameter initialization of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Regarding Claim 14,
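Salvador in view of Saxena teaches the system of claim 12.
Salvador in view of Saxena does not explicitly teach wherein the initializing module is further configured to identify neurons on the final hidden neuron layer that have a phone state that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer.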
However, Zhang teaches ... identify[ing] neurons on the final hidden neuron layer that have a phone state (Section 3.1, paragraph 1, “To train a CI-DNN-HMM, the CI state-level transcriptions are generated from the word transcriptions. This is done by expanding every word to CI phones according to its first pronunciation in the dictionary [14], and then replacing every CI phone with its HMM states.” teaches that neurons on the final hidden layer have a phone state (phone states are replaced with their context independent hidden Markov model states located in the final hidden layer as stated in figure 1)) that matches a context dependent hidden Markov model state of a respective neuron of the output neuron layer. (Conclusion, “Afterwards, a Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are the converted to the output layer of a CD-DNN” teaches neurons in the output layer having a context dependent hidden Markov model state that matches a phone state of the hidden layer, since the tied states of the hidden layer are converted to the output layer of a context dependent state. (Also see abstract, fig. 1, and table 1, which detail that hidden Markov models are used))
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
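It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the phone states of Zhang with the neural network using hidden Markov models and training methods with parameter initialization of Salvador and Saxena.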
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Regarding Claim 15, 
Salvador in view of Saxena teaches the system of claim 12.
Salvador in view of Saxena does not explicitly teach wherein the initializing module is further configured to identify weights comprises identifying neurons on the final hidden neuron layer that correspond to a branch of a phonetic decision tree that includes a context dependent hidden Markov model state of a respective neuron of the output neuron layer.
However, Zhang teaches wherein ... identify[ing] weights comprises identifying neurons on the final hidden neuron layer (the Conclusion specifies that the final hidden layer is clustered by decision tree state tying) that correspond to a branch of a phonetic decision tree (Section 2.2, Par 2, “The decision tree is a binary tree built upon a set of pre-defined binary phonetic questions. At each non-leaf node, the states are classified into the node's children” teaches decision trees branching off) that includes a context dependent hidden Markov model state of a respective neuron of the output neuron layer. (Conclusion, “Gaussian distribution with a common covariance matrix is estimated for every untied CD state based on the hidden activation vectors generated by the last hidden layer of the CI-DNN, which are clustered by decision tree state tying. These are the converted to the output layer of a CD-DNN.” teaches context dependent hidden Markov model states in the output neuron layer (the Abstract specifies that the outputs are CD-DNN-HMM states))
Salvador, Saxena, and Zhang are analogous art because they are all related to methods of training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the decision tree of Zhang with the neural network training methods of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have state-of-the-art performance. (Zhang, Conclusion)

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Salvador et al. (US 9153231 B1), herein referenced as “Salvador”, in view of Saxena et al. (US 2011/0321164 A1), herein referenced as “Saxena”, and further in view of M. N. Bojnordi et al., "Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning," herein referenced as “Bojnordi”.

Regarding Claim 7,
Salvador in view of Saxena teaches the method of claim 1.
Salvador in view of Saxena does not explicitly teach wherein the neural network comprises hardware weights.
However, Bojnordi teaches wherein the neural network comprises hardware weights. (Section 3, “The proposed memristive Boltzmann machine is an RRAM based hardware platform capable of accelerating combinatorial optimization and neural computation tasks.” teaches a neural network (a Boltzmann machine is a type of stochastic neural network) comprising hardware weights (RRAM))
Salvador, Saxena, and Bojnordi are analogous art because they are all methods related to training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the hardware weights of Bojnordi with the neural network training method with weight initialization of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to accelerate combinatorial optimization and neural computation tasks. (Bojnordi, Section 3)

Regarding Claim 18,
Salvador in view of Saxena teaches the system of claim 12.
Salvador in view of Saxena does not explicitly teach wherein the neural network comprises hardware weights.
However, Bojnordi teaches wherein the neural network comprises hardware weights. (Section 3, “The proposed memristive Boltzmann machine is an RRAM based hardware platform capable of accelerating combinatorial optimization and neural computation tasks.” teaches a neural network (a Boltzmann machine is a type of stochastic neural network) comprising hardware weights (RRAM))
Salvador, Saxena, and Bojnordi are analogous art because they are all methods related to training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the hardware weights of Bojnordi with the neural network training method with weight initialization of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to accelerate combinatorial optimization and neural computation tasks. (Bojnordi, Section 3)

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Salvador et al. (US 9153231 B1), herein referenced as “Salvador”, in view of Saxena et al. (US 2011/0321164 A1), herein referenced as “Saxena”, and further in view of Bisani et al. (US 2015/0095026 A1), herein referenced as “Bisani”.

Regarding Claim 9, 
Salvador in view of Saxena teaches the method of claim 8.
Salvador in view of Saxena does not explicitly teach wherein a number of the multiple weights is proportional to a number of phones in a particular branch of a phonetic decision tree.
However, Bisani teaches wherein a number of the multiple weights is proportional to a number of phones in a particular branch of a phonetic decision tree. ([0047], in FIG. 7, “arcs from node 710 to nodes 720 to 726 are labeled with example words that may be recognized by the speech recognition engine 318.” teaches wherein the number of multiple weights (scores associated with the arcs) is proportional to the number of phones (audio data corresponds to phonemes or words, [0028]) in a particular branch of a phonetic decision tree (fig. 7))
Salvador, Saxena, and Bisani are analogous art because they are all methods related to training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multiple weights and phones in the decision tree of Bisani with the neural network training methods of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have more accurate speech recognition results. (Bisani, [0017])

Regarding Claim 20,
Salvador in view of Saxena teaches the system of claim 19.
Salvador in view of Saxena does not explicitly teach wherein a number of the multiple weights is proportional to a number of phones in a particular branch of a phonetic decision tree.
However, Bisani teaches wherein a number of the multiple weights is proportional to a number of phones in a particular branch of a phonetic decision tree. ([0047], in FIG. 7, “arcs from node 710 to nodes 720 to 726 are labeled with example words that may be recognized by the speech recognition engine 318.” teaches wherein the number of multiple weights (scores associated with the arcs) is proportional to the number of phones (audio data corresponds to phonemes or words, [0028]) in a particular branch of a phonetic decision tree (fig. 7))
Salvador, Saxena, and Bisani are analogous art because they are all methods related to training neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multiple weights and phones in the decision tree of Bisani with the neural network training methods of Salvador and Saxena.
One of ordinary skill in the art would have been motivated to make this modification in order to have more accurate speech recognition results. (Bisani, [0017])



Response to Arguments

Applicant's arguments filed on 06/17/2020 with respect to the 35 U.S.C. 103 rejection of claims 1, 10, and 12 have been fully considered but they are not persuasive. 

	On page 8 of remarks applicant argues the following, “In the previous response, Applicant addressed claims 2 and 3, and pointed out that the rejection failed to properly account for the relationship between the identification of the weights and the matching of states between neurons. The Examiner’s interpretation of Salvador relies on the identification of “correct” connections, with no mention of HMM states. The Examiner therefore cites Zhang as addressing 
Examiner's response:
The examiner respectfully disagrees. The applicant provides no definition in the claims as to what a state match is, and therefore the claim is open to the broadest reasonable interpretation. Upon further review of the references, it appears that Salvador teaches specific neurons in the hidden layers that have states that match the Hidden Markov Model (HMM) states of respective neurons in the output layer. Salvador discloses that the speech recognition model uses Hidden Markov Models, that sounds received may be represented as paths between states of the HMM, and that multiple paths may represent multiple possible text matches for the same sound. During input processing, the feature vectors are matched with state phonemes. Therefore, under the broadest reasonable interpretation of the claim, a correct connection between the input layers and output layers must also represent a state match between the Hidden Markov Model states.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejection of claims 2-9 and 13-21 have been fully considered but are not persuasive, as the applicant’s arguments depend on the allowability of claims 1, 10, and 12.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VASYL DYKYY whose telephone number is (571)270-5019.  The examiner can normally be reached on M-F 7:30 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/VASYL DYKYY/Examiner, Art Unit 2122
/BABOUCARR FAAL/Primary Examiner, Art Unit 2184