DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 21-40 were amended. Claims 21-40 are pending and are examined in this office action.
Claims 21-40 are rejected under 35 USC 112(a) as failing to comply with the written description requirement.
Applicant’s amendment overcomes the previous grounds of rejection of claim 31 under 35 USC 112(b). 
The rejection of claims 21-29 and 31-40 under 35 USC 103 is maintained; Applicant’s amendment necessitated the changes to the rejection. See response to arguments. 
Applicant’s amendment overcomes the previous grounds of rejection of claim 30 under 35 USC 103; however, upon further consideration, new grounds of rejection under 35 USC 103 necessitated by amendment are presented herein.
Applicant’s amendment overcomes the previous grounds of rejection of claims 21-23 and 25-40 over Non-statutory Double Patenting.

Response to Arguments
Applicant's arguments filed 01/12/2022 have been fully considered but they are not persuasive. Applicant argues, see especially pages 7-9, that Abdel-Hamid fails to teach “a second layer of the neural network that includes a second plurality of nodes each configured to receive a respective single one of the initial output vectors that is different than the respective single initial output vector received by each other node of the second plurality of nodes” because “Abdel-Hamid discloses each max pooling layer band (e.g., P1 and P2) pooling together multiple activations received from the convolution layer rather than a single activation” (emphasis in original). 
of the initial output vectors”. Applicant’s argument presupposes that the “first plurality of nodes” is mapped to by all of the h nodes (i.e., all of the nodes in the convolution layer) taught by Abdel-Hamid, so that all of the outputs of the convolution layer (i.e., all of the h values) are mapped to the initial output vectors. However, neither the current nor previous rejection is based on this mapping. This point is made explicit in both rejections. The claim does not require that the “first plurality of nodes” exhaust the nodes of the convolution layer. Applicant’s argument appears to rely on an interpretation of the claim which is closer to the interpretation that it would be given if the open-ended transitional phrases of the claim were replaced by closed transitional phrases (see MPEP 2111.03 for a discussion of transitional phrases). However, this is not what is claimed.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 21-40 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 21 recites “each node of the second plurality of nodes configured to receive, as input, a respective single one of the initial output vectors that is different than the respective single initial output vector received by each other node of the second plurality of nodes”.
The specification does not provide support for this limitation. Applicant indicates on the first page of remarks filed 01/12/2022 that support for the amendment may be found at as-filed paragraphs [0031] and [0036]. Paragraph [0031] describes generating an output vector from the outputs 112a-d, and indicates that the generated vector, or subsets thereof, may be provided to each of the nodes in the subsequent layer. The claimed “initial output vectors” appear to correspond to the values 112a-d (i.e., 112a is a first initial output vector, 112b is a second initial output vector, and so on), rather than to the vector comprising all of the values 112a-d. The specification at [0031] does not provide support for each node of the subsequent layer receiving a single one of these values that is different from the values received by the other nodes. The specification at [0036] describes the provision of the sub-matrices to the first layer, which is not relevant to the provision of the outputs of the first layer to the second layer. The remainder of the specification does not support the limitation. A person of ordinary skill in the art would not recognize Applicant as having possession of providing “a respective single one of the initial output vectors that is different than the respective single initial output vector received by each other node of the second plurality of nodes” based on a generic description of providing some subset of the outputs from a first layer to each node of a second layer.
	Claims 36 and 40 recite substantially similar limitations and are rejected with the same rationale, mutatis mutandis. Dependent claims 22-35 and 37-39 do not resolve the issue and are rejected with the same rationale.
	The specification would provide support for “each node of the second plurality of nodes configured to receive, as input, a respective subset of the initial output vectors” in view of [0031] and [0029], in view of the broadest reasonable interpretation of “vector” encompassing a simple scalar. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 21-28 and 31-40 are rejected under 35 U.S.C. 103 as being unpatentable over “Lehman” (US 2016/0071515 A1) in view of “Abdel-Hamid” (Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition).

	Regarding claim 21, Lehman teaches
	A method performed by one or more computing devices, the method comprising: (Lehman, Abstract describes systems, methods and products to detect keywords in speech. This may be computer-implemented as illustrated at Figure 2 and described at [0049].)
	obtaining, by the one or more computing devices, a set of values indicating acoustic characteristics of an utterance;  (Lehman, Figures 1 and 4 provide an overview of the method. Figure 
	receiving, by the one or more computing devices, the set of values as input to [a neural network] (Lehman, Steps 104 and 440, described at [0019] and [0037] (note that the “430” in [0037] appears to be a typographical error for “440”), show blocking the feature vectors. That is, the feature vectors are grouped. At steps 106 and 450, described at [0020] and [0037], the blocked feature vectors are provided to a neural network.  As described at [0037], the neural network performs keyword classification, so the computer comprising the neural network is a keyword detection system. [0049] indicates that any of the steps may be computer-implemented.)
	…determining, by the one or more computing devices, whether the utterance includes a keyword based on [an output of a neural network]. (Lehman, steps 106 and 450, described at [0020] and [0037], show the vectors being processed by the neural network. In particular, [0037] indicates that the output of each section of the neural network may be a label indicating whether the respective block of feature vectors includes the keyword. Steps 107-108 and 460, described at [0021] and [0037], show smoothing the output of the neural network. As described at [0021], the “output of the smoothing may be a final result 108 which provides an indication whether the keyword(s) are present from the speech”. [0049] indicates that any of the steps may be computer-implemented.)
	Lehman does not appear to explicitly teach
	a first layer of a neural network comprising a first plurality of nodes, each node of the first plurality of nodes configured to receive, as input, a respective subset of the set of values and generate, as output, a corresponding initial output vector;
	receiving, by the one or more computing devices, each of the initial output vectors as input to a second layer of the neural network comprising a second plurality of nodes, each node of the second plurality of nodes configured to receive, as input, a respective single one of the initial output vectors that is different than the respective single initial output vector received by each other node of the second plurality of nodes and generate, as output, a corresponding final output vector; and
	determining, by the one or more computing devices, whether the utterance includes a keyword based on each of the final output vectors.
	However, Abdel-Hamid—directed to analogous art—teaches
	a first layer of a neural network comprising a first plurality of nodes, each node of the first plurality of nodes configured to receive, as input, a respective subset of the set of values and generate, as output, a corresponding initial output vector; (Abdel-Hamid, Abstract describes using a convolutional neural network to perform speech recognition. Details of the model are provided in section 3. Section 3.3 describes the convolutional layer used by Abdel-Hamid. Figure 2 provides an overview. In particular, the two sections including the h values are sections of a single convolution layer. As indicated in the caption for Figure 2, the first neuron h(1)1 receives input from bands 1-4, the next neuron receives input from bands 2-5, and so on. Moreover, the first neuron of the second section, h(2)1, receives input from bands 4-7 as illustrated. This is described in more detail in the text of Abdel-Hamid with respect to equation (4), which shows the activation being based on a subset of the input bands (the v values). A first mapping of the “first plurality of nodes”, used for claim 24, is by the plurality of nodes consisting of h1(1) and h1(2) in Figure 2. A second mapping of “the first plurality of nodes”, used for claim 25, is by the plurality of nodes consisting of h1(1) and h3(2). The output of each of the h nodes is provided to the next layer. Each h value is a “vector” because the set of real numbers is a one-dimensional vector space over itself.)
	receiving, by the one or more computing devices, each of the initial output vectors as input to a second layer of the neural network comprising a second plurality of nodes, each node of the second plurality of nodes configured to receive, as input, a respective single one of the initial output vectors that is different than the respective single initial output vector received by each other node of the second plurality of nodes and (Abdel-Hamid, page 4279, Figure 2 shows a second layer including nodes p1 and p2. In the first mapping of the “first plurality of nodes” described above, the output of h1(1) is provided to p1 and the output of h1(2) is provided to p2. In the first mapping, each node of the second plurality of nodes (i.e., p1 and p2) receives a single one of the initial output vectors (i.e., the outputs h1(1) and h1(2)) and these values are different from each other. In the second mapping of the “first plurality of nodes” described above, the output of h1(1) is provided to p1 and the output of h3(2) is provided to p2. In the second mapping, each node of the second plurality of nodes (i.e., p1 and p2) receives a single one of the initial output vectors (i.e., the outputs h1(1) and h3(2)) and these values are different from each other. In the combination with Lehman, Lehman teaches a computer-implementation as described above. Lehman as modified by Abdel-Hamid would implement the modified neural network by computer.)
	 generate, as output, a corresponding final output vector; and (Abdel-Hamid, page 4279, first full paragraph including equation (3) shows the formula used to compute the output of the p nodes. The outputs of the p nodes are interpreted as final output vectors. The broadest reasonable interpretation of a “vector” includes a scalar since the set of real numbers is a one-dimensional vector space over itself.)
	determining, by the one or more computing devices, whether the utterance includes a keyword based on each of the final output vectors. (Abdel-Hamid, page 4277, section 2 indicates that a CNN includes at least one pair of convolution and max pooling layers, followed by higher layers, which ultimately results in a classification of the input. In particular, the max pooling layers contribute to the output, so the output of the CNN is based on the max-pooling layer outputs (which were interpreted as corresponding to the final output vectors).)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify Lehman to include a layer which has nodes configured to receive input derived from different proper subsets of frequency input as taught by Abdel-Hamid and described above because “[s]peech signals enjoy some locality characteristics along the frequency axis…As a result, filters that work on local frequency region will provide an efficient way to represent these local structures…This strategy is better than a traditional acoustic model…Another benefit of local filters is the potential to achieve better robustness against ambient noises…” as described by Abdel-Hamid in the first paragraph of section 3.1. Moreover, it would have been obvious to use the particular NN-HMM architecture for speech recognition taught by Abdel-Hamid because it can “achieve over 10% relative error reduction over regular NNs using the same number of hidden layers and comparable number of trainable weights under the same hybrid NN-HMM framework”.
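	For illustration only, the mapping of Abdel-Hamid's Figure 2 relied upon above may be sketched as follows. This is a minimal sketch and not Abdel-Hamid's implementation: the input values, the weights W1 and W2, the use of one scalar per band, and the assumption of three nodes per section are all hypothetical. Only the band windows (bands 1-4 for h1(1), bands 4-7 for h1(2), bands 6-9 for h3(2)), the logistic activation, and the max pooling follow the description above.

```python
import math

def logistic(x):
    """Logistic activation; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def conv_node(bands, start, weights):
    """One convolution-layer node: weighted sum over a window of bands, then logistic."""
    window = bands[start:start + len(weights)]
    return logistic(sum(w * v for w, v in zip(weights, window)))

def section(bands, first_band, weights, n_nodes):
    """Nodes of one section; node k (0-based) reads a window starting at band first_band + k (1-indexed)."""
    return [conv_node(bands, first_band - 1 + k, weights) for k in range(n_nodes)]

# Hypothetical input: one scalar per frequency band, 9 bands in total.
v = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5, 0.1, 0.6, -0.2]

# Hypothetical weights; each section has its own set (limited weight sharing).
W1 = [0.5, -0.3, 0.8, 0.1]
W2 = [-0.2, 0.7, 0.4, -0.6]

# Section 1: h1(1) reads bands 1-4, h2(1) reads bands 2-5, h3(1) reads bands 3-6.
h_sec1 = section(v, 1, W1, 3)
# Section 2: h1(2) reads bands 4-7, h2(2) reads bands 5-8, h3(2) reads bands 6-9.
h_sec2 = section(v, 4, W2, 3)

# Pooling nodes: p1 pools section 1, p2 pools section 2.
p1 = max(h_sec1)
p2 = max(h_sec2)

# First mapping: the "first plurality of nodes" is {h1(1), h1(2)}.
first_mapping = {"p1": h_sec1[0], "p2": h_sec2[0]}
# Second mapping: the "first plurality of nodes" is {h1(1), h3(2)}.
second_mapping = {"p1": h_sec1[0], "p2": h_sec2[2]}
```

	In this sketch, each of p1 and p2 receives exactly one output from the selected pair of h nodes under either mapping, in addition to the outputs of the other h nodes in its section.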

	Regarding claim 22, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach 
	wherein each node of the first plurality of nodes comprises a different weight.
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein each node of the first plurality of nodes comprises a different weight.
 (Abdel-Hamid, Figure 2 shows two different sets of weights W(1) and W(2) being associated with different sections of the convolution layer. Equation (4) shows the different sets of weights being used to compute the outputs of the various neurons in the convolution layer based on the proper subsets of values provided to the respective neurons. As indicated in the first full paragraph of page 4279, weight sharing is limited only to those local filters that are closer to each other. That is, local filters that are not close to each other may have different weights.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21. Moreover, “speech signals the local structures appearing at different frequency bands may behave in a quite different way. Therefore, it may be better to limit weight sharing only to those local filters that are close to each other and will be pooled together in the maxpooling layer” as indicated by Abdel-Hamid in the first full paragraph of page 4279. That is, using different weights for different frequency bands may allow for the network to adapt to different behaviors at different bands.

	Regarding claim 23, the rejection of claim 22 is incorporated herein. Lehman does not appear to explicitly teach
	wherein each node of the first plurality of nodes is further configured to apply a respective different weight to the respective subset of the set of values to generate the corresponding initial output vector. 
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein each node of the first plurality of nodes is further configured to apply a respective different weight to the respective subset of the set of values to generate the corresponding initial output vector. (Abdel-Hamid, Figure 2 shows two different sets of weights W(1) and W(2) being associated with different sections of the convolution layer. Equation (4) shows the different sets of weights being used to compute the outputs of the various neurons in the convolution layer based on the proper subsets of values provided to the respective neurons. As indicated in the first full paragraph of page 4279, weight sharing is limited only to those local filters that are closer to each other. That is, local filters that are not close to each other may have different weights.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21. Moreover, “speech signals the local structures appearing at different frequency bands may behave in a quite different way. Therefore, it may be better to limit weight sharing only to those local filters that are close to each other and will be pooled together in the maxpooling layer” as indicated by Abdel-Hamid in the first full paragraph of page 4279. That is, using different weights for different frequency bands may allow for the network to adapt to different behaviors at different bands.

	Regarding claim 24, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach
	wherein the respective subsets of the set of values comprise partially overlapping values.
	However, Abdel-Hamid—directed to analogous art—teaches 
	wherein the respective subsets of the set of values comprise partially overlapping values.
 (Abdel-Hamid, Figure 2 and first full paragraph, as described above, show neurons which receive as inputs a respective proper subset of the input bands. If, say, nodes h(1)1 and h(2)1 are taken to be the nodes, then the convolution layer has these nodes, and these nodes receive different proper subsets of the input which are partially overlapping. Node h(1)1 receives input from bands 1-4 and node h(2)1 receives input from bands 4-7 in figure 2, which overlap at band 4.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21 because this is a part of the architecture which incorporates the local filters.

	Regarding claim 25, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach 
	wherein the respective subsets of the set of values comprise non-overlapping values.
	However, Abdel-Hamid—directed to analogous art—teaches 
	wherein the respective subsets of the set of values comprise non-overlapping values. (Abdel-Hamid, Figure 2 and first full paragraph, as described above, show neurons which receive as inputs a respective proper subset of the input bands. If nodes h(1)1 and h(2)3 are taken to be the first plurality of nodes, then the convolution layer has these nodes, and these nodes receive different proper subsets of the input which are non-overlapping. Node h(1)1 receives input from bands 1-4. Since the number of bands input to a given node is s-1 (see Equation 4, where the index of summation over the bands goes from b=1 to s-1), which is independent of the filter section, node h(2)3 receives input from bands 6-9 in the example described in the caption to figure 2.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21 because this is a part of the architecture which incorporates the local filters.

	Regarding claim 26, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach 
	wherein one or more nodes of the first plurality of nodes are configured to each receive a respective subset of the set of values that are localized. 
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein one or more nodes of the first plurality of nodes are configured to each receive a respective subset of the set of values that are localized. (Abdel-Hamid, Section 3.1. describes considering local features. In particular, the v values which make up in the input are described in the third paragraph, where it is indicated that the “speech input to CNN is v that is divided into B frequency bands as: v = [v1 v2 ... vB], where vb is the feature vector representing band b. As shown in figure 1, this feature 
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21.

	Regarding claim 27, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach
	one or more nodes of the first plurality of nodes are configured to each receive a respective subset of the set of values that are localized in frequency.
	However, Abdel-Hamid—directed to analogous art—teaches 
	one or more nodes of the first plurality of nodes are configured to each receive a respective subset of the set of values that are localized in frequency. (Abdel-Hamid, Section 3.1. describes considering local features. In particular, the v values which make up in the input are described in the third paragraph, where it is indicated that the “speech input to CNN is v that is divided into B frequency bands as: v = [v1 v2 ... vB], where vb is the feature vector representing band b. As shown in figure 1, this feature vector vb includes speech spectral features, delta and acceleration parameters from local band b of all feature frames within the current context window”. Section 3.1. indicates that the values represent data along the frequency axis. This is the “spectral features” described in the third paragraph quoted above. That is, the values are localized along the frequency axis. The v values are the values used in the respective subsets as described above regarding claim 21. )
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21.

	Regarding claim 28, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach
	wherein each node of the first plurality of nodes receives a respective subset of the set of values that is different than the respective subset of the set of values received by each other node of the first plurality of nodes. 
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein each node of the first plurality of nodes receives a respective subset of the set of values that is different than the respective subset of the set of values received by each other node of the first plurality of nodes. (Abdel-Hamid, Section 3.3 and Figure 2 describe the nodes receiving a proper subset of values of the values v. As described in the third paragraph of section 3.1, the values v are the input to the CNN. A first mapping of the “first plurality of nodes”, used for claim 24, is by the plurality of nodes consisting of h1(1) and h1(2) in Figure 2, which receive inputs from bands 1-4 and bands 4-7, respectively, which are different subsets of the set of values. A second mapping of “the first plurality of nodes”, used for claim 25, is by the plurality of nodes consisting of h1(1) and h3(2), which receive inputs from bands 1-4 and bands 6-9, respectively, which are also different subsets of the set of values. The output of each of the h nodes is provided to the next layer.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21 because this claim recites further details of the CNN architecture including the local filters.
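	For illustration only, the band subsets relied upon in the two mappings (claims 24, 25, and 28) may be compared as follows. This is a minimal sketch; the helper band_window and the fixed window width of four bands are assumptions taken from the band ranges stated above, not from Abdel-Hamid's own notation.

```python
def band_window(first_band, width=4):
    """Set of 1-indexed band numbers read by a node whose window starts at first_band."""
    return set(range(first_band, first_band + width))

h1_sec1 = band_window(1)  # h1(1): bands 1-4
h1_sec2 = band_window(4)  # h1(2): bands 4-7
h3_sec2 = band_window(6)  # h3(2): bands 6-9

# First mapping (claim 24): different subsets that partially overlap.
overlap_first_mapping = h1_sec1 & h1_sec2

# Second mapping (claim 25): different subsets that do not overlap.
overlap_second_mapping = h1_sec1 & h3_sec2
```

	Under these assumptions, the first mapping shares only band 4 and the second mapping shares no band, while the subsets differ from one another in both mappings.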

	Regarding claim 31, the rejection of claim 21 is incorporated herein. Furthermore, Lehman teaches
	wherein the neural network is trained to determine whether the utterance includes the keyword. (Lehman, steps 107-108 and 460, described at [0021] and [0037], show smoothing the output of the neural network. As described at [0021], the “output of the smoothing may be a final result 108 which provides an indication whether the keyword(s) are present from the speech”. [0049] indicates that any of the steps may be computer-implemented. [0020] indicates that the neural network may have been trained to identify one or more keywords.)

	Regarding claim 32, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach 
	wherein each one of the final output vectors comprises a posterior probability score.
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein each one of the final output vectors comprises a posterior probability score. (Abdel-Hamid, page 4279, equations (3) and (4) show the equations which are used by the pooling and convolution layers. Page 4280, first paragraph indicates that the networks use logistic activation functions, which take values between 0 and 1. When the pooling layer is applied, each node selects a value between 0 and 1 based on the input, so the value determined by each node in the pooling layer may be reasonably interpreted as a posterior probability score.)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21.
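	For illustration only, the range property relied upon above (a logistic activation yields values between 0 and 1, and max pooling selects one of those values) may be sketched as follows. This is a minimal sketch; the pre-activation values are hypothetical and are not taken from Abdel-Hamid.

```python
import math

def logistic(x):
    """Logistic activation: 1 / (1 + exp(-x)), strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical pre-activation values of the convolution nodes in one pooled section.
pre_activations = [-3.0, -0.5, 0.0, 2.0]

# Convolution-layer outputs after the logistic activation.
activations = [logistic(x) for x in pre_activations]

# Max pooling selects the largest activation, which remains between 0 and 1.
pooled = max(activations)
```

	The sketch shows only that the pooled value stays within the interval from 0 to 1; it does not model the remainder of the network.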

	Regarding claim 33, the rejection of claim 21 is incorporated herein. Furthermore, Lehman teaches
	wherein the set of values comprises audio features derived from audio data of the utterance. (Lehman, Figures 1 and 4 provide an overview of the method. Figure 1, element 101 shows receiving a speech signal as described at [0017]. This may also be seen at Figure 4, step 420, described at [0036]. This speech signal is then processed at step 102 to determine features as described at [0018]. This is also shown at step 430 and described at [0036-0037]. In particular, [0036] indicates that the features may include spectral feature vectors for the speech signal. The components of the vectors are values indicating acoustic characteristics of an utterance. [0049] indicates that any of the steps may be computer-implemented.)

	Regarding claim 34, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach
	wherein the first layer of the neural network comprises a first hidden layer of the neural network. 
	However, Abdel-Hamid—directed to analogous art—teaches
	wherein the first layer of the neural network comprises a first hidden layer of the neural network. (Abdel-Hamid, section 3.1., page 4278 discusses the use of local features and a baseline model which they further modify. The text following equation (1) indicates that the convolution layer is a hidden layer. The convolution layer shown in Figure 1 is modified to use multiple sections as shown in Figure 2, but the segmented convolution layer is still a hidden layer. It is the first hidden layer as it directly receives the neural network input (i.e., v).)
	It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 21.

	Regarding claim 35, the rejection of claim 21 is incorporated herein. Lehman does not appear to explicitly teach
	each node of the second plurality of nodes of the second layer corresponds to at least one node of the first plurality of nodes of the first layer. 
	However, Abdel-Hamid—directed to analogous art—teaches
	each node of the second plurality of nodes of the second layer corresponds to at least one node of the first plurality of nodes of the first layer. (Abdel-Hamid, Figure 2 shows the first few layers of a neural network. The first layer is an input layer which receives the values v. The layer discussed above regarding claim 21 is the convolution layer, which is seen to be the second layer of the neural network. The nodes of the convolution layer are configured to respectively receive input derived from different proper subsets of the v values as described in detail regarding claim 21. This is further described in sections 3.3-3.4. In the first mapping of the “first plurality of nodes” described above with respect to claim 21, the node p1 corresponds to h1(1) and p2 corresponds to h1(2). In the second mapping of the “first plurality of nodes” described above with respect to claim 21, the node p1 corresponds to h1(1) and p2 corresponds to h3(2).)

	        
	Regarding claim 36, Lehman teaches
	 A device comprising: one or more processing devices and one or more data storage devices, one or more processing devices and the one or more data storage devices being configured to implement a keyword detection function by causing the device to perform operations comprising (Lehman, Figure 2, described at [0009, 0022-0027], describes a system for performing online word-spotting which may perform the methods and techniques disclosed therein. In particular, it comprises one or more processing devices and one or more storage devices ([0023]).)
	The remainder of claim 36 is substantially similar to claim 21 and is rejected with the same rationale, mutatis mutandis.

	Claims 37-39 are substantially similar to claims 22, 23, and 25, respectively, and are rejected with the same rationale, mutatis mutandis, in view of the rejection of claim 36.

	Regarding claim 40, Lehman teaches
	One or more non-transitory data storage devices storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations comprising: (Lehman, [0044-0051] describe an embodiment of the invention described therein as a computer-program product storing instructions which may be implemented by one or more processing devices. [0045] provides specific examples of non-transitory storage devices.)
	The remainder of claim 40 is substantially similar to claim 21 and is rejected with the same rationale, mutatis mutandis.

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over “Lehman” (US 2016/0071515 A1) in view of “Abdel-Hamid” (Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition), further in view of “Rosner” (US 2013/0339028 A1).

	Regarding claim 29, the rejection of claim 21 is incorporated herein. Furthermore, Lehman teaches
	wherein determining whether the utterance includes the keyword based on each of the final output vectors comprises determining whether the utterance includes the keyword from among a set of predetermined keywords (Lehman, [0015] indicates that the system determines that the keyword may be one of a plurality of keywords. This is further described at [0020-0021] and [0026-0027].)
	The combination of Lehman and Abdel-Hamid does not appear to explicitly teach
	that are each designated as a signal that a mobile device should activate. 
	However, Rosner—directed to analogous art—teaches
	a set of predetermined keywords that are each designated as a signal that a mobile device should activate. (Rosner, Abstract describes a voice activation system. [0028-0029] describes identifying specific, predetermined words within an audio signal. In response to recognizing the specific words, the device transitions to a fully operational state. That is, the device activates. The device may be a mobile device as described at [0029].)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify Lehman/Abdel-Hamid to recognize key words or key phrases that are designated as a signal that a mobile device should activate as taught by Rosner because identifying specific keywords allows for a reduction in power usage by the device as described by Rosner at [0028]. That is, keyword recognition technology was already recognized as being useful for identifying wake words in mobile devices. As Lehman/Abdel-Hamid teaches a method for identifying keywords, it would have been obvious to apply this method to identify keywords in a context in which keyword identification is advantageous to perform.

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over “Lehman” (US 2016/0071515 A1) in view of “Abdel-Hamid” (Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition), further in view of “Basye” (US 2014/0163978 A1).

	Regarding claim 30, the rejection of claim 21 is incorporated herein. Furthermore, Lehman teaches
	wherein determining whether the utterance includes the keyword based on each of the final output vectors comprises determining whether the utterance contains the keyword. (Lehman, steps 107-108 and 460, described at [0021] and [0037], show smoothing the output of the neural network. As described at [0021], the “output of the smoothing may be a final result 108 which provides an indication whether the keyword(s) are present from the speech”.)
	The combination of Lehman and Abdel-Hamid does not appear to explicitly teach 
	determining whether the utterance contains the keyword spoken by a particular user.
	However, Basye—directed to analogous art—teaches
	determining whether the utterance contains the keyword spoken by a particular user. (Basye, [0026, 0028] describes determining a keyword or phrase voiced by a particular user.)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify the combination of Lehman and Abdel-Hamid to determine a keyword spoken by a particular user as taught by Basye and described above because this allows for speaker-dependent voice activation as described at [0048] of Basye.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sainath (US 2015/0032449 A1) – Abstract describes performing speech recognition using a neural network. Figure 2 shows a layer receiving inputs which are a proper subset of the neural network input.
Maas (Building DNN Acoustic Models for Large Vocabulary Speech Recognition) – Abstract describes using neural networks for performing speech recognition. Figure 3 .

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Markus A Vasquez whose telephone number is (303)297-4432. The examiner can normally be reached Monday to Friday 9AM to 2PM MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance 





/M.A.V./Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121