Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The instant application having Application No. 16654041 has a total of 20 claims pending in the application, all of which are ready for examination by the examiner.


I. ACKNOWLEDGEMENT OF REFERENCES CITED BY APPLICANT
Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant’s submission of the Information Disclosure Statement dated 10/16/2019 is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending except where lined through. As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant Office action.
Any lined-through references were either not received by the Office or not labeled sufficiently to be easily recognized as the reference on the IDS, and therefore were not considered by the Office. 

III. REJECTIONS NOT BASED ON PRIOR ART

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-16 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 of U.S. Patent No. 10762894 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because each of the limitations of the instant claims can be met by those of the patent. 
Instant Application
10762894 B2
Examiner's Note
A speech recognition model
Claim 7: “using the convolutional neural network for keyword detection by receiving an audio signal” 
Here the audio signal is being monitored for keywords, which clearly denotes some form of speech recognition.
A convolutional neural network comprising
Claim 1: … “a convolutional neural network…”

A first convolution neural network layer configured to generate a first output from a two-dimensional set of input values, the set of input values comprising input values across a first dimension in time and input values across a second dimension in frequency, and the first output comprising a feature map
Claim 1: “a convolutional neural network that comprises a first convolutional layer and a second convolutional layer… providing a two dimensional set of input values to the convolutional neural network, the input values including values across a first dimension in time and values across a second dimension in frequency… to generate a first output comprising a feature map”

A second convolution neural network layer different than the first convolution neural network layer, the second convolution neural network layer configured to receive the feature map generated by the first convolution neural network layer, and generate a second output using the feature map 
Claim 1: “generating, by the second convolutional layer of the convolutional neural network, using the feature map, a second output” 

a linear low rank layer configured to receive the second output generated by the second convolution neural network layer and generate a third output using the second output
Claim 1: “generating, by a linear low rank layer, using the second output, a third output.” 

A deep neural network configured to receive the third output generated by the linear low rank layer and generate a fourth output using the third output 
Claim 1: “generating, by a deep neural network, using the third output, a fourth output” 



	As shown above, each limitation of the independent claim is met by the limitations of U.S. Patent No. 10762894 B2. The claim is therefore rejected on the ground of nonstatutory double patenting.
	As per claims 2-20, these claims are rejected over claims 1-7 of U.S. Patent No. 10762894 B2 for reasons similar to those given above. 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3, 10, 12, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claim 3, this claim recites the limitation “wherein the deep neural network comprises the softmax layer,” while the parent claim recites “a softmax layer configured to receive the fourth output from the deep neural network.” It is unclear how the softmax layer can be part of the deep neural network while also receiving the fourth output from the deep neural network. These two requirements are mutually exclusive: the parent claim requires the deep neural network to send its output to the softmax layer, whereas claim 3 requires the deep neural network to include the softmax layer itself. This renders the claim confusing, and the claim is therefore rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the subject matter which the inventor regards as the invention. 
As per claims 10 and 19, these claims require “Wherein the first convolution neural network layer comprises a filter size in time that spans two-thirds an overall size of the input values across the first dimension in time.” This limitation encompasses configurations that cannot exist, because it would require fractional filter sizes. If the time dimension is 6, the filter size would be 4, which is permissible. However, if the time dimension is 7, the filter size would be 14/3 (approximately 4.67), and a filter cannot have a fractional size. Because the claims cover such impossible configurations, their scope is unclear and confusing. The claims are therefore rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the subject matter the applicant regards as the invention. 
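The arithmetic underlying this rejection can be illustrated with a short sketch. It computes two-thirds of a few time-dimension sizes exactly and reports whether the resulting filter size is a whole number. The sizes used are hypothetical and are not taken from the claims or the specification.

```python
from fractions import Fraction

def two_thirds_filter_size(time_frames: int) -> Fraction:
    """Return two-thirds of the time dimension as an exact fraction."""
    return Fraction(2, 3) * time_frames

# Hypothetical time-dimension sizes, chosen only for illustration.
for frames in (6, 7, 9, 32):
    size = two_thirds_filter_size(frames)
    status = "integer" if size.denominator == 1 else "fractional"
    print(f"time dimension {frames}: filter size {size} ({status})")
```

Under a literal reading of the limitation, only time dimensions divisible by three yield a whole-number filter size.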
As per claim 12, this claim recites “wherein the deep neural network comprises the softmax layer.” The parent claim recites “generating, by a softmax layer, a final output of the speech recognition model using the fourth output,” and this fourth output is generated by the deep neural network. It is unclear how the deep neural network can include the softmax layer while the softmax layer also receives an output from the deep neural network. These requirements are mutually exclusive and render the claim confusing. The claim is therefore rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the subject matter which the inventor regards as the invention.
	

IV. REJECTIONS BASED ON PRIOR ART
Examiner's Note: Some citations below are followed by ‘EN’, which denotes an examiner's note that further explains the rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Miao et al. (“Improvements to Speaker Adaptive Training of Deep Neural Networks”) in view of Abdel-Hamid (hereinafter Abdel, “Convolutional Neural Networks for Speech Recognition”) and Sainath et al. (“Low-rank Matrix factorization for Deep Neural networks training with High-Dimensional Output Targets”). 
As per claim 1, Miao discloses, “a speech recognition model comprising”  (Pg.165, particularly the introduction section; EN: this denotes the neural networks being used for speech recognition). 
“a convolutional neural network comprising” (pg.167, particularly section 3.4; EN: this denotes the use of a convolutional neural network). 
“A first convolutional neural network layer configured to generate a first output” (Pg.167, particularly section 3.4; EN: this denotes two convolutional layers). “from a two-dimensional set of input values, the set of input values comprising input values across a first dimension in time and input values across a second dimension in frequency” (pg.167, particularly section 3.4; EN: This denotes two dimensional inputs along both time and frequency). 
“a second convolution neural network layer different than the first convolution neural network layer, the second convolution neural network layer configured to receive the … generated by the first convolution neural network layer and generate a second output using the …” (Pg.167, particularly section 3.4; EN: this denotes two convolutional layers with one receiving data from the other). 
“A deep neural network configured to receive the … output generated by the … layer and generate a … output using the … output” (pg.167, particularly section 3.4; EN: this denotes the use of fully connected layers after the convolutional layers). 
However, Miao fails to explicitly disclose, “and the first output comprising a feature map”, “the feature map”, “a linear low rank layer configured to receive the second output generated by the second convolution neural network layer and generate a third output using the second output”, “the third output”, “a fourth output.”
Abdel discloses, “and the first output comprising a feature map”, “the feature map” (Pg.1535, particularly Section A; EN: this denotes using feature maps with speech based convolutional neural networks with the two dimensional feature map being based on frequency and time as discussed in the Miao reference above). 
Sainath discloses, “a linear low rank layer configured to receive the second output generated by the second convolution neural network layer and generate a third output using the second output”, “the third output”, “a fourth output” (Pg.6656, particularly section 2; EN: this denotes adding a low-rank layer to the neural network). 
Miao and Abdel are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of feature maps with convolutional neural network layers. 
	The motivation for doing so would be to “organize speech feature vectors into feature maps that are suitable for CNN processing” (Abdel, Pg.1535, Section A, second paragraph). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of feature maps with convolutional neural network layers.
Miao and Sainath are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a linear low rank layer.  
	The motivation for doing so would be because “a low-rank factorization reduces the number of parameters of the network by 30-50%. This results in roughly an equivalent reduction in training time, without a significant loss in final recognition accuracy, compared to a full-rank representation” (Sainath, abstract). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a linear low rank layer.  
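Why a low-rank factorization reduces the number of parameters can be illustrated with a simple count. The sketch compares a full-rank weight matrix between the last hidden layer and the output layer with a factorization through a linear bottleneck of rank r. The layer sizes (1,024 hidden units, 2,220 output targets) are borrowed from the Sainath passage quoted for claim 2; the rank of 128 is a hypothetical choice, biases are ignored, and only that one layer is counted, so this is not a reproduction of Sainath's whole-network figure.

```python
def full_rank_params(n_in: int, n_out: int) -> int:
    """Weights in a single full-rank n_in x n_out matrix (biases ignored)."""
    return n_in * n_out

def low_rank_params(n_in: int, n_out: int, rank: int) -> int:
    """Weights when the matrix is factored as (n_in x rank)(rank x n_out).

    The first factor acts as a linear low-rank (bottleneck) layer: it has
    no nonlinearity, so the product of the two factors is still linear.
    """
    return n_in * rank + rank * n_out

hidden, targets = 1024, 2220  # sizes from the quoted Sainath passage
rank = 128                    # hypothetical bottleneck size
full = full_rank_params(hidden, targets)
low = low_rank_params(hidden, targets, rank)
print(f"full rank: {full:,} weights")
print(f"rank {rank}: {low:,} weights ({1 - low / full:.1%} fewer in this layer)")
```

The saving in this single layer is larger than a whole-network figure would be, because the remaining layers are left unchanged.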
As per claim 2, Miao fails to explicitly disclose, “further comprising a softmax layer configured to receive the fourth output from the deep neural network and generate a final output for the neural network system.”
Sainath discloses, “further comprising a softmax layer configured to receive the fourth output from the deep neural network and generate a final output for the neural network system” (Pg.6656, particularly section 2 and figure 1; EN: This denotes a softmax layer at the end of the neural network).  
Miao and Sainath are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a softmax layer. 
	The motivation for doing so would be because “Figure 1 shows a typical neural network architecture for speech recognition problems, namely 5 hidden layers with 1,024 hidden units per layer, and a softmax layer with 2220 output targets” (Sainath, pg.6656, section 2). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a softmax layer.
As per claim 3, Sainath discloses, “Wherein the deep neural network comprises the softmax layer” (Pg.6656, particularly section 2 and figure 1; EN: This denotes a softmax layer at the end of the neural network).  
As per claims 4 and 13, Miao discloses, “wherein an accuracy of the final output is used to update the convolution neural network” (Pg.167, particularly C2, second paragraph; EN: this denotes training via back propagation, which propagates errors back through the neural network in order to train/update the neural network). 
As per claims 5 and 14, Miao discloses, “wherein the feature … comprises a first matrix” (Pg.167, particularly section 3.4; EN: this denotes the use of matrices to store the values being passed through the neural network). 
“the second output comprises a second matrix” (Pg.167, particularly section 3.4; EN: this denotes the use of matrices to store the values being passed through the neural network).
“… creating a vector from the second matrix and generating the third output using the vector…” (Pg.166-167, particularly section 3.2; EN: this denotes vectors being passed through the neural network as it operates). 
Sainath discloses, “the linear low rank layer is configured to generate the third output by…” (Pg.6656, particularly section 2; EN: this denotes adding a low-rank layer to the neural network).
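The data flow recited in claims 1, 2, and 5 (two convolutional layers, flattening the second output matrix into a vector, a linear low rank layer, a deep neural network, and a softmax layer) can be sketched in plain Python. This is a minimal, untrained illustration that assumes arbitrary layer sizes, random weights, ReLU activations, and a single hidden layer standing in for the deep neural network; none of these choices is taken from the application or from the cited references.

```python
import math
import random

random.seed(0)

def conv2d_valid(x, k):
    """Valid 2-D convolution with stride 1 (cross-correlation, as in most CNN code)."""
    kt, kf = len(k), len(k[0])
    rows, cols = len(x) - kt + 1, len(x[0]) - kf + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kt) for b in range(kf))
             for j in range(cols)] for i in range(rows)]

def relu_matrix(m):
    return [[max(0.0, v) for v in row] for row in m]

def flatten(m):
    """Create a vector from a matrix, row by row."""
    return [v for row in m for v in row]

def linear(vec, w):
    """Multiply a vector by a weight matrix stored as a list of output rows."""
    return [sum(wi * vi for wi, vi in zip(row, vec)) for row in w]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical input: 8 time frames x 10 frequency bins.
x = rand_matrix(8, 10)
feature_map = relu_matrix(conv2d_valid(x, rand_matrix(3, 3)))       # first output (first matrix), 6 x 8
second = relu_matrix(conv2d_valid(feature_map, rand_matrix(2, 2)))  # second output (second matrix), 5 x 7
vec = flatten(second)                                                # vector of 35 values
third = linear(vec, rand_matrix(4, len(vec)))                        # linear low rank layer: no nonlinearity
fourth = [max(0.0, v) for v in linear(third, rand_matrix(16, 4))]    # one hidden layer standing in for the DNN
final = softmax(linear(fourth, rand_matrix(5, 16)))                  # softmax over 5 hypothetical classes
print(len(vec), len(third), len(fourth), [round(p, 3) for p in final])
```

Keeping the low rank layer linear means it only projects the flattened convolutional output onto a smaller space before the deep neural network applies its nonlinearities.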
As per claim 8, Miao discloses, “Wherein the convolutional neural network is configured to: receive an audio signal encoding an utterance” (pg.166, particularly section 3.1; EN: this denotes working with utterances). 
“analyze the audio signal to identify a command included in the utterance” (Abstract; EN: this denotes performing speech recognition, which would include commands given within the speech). 
As per claims 9 and 18, Miao discloses, “wherein the convolution neural network further comprises at least one max-pooling layer configured to remove variability in the input values in the first dimension and the input values in the second dimension” (pg.167, particularly section 3.4; EN: this denotes the use of a max pooling layer for normalization, which would reduce variability). 
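The examiner's reading of Miao, that max pooling reduces variability in both the time and frequency dimensions, can be illustrated with non-overlapping 2 x 2 max pooling: two inputs whose peaks are shifted by one position within their pooling windows produce the same pooled output. The window size and input values are hypothetical.

```python
def max_pool_2d(x, pt, pf):
    """Non-overlapping max pooling with a pt (time) x pf (frequency) window.

    Any trailing rows or columns that do not fill a whole window are dropped.
    """
    return [[max(x[i + a][j + b] for a in range(pt) for b in range(pf))
             for j in range(0, len(x[0]) - pf + 1, pf)]
            for i in range(0, len(x) - pt + 1, pt)]

a = [[0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 7],
     [0, 0, 0, 0]]
b = [[9, 0, 0, 0],   # same peaks, each moved by one position
     [0, 0, 0, 0],   # within its 2 x 2 pooling window
     [0, 0, 0, 0],
     [0, 0, 7, 0]]
print(max_pool_2d(a, 2, 2), max_pool_2d(b, 2, 2))
```

Shifts that cross a window boundary would still change the pooled output, so the invariance shown here is limited to the pooling window size.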
As per claims 10 and 19, Miao discloses, “Wherein the first convolutional layer comprises a filter size in time that spans … an overall size of the input values across the first dimension in time” (pg.167, particularly section 3.4; EN: this denotes inputs of 29x29 and filter sizes of 4x4x1). 
	However, Miao fails to explicitly disclose, “two thirds of the input values” 
	Abdel discloses, “two thirds of the input values” (Pg.1535, particularly C2, last paragraph; EN: this denotes an input range of 9-15 frames; Pg.1538, particularly Fig.4; EN: this denotes a filter size of 5, which would be roughly 2/3 of an input range of 9; Pg.1541, particularly Fig.7; EN: this denotes a filter size of 8, which would be 2/3 of an input range of 12). 
Furthermore, the Examiner takes official notice that selecting a particular filter size is a matter of routine experimentation, and it is not inventive to discover the optimum or workable ranges by routine experimentation. A person of ordinary skill in the art would be able to pick various filter sizes that meet the needs of their neural network, and selecting a particular filter size for a neural network is nothing more than routine optimization. See In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955) and MPEP 2144.05(II). The rationale is that multiple references show different filter sizes, and merely choosing a particular filter size is a routine aspect of designing and operating a neural network. 
Miao and Abdel are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of filter sizes of 2/3 of the input values.  
	The motivation for doing so would be to improve the error rate by setting an appropriate filter size (See Abdel, Pg.1542, Fig.10) or, in the case of Miao, to allow the filter size to be adjusted to obtain the best performance of the neural network.  
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of filter sizes of 2/3 of the input values.  
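The filter-to-input ratios relied on from Abdel can be checked directly. The sketch uses only the figures stated in the examiner's notes for claims 10 and 19 (a filter of 5 against a 9-frame input, and a filter of 8 against a 12-frame input) and compares each exact ratio with two-thirds.

```python
from fractions import Fraction

TWO_THIRDS = Fraction(2, 3)

def compare_to_two_thirds(filter_size: int, input_span: int):
    """Return the exact filter/input ratio and whether it equals two-thirds."""
    ratio = Fraction(filter_size, input_span)
    return ratio, ratio == TWO_THIRDS

# (filter size, input span) pairs taken from the examiner's notes on Abdel.
for filt, span in [(5, 9), (8, 12)]:
    ratio, exact = compare_to_two_thirds(filt, span)
    print(f"{filt}/{span} = {float(ratio):.3f}, exactly two-thirds: {exact}, "
          f"two-thirds of {span} = {TWO_THIRDS * span}")
```

The 8/12 pair equals two-thirds exactly, while 5/9 (about 0.556) falls one frame short of the two-thirds value of 6 for a 9-frame input, consistent with the examiner's characterization of that pair as "roughly" two-thirds.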
As per claim 11, Miao discloses, “a method for training a speech recognition model, the method comprising” (Abstract; EN: this denotes the training of the neural network). 
“generating, by a first layer of a convolution neural network” (Pg.167, particularly section 3.4; EN: this denotes two convolutional layers). “a first output from a two-dimensional set of input values, the set of input values comprising input values across a first dimension in time and input values across a second dimension in frequency” (pg.167, particularly section 3.4; EN: This denotes two dimensional inputs along both time and frequency). 
“Generating, by a second layer of the convolution neural network, a second output using the feature…” (Pg.167, particularly section 3.4; EN: this denotes two convolutional layers with one receiving data from the other).
“Generating, by a deep neural network, a fourth output using the … output” (pg.167, particularly section 3.4; EN: this denotes the use of fully connected layers after the convolutional layers).
However, Miao fails to explicitly disclose, “and the first output comprising a feature map”, “the feature map”, “generating, by a linear low rank layer, a third output using the second output”, “third output”, and “generating, by a softmax layer, a final output of the speech recognition model using the fourth output.”
Abdel discloses, “and the first output comprising a feature map” and “the feature map” (Pg.1535, particularly Section A; EN: this denotes using feature maps with speech based convolutional neural networks with the two dimensional feature map being based on frequency and time as discussed in the Miao reference above).
Sainath discloses, “generating, by a linear low rank layer, a third output using the second output” (Pg.6656, particularly section 2; EN: this denotes adding a low-rank layer to the neural network).
“generating, by a softmax layer, a final output of the speech recognition model using the fourth output” (Pg.6656, particularly section 2 and figure 1; EN: This denotes a softmax layer at the end of the neural network).  
Miao and Abdel are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of feature maps with convolutional neural network layers. 
	The motivation for doing so would be to “organize speech feature vectors into feature maps that are suitable for CNN processing” (Abdel, Pg.1535, Section A, second paragraph). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Abdel in order to make use of feature maps with convolutional neural network layers.
Miao and Sainath are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a linear low rank layer.  
	The motivation for doing so would be because “a low-rank factorization reduces the number of parameters of the network by 30-50%. This results in roughly an equivalent reduction in training time, without a significant loss in final recognition accuracy, compared to a full-rank representation” (Sainath, abstract). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a linear low rank layer.  
Miao and Sainath are analogous art because both involve speech recognition neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a softmax layer. 
	The motivation for doing so would be because “Figure 1 shows a typical neural network architecture for speech recognition problems, namely 5 hidden layers with 1,024 hidden units per layer, and a softmax layer with 2220 output targets” (Sainath, pg.6656, section 2). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of speech recognition neural networks to combine the work of Miao and Sainath in order to make use of a softmax layer.
As per claim 12, Sainath discloses, “Wherein the deep neural network comprises the softmax layer” (Pg.6656, particularly section 2 and figure 1; EN: This denotes a softmax layer at the end of the neural network).  
As per claim 20, Miao discloses, “Further comprising, after training the speech recognition model, providing the trained speech recognition model to a device for use by the device for keyword detection of one or more key phrases” (Pg.165, particularly the introduction section; EN: this denotes the neural networks being used for speech recognition).

Claim Rejections - 35 USC § 103
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Miao et al. (“Improvements to Speaker Adaptive Training of Deep Neural Networks”) in view of Abdel-Hamid (hereinafter Abdel, “Convolutional Neural Networks for Speech Recognition”) and Sainath et al. (“Low-rank Matrix factorization for Deep Neural networks training with High-Dimensional Output Targets”), and further in view of Gibiansky (“Convolutional Neural Networks”) and Toth (“Combining time and frequency domain convolution in convolutional neural network based phone recognition”).
As per claim 6, Miao discloses, “Wherein the first convolution neural network layer is configured to generate the feature … by performing convolution … on the two-dimensional set of input values for a filter that has a time span that extends over all the input values in the first dimension” (Pg.167, particularly section 3.4; EN: this denotes performing convolution on the two-dimensional input without specifying a span, so the span is presumed to be the entirety of time). “and a frequency…of the input values in the second dimension” (Pg.167, particularly section 3.4; EN: this denotes performing convolution on the two-dimensional input without specifying a span, so the span is presumed to be the entirety of frequency).
Abdel discloses, “feature map” (Pg.1535, particularly Section A; EN: this denotes using feature maps with speech based convolutional neural networks with the two dimensional feature map being based on frequency and time as discussed in the Miao reference above).
However, Miao fails to explicitly disclose, “convolution multiplication” and “a frequency span that extends over less than all of the input values in the second dimension.” 
Gibiansky discloses, “convolution multiplication” (Pg.2-3, particularly the “convolutional layers” section; EN: this denotes the actual mathematics of performing convolution, which includes multiplication). 
Toth discloses, “a frequency span that extends over less than all of the input values in the second dimension” (Pg.191, particularly C2, first paragraph; EN: this denotes a locked time window and an optimized frequency size (i.e. one that will change and not be the entire range)).
Miao and Gibiansky are analogous art because both involve convolutional neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Gibiansky in order to allow the use of matrix multiplication with convolution.   
	The motivation for doing so would be to use the mathematics needed to perform the convolution of the Miao reference by “sum[ming] up the contributions (weighted by the filter components) from the previous layer cells… this is just a convolution, which we can express via matlab…” (Gibiansky, Pg. 2-3, convolution layers section). 
Therefore before the effective filing date it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Gibiansky in order to allow the use of matrix multiplication with convolution.  
Miao and Toth are analogous art because both involve convolutional neural networks. 
Before the effective filing date it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Toth in order to have a varied frequency size for the filter.  
	The motivation for doing so would be to find “the optimal size along frequency… experimentally” (Toth, Pg. 191, C2, first paragraph) or, in the case of Miao, to allow the system to choose the frequency filter range that is optimal for the system. 
Therefore before the effective filing date it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Toth in order to have a varied frequency size for the filter.  
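The limitation of claims 6 and 16, as read in light of Gibiansky and Toth, can be sketched as follows: each output value is a sum of element-wise products between the filter and a patch of the input ("convolution multiplication"), where the filter covers every time frame but only part of the frequency axis. The input values, filter values, and sizes are hypothetical, and a stride of 1 is assumed.

```python
def conv_full_time(x, k):
    """Convolve a time x frequency input with a filter spanning all time frames.

    x: T rows (time), each holding F values (frequency).
    k: T rows, each holding f < F values.
    Returns one value per frequency position, using stride 1.
    """
    T, F = len(x), len(x[0])
    f = len(k[0])
    if len(k) != T:
        raise ValueError("filter must span every time frame")
    if f >= F:
        raise ValueError("filter must span less than all frequency bins")
    out = []
    for j in range(F - f + 1):
        # Multiply each filter entry by the overlapping input entry, then sum.
        out.append(sum(x[t][j + b] * k[t][b] for t in range(T) for b in range(f)))
    return out

x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]  # 3 time frames x 4 frequency bins
k = [[1, 0],
     [0, 1],
     [1, 1]]           # spans all 3 time frames and 2 of the 4 frequency bins
print(conv_full_time(x, k))  # [26, 30, 34]
```

Because the filter already covers the full time span, it can slide only along frequency, so the output is a single row of values.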
As per claim 16, Miao discloses, “wherein generating, by the first layer of the convolution neural network, the first output comprises performing convolution … on the two-dimensional set of input values for a filter that has a time span that extends over all of the input values in the first dimension” (Pg.167, particularly section 3.4; EN: this denotes performing convolution on the two-dimensional input without specifying a span, so the span is presumed to be the entirety of time). “and a frequency span that extends over … the input values in the second dimension” (Pg.167, particularly section 3.4; EN: this denotes performing convolution on the two-dimensional input without specifying a span, so the span is presumed to be the entirety of frequency).
However, Miao fails to explicitly disclose, “convolution multiplication” and “a frequency span that extends over less than all of the input values in the second dimension.”
Gibiansky discloses, “convolution multiplication” (Pg.2-3, particularly the “convolutional layers” section; EN: this denotes the actual mathematics of performing convolution, which includes multiplication).
Toth discloses, “a frequency span that extends over less than all of the input values in the second dimension” (Pg.191, particularly C2, first paragraph; EN: this denotes a locked time window and an optimized frequency size (i.e. one that will change and not be the entire range)).
Miao and Gibiansky are analogous art because both involve convolutional neural networks. 
At the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Gibiansky in order to allow the use of matrix multiplication with convolution.   
	The motivation for doing so would be to use the mathematics needed to perform the convolution of the Miao reference by “sum[ming] up the contributions (weighted by the filter components) from the previous layer cells… this is just a convolution, which we can express via matlab…” (Gibiansky, Pg. 2-3, convolution layers section). 
Therefore at the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Gibiansky in order to allow the use of matrix multiplication with convolution.  
Miao and Toth are analogous art because both involve convolutional neural networks. 
At the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Toth in order to have a varied frequency size for the filter.  
	The motivation for doing so would be to find “the optimal size along frequency… experimentally” (Toth, Pg. 191, C2, first paragraph) or, in the case of Miao, to allow the system to select the frequency range of the filter that is optimal for that system. 
Therefore at the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Toth in order to have a varied frequency size for the filter.  
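For illustration only, the following minimal Python sketch shows the claim 16 limitations as read above. It is not taken from Miao, Gibiansky, or Toth, and it is not the applicant's implementation.

The sketch rests on several assumptions:
- The two-dimensional input is a list of time frames, each holding one value per frequency band.
- The layer computes an unpadded convolution in the cross-correlation form common to CNN layers.
- The function name `conv_full_time_span` and its parameters are hypothetical.

The filter's time span covers every input frame, while its frequency span covers fewer than all bands. Each output value is a sum of input values multiplied by filter weights, which is the multiplication Gibiansky describes.

```python
from typing import List

Matrix = List[List[float]]


def conv_full_time_span(inputs: Matrix, kernel: Matrix, freq_stride: int = 1) -> List[float]:
    """Hypothetical unpadded 2-D convolution (cross-correlation form) whose
    filter spans every time frame but only part of the frequency axis.

    inputs: T rows (time frames), each holding F values (frequency bands).
    kernel: T rows of K weights, where 0 < K < F.
    Returns one output per frequency position visited by the filter.
    """
    num_frames, num_bands = len(inputs), len(inputs[0])
    span = len(kernel[0])
    # Time span: the filter must cover all input frames.
    if len(kernel) != num_frames:
        raise ValueError("filter time span must cover all input frames")
    # Frequency span: the filter must cover fewer than all frequency bands.
    if not 0 < span < num_bands:
        raise ValueError("filter frequency span must be less than all bands")
    outputs = []
    for f0 in range(0, num_bands - span + 1, freq_stride):
        acc = 0.0
        for t in range(num_frames):
            for k in range(span):
                # Weight each input value by the matching filter component
                # and accumulate the products.
                acc += inputs[t][f0 + k] * kernel[t][k]
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    frames = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(3)]  # 3 frames x 5 bands
    weights = [[1.0, 1.0] for _ in range(3)]               # 3 frames x 2 bands
    print(conv_full_time_span(frames, weights))  # [9.0, 15.0, 21.0, 27.0]
```

Because the filter in this sketch spans all time frames, the output has a single value per frequency position. Narrowing the frequency span below the full range is what yields more than one output.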

Claim Rejections - 35 USC § 103
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Miao et al (“Improvements to Speaker Adaptive Training of Deep Neural Networks”) in view of Abdel-Hamid (hereinafter Abdel, “Convolutional Neural Networks for Speech Recognition”), Sainath et al (“Low-rank Matrix factorization for Deep Neural networks training with High-Dimensional Output Targets”), Gibiansky (“Convolutional Neural Networks”), and Toth (“Combining time and frequency domain convolution in convolutional neural network based phone recognition”), and further in view of Chang et al (“Robust CNN-based Speech Recognition with Gabor Filter Kernels”).
As per claims 7 and 17, Miao discloses, “Wherein performing the convolution … on the two-dimensional set of input values comprises performing the convolution … on the two-dimensional set of input values for the filter using a frequency stride… and a time stride equal to one” (pg.167, particularly section 3.4; EN: This denotes performing the convolution on the time and frequency, and further denotes a stride of 1). 
	Gibiansky discloses, “convolution multiplication” (Pg.2-3, particularly the “convolutional layers” section; EN: this denotes the actual mathematics of performing convolution, which includes multiplication).
	However, Miao fails to explicitly disclose, “using a frequency stride greater than one.”
	Chang discloses, “using a frequency stride greater than one” (pg.3, particularly C2, third paragraph; EN: this denotes a stride of 2 for frequency in a convolutional neural network). 
	Furthermore, the Examiner takes official notice that selecting a particular stride is routine experimentation, and it is not inventive to discover the optimum or workable ranges via routine experimentation. One of ordinary skill in the art would be able to select various strides that meet the needs of a given neural network, and selecting a particular stride for a neural network is nothing more than routine optimization. See In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955) and MPEP 2144.05(II). The rationale is that multiple references show different strides, and merely choosing a particular stride is a routine aspect of designing and operating a neural network. 
Miao and Chang are analogous art because both involve convolutional neural networks. 
At the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Chang in order to pick a stride greater than one.   
	The motivation for doing so would be to “reduce[] dimensionality by a factor of 2” (Chang, Pg.3, C2, third paragraph) or, in the case of Miao, to make the system more efficient through reduced dimensionality. 
Therefore at the time of invention it would have been obvious to one skilled in the art of convolutional neural networks to combine the work of Miao and Chang in order to pick a stride greater than one.   
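As an illustrative sketch only, the following Python code shows how a time stride of one combined with a frequency stride of two reduces the number of frequency positions in the output. This is the dimensionality reduction attributed to Chang above.

The sketch is not code from Miao, Chang, or the application, and it rests on these assumptions:
- The layer computes an unpadded convolution in cross-correlation form.
- The input is a list of time frames, each holding one value per frequency band.
- The function name `conv2d_strided` and its parameters are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_strided(inputs: Matrix, kernel: Matrix,
                   time_stride: int = 1, freq_stride: int = 1) -> Matrix:
    """Hypothetical unpadded 2-D convolution (cross-correlation form) over a
    time x frequency input, with a separate stride along each axis."""
    num_frames, num_bands = len(inputs), len(inputs[0])
    kt, kf = len(kernel), len(kernel[0])
    out = []
    # time_stride == 1 visits every valid starting frame.
    for t0 in range(0, num_frames - kt + 1, time_stride):
        row = []
        # freq_stride > 1 skips frequency positions, shrinking the output.
        for f0 in range(0, num_bands - kf + 1, freq_stride):
            acc = 0.0
            for dt in range(kt):
                for df in range(kf):
                    acc += inputs[t0 + dt][f0 + df] * kernel[dt][df]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[1.0] * 9 for _ in range(4)]   # 4 frames x 9 bands
    w = [[1.0] * 2 for _ in range(2)]   # 2 x 2 filter
    a = conv2d_strided(x, w, time_stride=1, freq_stride=1)
    b = conv2d_strided(x, w, time_stride=1, freq_stride=2)
    print(len(a), len(a[0]))  # 3 8
    print(len(b), len(b[0]))  # 3 4
```

In this example, the frequency stride of two halves the frequency dimension of the output (8 positions to 4). The time dimension is unchanged because the time stride remains one.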

Claim Rejections - 35 USC § 103
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Miao et al (“Improvements to Speaker Adaptive Training of Deep Neural Networks”) in view of Abdel-Hamid (hereinafter Abdel, “Convolutional Neural Networks for Speech Recognition”) and Sainath et al (“Low-rank Matrix factorization for Deep Neural networks training with High-Dimensional Output Targets”), and further in view of Wheeler et al (“Voice recognition will always be stupid”).
As per claim 15, Miao discloses, “further comprising using the convolution neural network for keyword detection by: receiving an audio signal encoding an utterance” (pg.166, particularly section 3.1; EN: this denotes working with utterances).
“analyzing the audio signal to identify a command included in the utterance” (Abstract; EN: this denotes performing speech recognition, which would include commands given within the speech).
However, Miao fails to explicitly disclose, “performing an action that corresponds to the command.”
Wheeler discloses, “performing an action that corresponds to the command” (pg.1; EN: this denotes speech commands being used for customer support). 
Wheeler and Miao are analogous art because both involve speech recognition.
At the time of invention it would have been obvious to one skilled in the art of speech detection to combine the work of Miao and Wheeler in order to make use of speech detection in a device. 
	The motivation for doing so would be to provide “non-human customer service” (Wheeler, Pg.1) or, in the case of Miao, to allow the system's speech recognition to be used for customer service or other machine-based responses. 
Therefore at the time of invention it would have been obvious to one skilled in the art of speech detection to combine the work of Miao and Wheeler in order to make use of speech detection in a device.

Conclusion
The examiner requests that, in response to this Office action, support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this Office action, Applicant is advised to clearly point out the patentable novelty that he or she believes the claims present in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEN M RIFKIN whose telephone number is (571)272-9768. The examiner can normally be reached Monday-Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BEN M RIFKIN/Primary Examiner, Art Unit 2198