DETAILED ACTION
Claims 1-43 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
 (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 37-40, 42, and 43 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jaderberg et al. “Speeding up Convolutional Neural Networks with Low Rank Expansions” (hereinafter Jaderberg).

As per claim 37, Jaderberg teaches the claims as recited, including a convolutional neural network (CNN) processing apparatus comprising:
a processor configured to: 
perform a first convolution operation between a first part of an input and a first part of a kernel (i.e., exploit redundancy between different filters and feature channels to approximate the filter set by a linear combination of a smaller basis of M filters, perform low-rank decomposition in the channel dimension as well, filters operating on each input channel can be approximated as a linear combination of a basis of M separable filters, see at least pages 2-5, sections 2, 2.1; EN: one of the separable filters operating on an input performs a first convolution operation), and
perform a second convolution operation between a second part of the input and a second part of the kernel in response to a result of the first convolution operation meeting a predetermined criterion (i.e., perform low-rank decomposition in the channel dimension as well, filters operating on each input channel can be approximated as a linear combination of a basis of M separable filters, if the approximation is efficient, see at least pages 2-5, sections 2, 2.1, pages 5-7, section 2.2; EN: another of the separable filters operating on an input performs a second convolution operation);
wherein the kernel used in performing the first convolution operation is also used in performing the second convolution operation (i.e., approximate filter set by a linear combination of smaller basis of M filters, see at least pages 2-5, sections 2, 2.1).

As per claim 38, Jaderberg teaches wherein a combination of the first part of the input and the second part of the input is an entirety of the input (i.e., low-rank decomposition in the channel dimension, see at least pages 2-5, sections 2, 2.1); and 
a combination of the first part of the kernel and the second part of the kernel is an entirety of the kernel (i.e., approximate filter set by a linear combination of smaller basis of M filters, see at least pages 2-5, sections 2, 2.1).

As per claim 39, Jaderberg teaches wherein a sum of processing resources required to perform the first convolution operation and processing resources required to perform the second convolution operation is less than processing resources required to perform a convolution operation between the entirety of the input and the entirety of the kernel (i.e., speedup in computation, see at least pages 2-5, sections 2, 2.1).
As per claim 40, Jaderberg teaches wherein a sum of a processing time required to perform the first convolution operation and a processing time required to perform the second convolution operation is less than a processing time required to perform a convolution operation between the entirety of the input and the entirety of the kernel (i.e., speedup in computation, see at least pages 2-5, sections 2, 2.1).

As per claim 42, Jaderberg teaches wherein the kernel comprises a kernel map having a size of K x K kernel elements (i.e., filter of size d x d, see at least pages 3-5, section 2.1); and
the first part of the kernel is a sub-kernel map having a size of 1 x 1 or 1 x K or K x 1 kernel elements selected from the K x K kernel elements of the kernel map (i.e., separable filters of size d x 1, see at least pages 3-5, section 2.1).

As per claim 43, Jaderberg teaches wherein the input is an image or a voice (see at least pages 7-10, section 3); and
the processor is further configured to perform a recognition operation or an authentication operation based on a combination of a result of the first convolution operation and a result of the second convolution operation (i.e., basis feature maps are then linearly combined, see at least pages 2-5, sections 2, 2.1, pages 7-10, section 3).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, 9, 18-22, 25, 34, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Almahairi et al., “Dynamic Capacity Networks” (hereinafter Almahairi), in view of Shoaib et al. (US 2017/0132496, hereinafter Shoaib), further in view of Shan et al. “A Dynamic Multi-Precision Fixed-Point Data Quantization Strategy for Convolutional Neural Network” (hereinafter Shan).

As per claim 1, Almahairi teaches the invention as claimed, including a processor-implemented convolutional neural network (CNN) processing method (i.e., we evaluate all models on an NVIDIA Titan Black GPU card, see at least Chapter 6) comprising: 
selecting a survival network in a precision convolutional network based on a result of performing a high speed convolution operation between an input and a kernel using a high speed convolutional network (i.e., our model applies the coarse layers on the whole image to get fc(x), chooses a set of salient patches Xs, applies the fine layers only on the salient patches Xs to obtain a set of few fine representation vectors ff(Xs), and finally combines them to make its prediction, see at least Fig. 1, chapter 6.1; EN: coarse layer is the high speed convolutional network, fine layer is the precision convolutional network, combining them to make its prediction is selecting the survival network; applying the coarse layers on input is the operation between an input and a kernel (filter) using a high speed convolutional network); and
performing a precision convolution operation between the input and the kernel using the survival network (i.e., the DCN can leverage the capacity of ff, but at a lower computational cost, by applying the fine layers only on a small portion of the input, see at least chapters 2, 6.1; EN: [chapter 6.1] Fine layers: applying the fine layers only on a small portion of the input is performing a precision convolution operation between the input and the kernel).
Almahairi does not explicitly state wherein the kernel used in performing the high-speed convolution operation is also used in performing the precision convolution operation. 
Shoaib teaches that the same convolutional kernel is applied in each convolutional layer (see at least [0051], [0072]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that the kernel used in performing the high-speed convolution operation is also used in performing the precision convolution operation. Almahairi does not limit the filter that is used at the coarse and fine layers, so it would have been obvious to apply known filter techniques in the art, such as applying the same kernel in different convolutional layers (see at least [0051], [0072] of Shoaib).
Almahairi does not explicitly teach a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed.
Shan teaches a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed (i.e., dynamic multi-precision fixed-point data quantization for every inner-layer computation, see at least page 103, paragraph 5, pages 107-109, sections 4.1, 4.2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed as similarly taught by Shan because dynamic multi-precision fixed-point data quantization for every inner-layer computation can improve accuracy of a CNN (see at least page 103, paragraph 5, page 111, section 6).

As per claim 2, Almahairi teaches the method of claim 1, wherein the selecting comprises: 
obtaining an index of an input element and an index of a kernel element based on the result of the high speed convolution operation, the input element and the kernel element each contributing to the precision convolution operation (see at least chapter 2.1, Ci,j is a representation vector associated with the input region (i,j) . . . we apply the fine layers only on the selected patches - EN: (i,j) is the index of input elements calculated using the coarse (high speed) convolution operation; the kernel of coarse layer is the index of kernel element contributing to the selection of the selected patches that are passed to the fine layers); and 
selecting the survival network based on the index of the input element and the index of the kernel element (see at least chapter 2.1, the DCN output is obtained by feeding the refined representation into the top layers – EN: DCN is the survival network; refined representation is acquired from the saliency map M(i, j); M is calculated using the index of the input element and the kernel of coarse layer).

As per claim 3, Almahairi teaches the method of claim 1, and wherein the selecting comprises: 
obtaining an approximate output by performing the high speed convolution operation (see at least chapter 2.1, given an input image x, we first apply the coarse layers on all input regions to compute the coarse representation vectors - EN: the coarse representation vectors is the approximate output);
selecting an output element contributing to the precision convolution operation from output elements of the approximate output (see at least chapter 2.1, Using the saliency map M, we select a set of k input region positions with the highest saliency values - EN: a set of k input regions is the output element contributing to the precision convolution operation); and
backwardly selecting a survival network associated with the selected output element in the precision convolutional network (see at least chapter 2.1, the DCN output is obtained by feeding the refined representation into the top layers, g(fr(x)), and We denote the representation resulting from combining vectors from both fc(x) and ff(Xs) as the refined representation fr(x); see at least chapter 2.2, Gradients are computed by standard back-propagation through the refined model - EN: ff is the precision network; ff(Xs) is the selected output element in the precision convolutional network; DCN is the survival network; DCN is selected backwardly using back propagation).
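
For illustration only, the selection step quoted above from Almahairi chapter 2.1 (choosing the k input region positions with the highest saliency values) can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function name `top_k_positions`, the sample saliency values, and k = 3 are all hypothetical.

```python
def top_k_positions(saliency, k):
    """Return the k (row, col) positions with the highest saliency values."""
    scored = [
        (value, (i, j))
        for i, row in enumerate(saliency)
        for j, value in enumerate(row)
    ]
    # Sort by descending saliency; the sort is stable, so ties keep row-major order.
    scored.sort(key=lambda item: -item[0])
    return [position for _, position in scored[:k]]


# Hypothetical 3 x 3 saliency map M(i, j); the values are made up for the example.
saliency_map = [
    [0.10, 0.90, 0.20],
    [0.40, 0.30, 0.80],
    [0.05, 0.70, 0.60],
]

selected = top_k_positions(saliency_map, k=3)
print(selected)
```

Under this sketch, only the patches at the selected positions would be passed on to the fine layers; every other region is handled by the coarse layers alone.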

As per claim 6, Almahairi teaches wherein the selecting of the output element comprises:
performing a rectified linear unit (ReLU) operation on the output elements of the approximate output using an ReLU layer (see at least chapter 6.1, We use rectifier non-linearities in all layers; EN: rectifier non-linearities is ReLU);
performing a max pooling operation on output elements of a result of the ReLU operation using a max pooling layer (see at least chapter 6.1, We use 2 convolutional layers as coarse layers ... followed by global max pooling); and
selecting, from output elements of a result of the max pooling operation, an output element having a representative value representative of a region of output elements of the result of the ReLU operation (see at least chapter 2.1, Note that computing all entries in matrix M can be done through the top layers, and see at least chapter 2, Top layers g consider as input the bottom layers' representations f(x); EN: M is the output element having a representative value representative of a region of output elements of the result of the ReLU operation; a bottom layer performs a max pooling operation and the result is passed to the top layer).
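
For illustration only, the sequence recited in claim 6 (a ReLU operation, then max pooling, then selecting a representative element per region) can be sketched as below. This is a hedged sketch under stated assumptions: the non-overlapping 2 x 2 pooling window, the helper names `relu` and `max_pool_2x2`, and the sample values are hypothetical, and the cited passage of Almahairi refers to global max pooling rather than a 2 x 2 window.

```python
def relu(feature_map):
    """Element-wise rectifier: negative values become 0.0."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling.

    Returns, for each region, a (value, (row, col)) pair identifying the
    representative element and where it sits in the input feature map.
    """
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            region = [
                (feature_map[i + di][j + dj], (i + di, j + dj))
                for di in (0, 1)
                for dj in (0, 1)
            ]
            pooled_row.append(max(region, key=lambda item: item[0]))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 approximate output; the values are made up for the example.
approximate_output = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, -1.0, -2.0, -0.5],
    [3.0, 1.0, -4.0, 2.5],
]

representatives = max_pool_2x2(relu(approximate_output))
print(representatives)
```

Each returned position marks the single output element that represents its 2 x 2 region, which is the element a later, more precise computation would need.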

As per claim 9, Almahairi teaches wherein the selecting comprises:
obtaining an approximate output by performing the high speed convolution operation (see at least chapter 2.1, given an input image x, we first apply the coarse layers on all input regions to compute the coarse representation vectors - EN: the coarse representation vectors is the approximate output);
selecting an output element contributing to the precision convolution operation from output elements of the approximate output (see at least chapter 2.1, Using the saliency map M, we select a set of k input region positions with the highest saliency values - EN: a set of k input regions is the output element contributing to the precision convolution operation); and 
selecting the survival network by backwardly eliminating a redundant network not associated with the selected output element from the precision convolutional network (see at least chapter 2.1, the DCN output is obtained by feeding the refined representation into the top layers – EN: eliminating a redundant network not associated with the selected output element is the same as not selecting, for example, the coarse network for processing salient regions).

As per claim 18, Almahairi teaches wherein the performing of the high speed convolution operation comprises performing the high speed convolution operation using an approximation algorithm of a matrix multiplication operation (see at least chapter 4.2.2, The representation, produced by either the fine or coarse layers, is a probability map, and see at least equation 12; EN: the weight matrix (w) is multiplied with the coarse representation (p); probability map is calculated using approximation).

As per claim 19, Almahairi teaches a non-transitory computer-readable medium storing instructions that, when executed by a processor, control the processor to perform the method of claim 1 (see at least chapter 4).

As per claim 20, Almahairi teaches a processor-implemented convolutional neural network (CNN) processing method (see at least chapter 6, We evaluate all models on an NVIDIA Titan Black GPU card) comprising: 
generating an approximate output by performing a high speed convolution operation between an input and a kernel (see at least chapter 2.1, Given an input image x, we first apply the coarse layers on all input regions to compute the coarse representation vectors, and see at least chapter 6.1, Coarse layers: 2 convolutional layers – EN: the coarse representation vectors is the approximate output);
selecting a survival network in a convolutional layer based on the approximate output (see at least chapter 2.1, the DCN output is obtained by feeding the refined representation into the top layers, g(fr(x)) – EN: DCN is the survival network; the refined representation is acquired from approximate output); and
performing a precision convolution operation between the input and the kernel using the survival network (see at least chapter 2, This way, the DCN can leverage the capacity of ff, but at a lower computational cost, by applying the fine layers only on a small portion of the input – EN: applying the fine layers only on a small portion of the input is performing a precision convolution operation between the input and the kernel using the survival network).
Almahairi does not explicitly state wherein the kernel used in performing the high-speed convolution operation is also used in performing the precision convolution operation. 
Shoaib teaches that the same convolutional kernel is applied in each convolutional layer (see at least [0051], [0072]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that the kernel used in performing the high-speed convolution operation is also used in performing the precision convolution operation. Almahairi does not limit the filter that is used at the coarse and fine layers, so it would have been obvious to apply known filter techniques in the art, such as applying the same kernel in different convolutional layers (see at least [0051], [0072] of Shoaib).
Almahairi does not explicitly teach a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed.
Shan teaches a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed (i.e., dynamic multi-precision fixed-point data quantization for every inner-layer computation, see at least page 103, paragraph 5, pages 107-109, sections 4.1, 4.2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that a convolutional layer in which the high speed convolution operation is performed is the same as a convolutional layer in which the precision convolution operation is performed as similarly taught by Shan because dynamic multi-precision fixed-point data quantization for every inner-layer computation can improve accuracy of a CNN (see at least page 103, paragraph 5, page 111, section 6).

As per claim 21, Almahairi teaches the method of claim 20 and wherein the selecting comprises: 
selecting an output element contributing to the precision convolution operation from output elements of the approximate output (see at least chapter 2.1, Using the saliency map M, we select a set of k input region positions with the highest saliency values, and we apply the fine layers ff only on the selected patches – EN: M is the output element contributing to the precision convolution operation from output elements of the approximate output); and
backwardly selecting a survival network associated with the selected output element from networks in the convolutional layer (see at least chapter 2.1, the DCN output is obtained by feeding the refined representation into the top layers, g(fr(x)), and We denote the representation resulting from combining vectors from both fc(x) and ff(Xs) as the refined representation fr(x); see at least chapter 2.2, Gradients are computed by standard back-propagation through the refined model - EN: ff is the precision network; ff(Xs) is the selected output element in the precision convolutional network; DCN is the survival network; DCN is selected backwardly using back propagation).

As per claim 22, Almahairi teaches the method of claim 21 and wherein the backwardly selecting of the survival network comprises selecting the survival network based on an input element associated with the selected output element (see at least chapter 2.1, The saliency M of an input region position (i, j) is given by the norm of the gradient of the entropy H with respect to the coarse vector ci,j And the DCN output is obtained by feeding the refined representation into the top layers – EN: DCN is the survival network; refined representation is acquired from M(i,j); M is associated with the input elements xi,j) and a kernel element associated with the selected output element (see at least Fig. 1, chapter 2.1, applies the fine layers only on the salient patches Xs – EN: kernel elements of the fine layers are the kernel element associated with the selected output element).

As per claim 25, the limitations recited in this claim are substantially similar to the limitations recited in claim 6. Therefore, claim 25 is rejected for the same reasons as claim 6.

As per claim 34, the limitations recited in this claim are substantially similar to the limitations recited in claim 1. Therefore, claim 34 is rejected for the same reasons as claim 1.

As per claim 35, the limitations recited in this claim are substantially similar to the limitations recited in claim 20. Therefore, claim 35 is rejected for the same reasons as claim 20.

Claims 4, 5, 7, 8, 23, 24, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Almahairi, in view of Shoaib, further in view of Shan, further in view of Reagen et al., “Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators” (hereafter Reagen).

As per claim 4, Almahairi teaches wherein the selecting of the output element comprises: performing a rectified linear unit (ReLU) operation on the output elements of the approximate output using an ReLU layer; and selecting an output element having a value from output elements of a result of the ReLU operation (see at least chapter 6.1, We use rectifier non-linearities in all layers – EN: rectifier non-linearities is ReLU).
Almahairi does not appear to explicitly recite “a non-zero value.”
However, Reagen teaches a non-zero value (see at least chapter 3.1, a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that the value is a non-zero value because many variations of ReLU are used in practice, and it would have been obvious to substitute the ReLU of Almahairi with the modified ReLU of Reagen to select non-zero elements.

As per claim 5, Almahairi teaches the method of claim 3 wherein the selecting of the output element comprises: selecting an output element having a non-zero value from output elements of the approximate output determined to exceed the threshold in the comparison operation (see at least [chapter 2.1] Using the saliency map M, we select a set of k input region positions with the highest saliency values – Note: M is the approximate output).
Almahairi does not appear to explicitly recite performing a comparison operation to determine whether the output elements of the approximate output exceed a threshold, having a non-zero value, and determined to exceed the threshold in the comparison operation.
Reagen teaches: performing a comparison operation to determine whether the output elements of the approximate output exceed a threshold ... having a non-zero value ... determined to exceed the threshold in the comparison operation ([chapter 3.1] a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi to perform a comparison operation to determine whether the output elements of the approximate output exceed a threshold, having a non-zero value, and determined to exceed the threshold in the comparison operation, as similarly taught by Reagen, in order to add a thresholding operation to the activation function of each DNN layer that zeros all activities below the threshold, removing them from the prediction computation.
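
For illustration only, the thresholding operation quoted above from Reagen chapter 3.1 (check each activity value, zero every activity below the threshold, and drop the zeroed activities from later computation) can be sketched as follows. This is a minimal sketch and not Reagen's implementation: the helper names, the threshold of 0.1, and the sample activity values are hypothetical.

```python
def threshold_activations(activities, threshold):
    """Zero every activity that falls below the threshold."""
    return [a if a >= threshold else 0.0 for a in activities]


def nonzero_indices(values):
    """Indices of the elements that survive, i.e. that are still non-zero."""
    return [index for index, value in enumerate(values) if value != 0.0]


# Hypothetical activity values after the activation function.
activities = [0.02, 0.7, 0.0, 0.31, 0.09]

pruned = threshold_activations(activities, threshold=0.1)
kept = nonzero_indices(pruned)
print(pruned, kept)
```

The comparison against the threshold and the selection of non-zero elements are separate steps here, which mirrors how the claim language separates the comparison operation from the selecting.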

As per claim 7, Almahairi teaches wherein the selecting of the output element comprises:
performing a rectified linear unit (ReLU) operation on the output elements of the approximate output using an ReLU layer (see at least [chapter 6.1] We use rectifier non-linearities in all layers); and
selecting, from the output elements in the group, an output element having a maximum value among output elements (see at least [chapter 6.1] We use 2 convolutional layers as coarse layers, 5 convolutional layers as fine layers and one convolutional layer followed by global max pooling – Note: max pooling selects a maximum value).
Almahairi does not appear to explicitly recite “grouping output elements of a result of the ReLU operation into at least one group; and among output elements having non-zero values in the group, or an output element having a value less than the maximum value by a difference less than a threshold.”
Reagen teaches grouping output elements of a result of the ReLU operation into at least one group ([chapter 3.1] a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation – Note: non-zero output of the function is a group); and among output elements having non-zero values in the group, or an output element having a value less than the maximum value by a difference less than a threshold ([chapter 3.1] a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation – Note: This claim includes alternative limitations. Only one limitation is required for the purpose of claim interpretation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi to group output elements of a result of the ReLU operation into at least one group; and among output elements having non-zero values in the group, or an output element having a value less than the maximum value by a difference less than a threshold, as similarly taught by Reagen, in order to add a thresholding operation to the activation function of each DNN layer that zeros all activities below the threshold, removing them from the prediction computation.

As per claim 8, Almahairi teaches selecting, from the output elements in the group, an output element having a maximum value among output elements ([chapter 6.1] We use 2 convolutional layers as coarse ... followed by global max pooling – Note: max pooling selects a maximum value).
Almahairi does not appear to explicitly recite “wherein the selecting of the output element comprises: performing a comparison operation to determine whether the output elements of the approximate output exceed a first threshold; grouping output elements of the approximate output determined to exceed the first threshold in the comparison operation into at least one group; and among output elements having non-zero values in the group, or an output element having a value less than the maximum value by a difference less than a second threshold.”
Reagen teaches wherein the selecting of the output element comprises: performing a comparison operation to determine whether the output elements of the approximate output exceed a first threshold ([chapter 3.1] a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation – Note: checking each activity value and zeroing all activities below the threshold, removing them from the prediction computation is performing a comparison operation to determine whether the output elements of the approximate output exceed a first threshold);
grouping output elements of the approximate output determined to exceed the first threshold in the comparison operation into at least one group ([chapter 3.1] a thresholding operation is added to the activation function of each DNN layer. This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation – Note: non-zero output of the function is a group); and
output elements having non-zero values in the group ([chapter 3.1] This function checks each activity value and zeros all activities below the threshold, removing them from the prediction computation – Note: This claim includes alternative limitations. Only one limitation is required for the purpose of claim interpretation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi to perform a comparison operation to determine whether the output elements of the approximate output exceed a first threshold; grouping output elements of the approximate output determined to exceed the first threshold in the comparison operation into at least one group; and among output elements having non-zero values in the group, or an output element having a value less than the maximum value by a difference less than a second threshold, as similarly taught by Reagen, in order to add a thresholding operation to the activation function of each DNN layer that zeros all activities below the threshold, removing them from the prediction computation.

As per claims 23, 24, 26, and 27, the limitations recited in these claims are substantially similar to the limitations recited in claims 4, 5, 7, and 8. Therefore, claims 23, 24, 26, and 27 are rejected for the same reasons as claims 4, 5, 7, and 8.

Claims 10-13, 17, 28, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Almahairi, in view of Shoaib, further in view of Shan, further in view of Jaderberg.

As per claim 10, Almahairi does not appear to explicitly recite wherein the selecting comprises performing the high speed convolution operation by performing sequential convolution operations between the input and sub-kernels generated by decomposing the kernel.
Jaderberg teaches wherein the selecting comprises performing the high speed convolution operation by performing sequential convolution operations between the input and sub-kernels generated by decomposing the kernel (see at least [chapter 2] This means that each basis filter can be decomposed in to a sequence of horizontal and vertical filters ... Using this decomposition, the convolution of a separable filter si can be performed in O(2dH’W’) operations instead of O(d^2H’W’)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi to perform the high speed convolution operation by performing sequential convolution operations between the input and sub-kernels generated by decomposing the kernel, that is, to modify the coarse (high speed) network described in Almahairi using the teachings of Jaderberg to achieve sequential convolution operations on the high speed network.
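
For illustration only, the decomposition quoted above from Jaderberg (a separable d x d filter applied as a vertical filter followed by a horizontal filter) can be sketched as below. This is a hedged sketch under stated assumptions: it uses a single channel, a rank-1 3 x 3 kernel, and a plain "valid" cross-correlation (no kernel flip, as is common in CNN code); the names `conv2d_valid`, `vertical`, `horizontal`, and the sample image are hypothetical and do not come from the reference.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kernel_h)
                for b in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


vertical = [[1], [2], [1]]   # d x 1 sub-kernel (d = 3)
horizontal = [[1, 0, -1]]    # 1 x d sub-kernel

# The full d x d kernel is the outer product of the two sub-kernels (rank 1).
full_kernel = [[v[0] * h for h in horizontal[0]] for v in vertical]

# Small integer test image so that both results can be compared exactly.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]

direct = conv2d_valid(image, full_kernel)
sequential = conv2d_valid(conv2d_valid(image, vertical), horizontal)
print(direct == sequential)
```

In this sketch the direct form uses d^2 = 9 multiplications per output element, while the sequential form uses roughly 2d = 6, which is the kind of saving the quoted O(2dH’W’) versus O(d^2H’W’) comparison describes.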

As per claim 11, Jaderberg further teaches wherein the performing of the sequential convolution operations comprises: performing a first convolution operation between the input and a first sub-kernel of the sub-kernels; and performing a second convolution operation between a result of the first convolution operation and a second sub-kernel of the sub-kernels ([chapter 2.1 and Fig. 1(c)] we are assuming that the full rank original convolutional filter bank can be decomposed in to a linear combination of a set of separable basis filter. And which is the sum of separable filters h_n^k ∗ v_k^c – Note: v is performing the first convolution operation between the input and a first sub-kernel of the sub-kernels; h is performing the second convolution operation between a result of the first convolution operation and a second sub-kernel of the sub-kernels).

As per claim 12, Jaderberg further teaches wherein the sub-kernels are defined to minimize a difference between a result of a rectified linear unit (ReLU) operation performed on a result of the sequential convolution operations between the input and the sub-kernels and a result of an ReLU operation performed on a result of a convolution operation between the input and the kernel ([chapter 2.1 and equation 3] The output is a new feature map z_{i+1} ∈ R^{H’×W’×N} such that z^n_{i+1} = h_i(W_{in} ∗ z_i + b_{in}) ∀n ∈ [1...N], where ... h_i is a non-linear activation function such as the Rectified Linear Unit (ReLU). And All the parameters of the model are jointly optimized to minimize a loss over the training set using Stochastic Gradient Descent (SGD) with back-propagation – Note: as shown in equation 3, it minimizes the error between the original filters (W) and the decomposed filters (h and v) using the back propagation).

As per claim 13, Jaderberg further teaches wherein the sub-kernels are defined to minimize a difference between a result of the sequential convolution operations between the input and the sub-kernels and a result of a rectified linear unit (ReLU) operation performed on a result of a convolution operation between the input and the kernel (as described in claim 12 – Note: ReLU does not have a meta parameter updated based on the loss function. Therefore, the operation described in claim 12 can be applied to claim 13).

As per claim 17, Almahairi teaches wherein the kernel comprises at least one filter; the filter comprises at least one kernel map corresponding to at least one input channel ([chapter 6.1] 12 and 24 filters And [chapter 4.1] We use the 100 × 100 Cluttered MNIST digit classification – Note: MNIST is a set of image files that have one channel), the kernel map comprising kernel elements ([chapter 6.1] 3 × 3 filter sizes);
the input comprises at least one input feature map corresponding to the input channel, the input feature map comprising input elements ([chapter 2] c_{i,j} = f_c(x_{i,j}) – Note: c_{i,j} is the feature map corresponding to the input channel and elements; MNIST is a single channel image file); and
the selecting comprises selecting the survival network based on the output element ([chapter 2.1] the DCN output is obtained by feeding the refined representation into the top layers, g(fr(x)) – Note: DCN is the survival network; fr(x) contains output element).
Almahairi does not explicitly recite “the performing of the high speed convolution operation comprises: performing a first operation between a first portion of kernel elements in a first kernel map corresponding to a first input channel and at least one input element corresponding to the first portion; performing a second operation between a second portion of kernel elements in a second kernel map corresponding to a second input channel and at least one input element corresponding to the second portion after the first operation is performed; and generating an output element corresponding to the first kernel map and the second kernel map based on a result of the first operation and a result of the second operation.”
Jaderberg teaches the performing of the high speed convolution operation comprises: performing a first operation between a first portion of kernel elements in a first kernel map corresponding to a first input channel and at least one input element corresponding to the first portion ([chapter 2.1] Note that there are N filters operating on each input channel z^c. These can be approximated as linear combinations of a basis of M < N (separable) filters... yielding the approximation W_n ∗ z – Note: W_n ∗ z is performing a first operation between a first portion of kernel elements in a first kernel map corresponding to a first input channel and at least one input element corresponding to the first portion);
performing a second operation between a second portion of kernel elements in a second kernel map corresponding to a second input channel and at least one input element corresponding to the second portion after the first operation is performed ([chapter 2.1] Note that there are N filters operating on each input channel z^c. These can be approximated as linear combinations of a basis of M < N (separable) filters... yielding the approximation W_n ∗ z – Note: W_n ∗ z is performing the second operation between the second portion of kernel elements in the second kernel map corresponding to the second input channel and at least one input element corresponding to the second portion); and
generating an output element corresponding to the first kernel map and the second kernel map based on a result of the first operation and a result of the second operation ([chapter 2.1] These can be approximated as linear combinations of a basis of M < N (separable) filters... yielding the approximation W_n ∗ z = Σ_{c=1}^{C} W_n^c ∗ z^c – Note: W_n is the first and second kernel map; Σ_{c=1}^{C} is generating an output element on a result of the first operation and a result of the second operation).
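For illustration only, the per-channel sum W_n ∗ z = Σ_c W_n^c ∗ z^c relied on above can be sketched for one output element. The two-channel input `z`, the kernel maps in `W_n`, and the helper `correlate_at` are hypothetical and are not taken from Jaderberg or Almahairi; the sketch only shows how one result per input channel is added to form a single output element.

```python
def correlate_at(channel, kernel_map, row, col):
    """Sum of products between one kernel map and the input patch at (row, col)."""
    return sum(
        channel[row + a][col + b] * kernel_map[a][b]
        for a in range(len(kernel_map))
        for b in range(len(kernel_map[0]))
    )

# Hypothetical input z with C = 2 channels, each 3x3.
z = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # input channel 1
    [[9, 8, 7], [6, 5, 4], [3, 2, 1]],   # input channel 2
]

# Hypothetical filter W_n: one 2x2 kernel map per input channel.
W_n = [
    [[1, 0], [0, 1]],   # kernel map for channel 1
    [[0, 1], [1, 0]],   # kernel map for channel 2
]

# First operation: kernel map 1 against the channel-1 patch at the top-left position.
first_result = correlate_at(z[0], W_n[0], 0, 0)
# Second operation: kernel map 2 against the channel-2 patch at the same position.
second_result = correlate_at(z[1], W_n[1], 0, 0)
# Output element: the sum over channels c = 1..C.
output_element = first_result + second_result

print(first_result, second_result, output_element)
```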

As per claim 28, Almahairi does not explicitly recite “wherein the generating of the approximate output comprises performing sequential convolution operations between the input and sub-kernels generated by decomposing the kernel.”
Jaderberg teaches wherein the generating of the approximate output comprises performing sequential convolution operations between the input and sub-kernels generated by decomposing the kernel ([chapter 2] One way to exploit this redundancy is to approximate the filter set by a linear combination of a smaller basis set of M filters ... This means that each basis filter can be decomposed into a sequence of horizontal and vertical filters ... Using this decomposition, the convolution of a separable filter s_i can be performed in O(2dH’W’) operations instead of O(d^2 H’W’)).

As per claim 30, Almahairi teaches wherein the kernel comprises at least one filter; the filter comprises at least one kernel map corresponding to at least one input channel ([chapter 6.1] 12 and 24 filters And [chapter 4.1] We use the 100 × 100 Cluttered MNIST digit classification – Note: MNIST is a set of image files that have one channel), the kernel map comprising kernel elements ([chapter 6.1] 3 × 3 filter sizes);
the input comprises at least one input feature map corresponding to the input channel, the input feature map comprising input elements ([chapter 2] c_{i,j} = f_c(x_{i,j}) – Note: c_{i,j} is the feature map corresponding to the input channel and elements; MNIST is a single channel image file); and
	the selecting comprises selecting the survival network in the convolution layer based on the output element ([chapter 2.1] the DCN output is obtained by feeding the refined representation into the top layers, g(fr(x)). And [chapter 6.1] Coarse layers: 2 convolutional layers ... Fine layers: 5 convolutional layers  – Note: DCN is the survival network; fr(x) contains output element; salient objects are processed using the fine convolution layers; non-salient objects are processed using the coarse convolution layers).
Almahairi does not appear to explicitly recite “the generating of the approximate output comprises: performing a first operation between a first portion of kernel elements in a first kernel map corresponding to a first input channel and at least one input element corresponding to the first portion; performing a second operation between a second portion of kernel elements in a second kernel map corresponding to a second input channel and at least one input element corresponding to the second portion after the first operation is performed; and generating an output element corresponding to the first kernel map and the second kernel map based on a result of the first operation and a result of the second operation; and”
Jaderberg teaches the generating of the approximate output comprises: performing a first operation between a first portion of kernel elements in a first kernel map corresponding to a first input channel and at least one input element corresponding to the first portion (as described in claim 17);
	performing a second operation between a second portion of kernel elements in a second kernel map corresponding to a second input channel and at least one input element corresponding to the second portion after the first operation is performed; and (as described in claim 17)
	generating an output element corresponding to the first kernel map and the second kernel map based on a result of the first operation and a result of the second operation; (as described in claim 17).

Claims 14-16 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Almahairi, in view of Shoaib, further in view of Shan, further in view of Sakaguchi (US 20190130245).

As per claim 14, Almahairi does not appear to explicitly recite “wherein the performing of the high speed convolution operation comprises performing a convolution operation between high-order bits of an input element of the input and high-order bits of a kernel element of the kernel.”
Sakaguchi teaches wherein the performing of the high speed convolution operation comprises performing a convolution operation between high-order bits of an input element of the input and high-order bits of a kernel element of the kernel ([0050] In FIG.4, X data and Y data are used in arithmetic operations of the neural network and, among these pieces of data, for example, X can be assigned as input data and Y can be assigned as a weighting coefficient. And [0052] By multiplying the high-order bits of X [14:0] by the high-order bits of Y [14:0] in a multiplier 111).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Almahairi such that the performing of the high speed convolution operation comprises performing a convolution operation between high-order bits of an input element of the input and high-order bits of a kernel element of the kernel, as taught by Sakaguchi, so that the convolution operations (multiplications between input and weight) of the high speed network of Almahairi are carried out using the arithmetic operations taught by Sakaguchi.

As per claim 15, Almahairi does not appear to explicitly recite “wherein the performing of the high speed convolution operation comprises: separating the input into a high-order bits input corresponding to high-order bits of the input and a low-order bits input corresponding to low-order bits of the input; separating the kernel into a high-order bits kernel corresponding to high-order bits of the kernel and a low-order bits kernel corresponding to low-order bits of the kernel; and performing a convolution operation between the high-order bits input and the high-order bits kernel”
Sakaguchi teaches wherein the performing of the high speed convolution operation comprises: separating the input into a high-order bits input corresponding to high-order bits of the input and a low-order bits input corresponding to low-order bits of the input; separating the kernel into a high-order bits kernel corresponding to high-order bits of the kernel and a low-order bits kernel corresponding to low-order bits of the kernel; ([0051] Here, X [14:0], which is 15-bit data, can be represented by X [14:9] of high-order six bits and X [8:0] of low-order nine bits. In addition, Y [14:0], which is 15-bit data, can be represented by Y [14:9] of high-order six bits and Y [8:0] of low-order nine bits) and 
performing a convolution operation between the high-order bits input and the high-order bits kernel ([0052] By multiplying the high-order bits of X [14:0] by the high-order bits of Y [14:0] in a multiplier 111).

As per claim 16, Sakaguchi further teaches wherein the performing of the precision convolution operation comprises performing a second convolution operation between the high-order bits input and the low-order bits kernel ([0053] by multiplying the low-order bits of Y [14:0] by the high-order bits of X [14:0] in a multiplier 113);
performing a third convolution operation between the low-order bits input and the high-order bits kernel ([0053] By multiplying the low-order bits of X [14:0] by the high-order bits of Y [14:0] in a multiplier 112);
	performing a fourth convolution operation between the low-order bits input and the low-order bits kernel; and ([0055] By multiplying the low-order bits of X [14:0] by the low-order bits of Y [14:0] in a multiplier 115) 
combining a result of the high speed convolution operation, a result of the second convolution operation, a result of the third convolution operation, and a result of the fourth convolution operation ([0056] In an adder 116, the 30-bit data obtained by the arithmetic operation of the multiplier 111, the 25-bit data obtained by the arithmetic operations of the multiplier 112 to the adder 114, and the 18-bit data obtained by the arithmetic operation of the multiplier 115 are added. As a result, Z [30:0] is obtained as 31-bit data).
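For illustration only, the four partial multiplications and their combination can be sketched as below. The sketch assumes unsigned 15-bit operands split into six high-order and nine low-order bits, as in the cited paragraph [0051]; it does not model the one's complement sign handling or the multiplier and adder numbering of Sakaguchi, and the names `split` and `multiply_by_parts` are hypothetical.

```python
LOW_BITS = 9                       # width of the low-order field, as in X[8:0] / Y[8:0]
LOW_MASK = (1 << LOW_BITS) - 1

def split(value):
    """Split an unsigned 15-bit value into (high six bits, low nine bits)."""
    return value >> LOW_BITS, value & LOW_MASK

def multiply_by_parts(x, y):
    """Rebuild x * y from four partial products of the high and low fields."""
    x_hi, x_lo = split(x)
    y_hi, y_lo = split(y)
    high_high = x_hi * y_hi               # high-order input x high-order kernel
    cross = x_hi * y_lo + x_lo * y_hi     # the two mixed high/low products, added
    low_low = x_lo * y_lo                 # low-order input x low-order kernel
    # Shift each partial product to its bit position, then add.
    return (high_high << (2 * LOW_BITS)) + (cross << LOW_BITS) + low_low

# Arbitrary (hypothetical) 15-bit example values.
x, y = 0x5A3C, 0x1F07
print(multiply_by_parts(x, y) == x * y)
```

The high-order product alone, shifted into place, is the coarse result; adding the cross terms and the low-order product restores the full-precision product.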

As per claim 29, Almahairi does not appear to explicitly recite “generating the approximate output based on high-order bits of an input element of the input and high-order bits of a kernel element of the kernel.”
Sakaguchi teaches generating the approximate output ([0049] FIG.4 is a diagram illustrating an example of arithmetic operations in a case where redundancy of high order bits is not implemented in multiplication data of one's complement – Note: no implementation of multiplication data of one's complement is the approximate output) based on high-order bits of an input element of the input and high-order bits of a kernel element of the kernel ([0052] By multiplying the high-order bits of X [14:0] by the high-order bits of Y [14:0] in a multiplier 111).

Claims 31-33 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Courbariaux et al. “Training Deep Neural Network with Low Precision Multiplications” (hereinafter Courbariaux), in view of Zhang et al. “ApproxANN: An Approximate Computing Framework for Artificial Neural Network” (hereinafter Zhang), further in view of Lupon et al. (US 2015/0170021).

As per claim 31, Courbariaux teaches the invention as claimed, including a processor-implemented convolutional neural network (CNN) processing method comprising:
performing a high speed convolution operation between reduced precision input and reduced precision kernel (i.e., reduce precision of the parameters and the inputs, apply convolution, see at least page 2, section 2, page 4, section 7).
Courbariaux does not explicitly teach the convolution operation is performed between high-order bits of an input and high-order bits of a kernel while ignoring lower-order bits of the input and low-order bits of the kernel.
Zhang teaches reducing precision can be achieved by discarding a specific number of least significant bits of the data (see at least page 703, right column, paragraph 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Courbariaux such that the convolution operation is performed between high-order bits of an input and high-order bits of a kernel while ignoring lower-order bits of the input and low-order bits of the kernel. Courbariaux teaches the convolution operation is performed between reduced precision input and reduced precision kernel, and it would have been obvious to use a known technique in the art to reduce precision, such as ignoring lower-order bits as taught by Zhang, where computational quality is slightly degraded but energy savings and performance can be significantly improved (see at least page 703, right column, paragraph 1 of Zhang). When the least significant bits are discarded, performing the convolution with the reduced precision input and kernel would be performing the convolution operation with the high-order bits of an input and a kernel.
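For illustration only, the effect of discarding least significant bits before a sum of products can be sketched as below. The sketch assumes unsigned 8-bit values and a fixed number of dropped bits; the function names and the sample values are hypothetical and are not taken from Courbariaux or Zhang.

```python
def truncate(value, dropped_bits):
    """Keep only the high-order bits by discarding the least significant ones."""
    return value >> dropped_bits

def approx_dot(inputs, weights, dropped_bits):
    """Sum of products over high-order bits only, rescaled to the original magnitude."""
    total = sum(
        truncate(a, dropped_bits) * truncate(w, dropped_bits)
        for a, w in zip(inputs, weights)
    )
    return total << (2 * dropped_bits)

def exact_dot(inputs, weights):
    """Full-precision sum of products, for comparison."""
    return sum(a * w for a, w in zip(inputs, weights))

# Hypothetical unsigned 8-bit input elements and kernel elements.
inputs = [200, 37, 129, 255]
weights = [100, 64, 18, 240]

approx = approx_dot(inputs, weights, 4)   # uses only the top four bits of each value
exact = exact_dot(inputs, weights)
print(approx, exact)
```

For non-negative operands the truncated result never exceeds the exact result, and the gap is the contribution of the discarded low-order bits.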
Courbariaux does not explicitly teach performing a precision convolution operation based on at least the low-order bits of the input and low-order bits of the kernel.
Lupon teaches performing a precision convolution operation based on at least the low-order bits of the input and low-order bits of the kernel (i.e., higher precision multiplier, lower 8 bits are supplied to multiplier, see at least Fig. 3D, [0011], [0041]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Courbariaux to perform a precision convolution operation based on at least the low-order bits of the input and low-order bits of the kernel as similarly taught by Lupon because it is known that lower bits are used in a higher precision operation (see at least Fig. 3D, [0011], [0041] of Lupon).

As per claim 32, Courbariaux teaches reducing precision of the input; and reducing precision of the kernel; wherein the performing of the high speed convolution operation comprises performing a first convolution operation between the reduced precision input and the reduced precision kernel (i.e., reduce precision of the parameters and the inputs, apply convolution, see at least page 2, section 2, page 4, section 7).
Courbariaux does not explicitly teach separating the input into a high-order bits input corresponding to the high-order bits of the input and a low-order bits input corresponding to the low-order bits of the input; and separating the kernel into a high-order bits kernel corresponding to the high-order bits of the kernel and a low-order bits kernel corresponding to the low-order bits of the kernel; wherein the performing of the high speed convolution operation comprises performing a first convolution operation between the high-order bits input and the high-order bits kernel, and the performing of the precision convolution operation comprises performing the precision convolution based on at least the low-order bits input and the low-order bits kernel.
Lupon teaches separating the input into a high-order bits input corresponding to the high-order bits of the input and a low-order bits input corresponding to the low-order bits of the input (i.e., a single 16-bit input may be split into higher 8 bits and lower 8 bits, see at least [0041]); and 
separating the kernel into a high-order bits kernel corresponding to the high-order bits of the kernel and a low-order bits kernel corresponding to the low-order bits of the kernel (i.e., 16-bit weight is split into higher 8 bits and lower 8 bits, see at least [0041]); 
the performing of the precision convolution operation comprises performing the precision convolution based on at least the low-order bits input and the low-order bits kernel (i.e., higher precision multiplier, lower 8 bits are supplied to multiplier, see at least Fig. 3D, [0011], [0041]).
Zhang teaches reducing precision can be achieved by separating data into a high-order bits data corresponding to the high-order bits of the data and a low-order bits data corresponding to the low-order bits of the data (see at least page 703, right column, paragraph 1; EN: discarding a specific number of least significant bits of the data would separate bits of the data into high-order bits and low-order bits).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Courbariaux such that the precision is reduced by separating the input and the kernel into high-order bits and low-order bits, and the performing of the high speed convolution operation comprises performing a first convolution operation between high-order bits of an input and high-order bits of a kernel. Courbariaux teaches the convolution operation is performed between reduced precision input and reduced precision kernel, and it would have been obvious to use a known technique in the art to reduce precision, such as separating the input and kernel into high-order bits and low-order bits as taught by Zhang, where computational quality is slightly degraded but energy savings and performance can be significantly improved (see at least page 703, right column, paragraph 1 of Zhang). Further, it is known that performing a precision convolution operation would be based on the low-order bits of the input and low-order bits of the kernel as similarly taught by Lupon (see at least Fig. 3D, [0011], [0041] of Lupon).

As per claim 33, Courbariaux and Zhang do not explicitly teach wherein the performing of the precision convolution operation further comprises: performing a second convolution operation between the high-order bits input and the low-order bits kernel; performing a third convolution operation between the low-order bits input and the high-order bits kernel; and performing a fourth convolution operation between the low-order bits input and the low-order bits kernel, and the method further comprises generating an output by combining a result of the first convolution operation, a result of the second convolution operation, a result of the third convolution operation, and a result of the fourth convolution operation.
Lupon teaches wherein the performing of the precision convolution operation further comprises:
performing a first convolution operation between the high-order bits input and the high-order bits kernel (i.e., operation performed by multiplier 306.1, see at least Fig. 3D, [0041]);
performing a second convolution operation between the high-order bits input and the low-order bits kernel (i.e., operation performed by multiplier 306.3, see at least Fig. 3D, [0041]);
performing a third convolution operation between the low-order bits input and the high-order bits kernel (i.e., operation performed by multiplier 306.2, see at least Fig. 3D, [0041]); and 
performing a fourth convolution operation between the low-order bits input and the low-order bits kernel (i.e., operation performed by multiplier 306.4, see at least Fig. 3D, [0041]), and 
the method further comprises generating an output by combining a result of the first convolution operation, a result of the second convolution operation, a result of the third convolution operation, and a result of the fourth convolution operation (i.e., product outcomes from multipliers 306.1-306.4 may be fed into reconfigurable accumulator 304).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Courbariaux and Zhang such that the performing of the precision convolution operation further comprises: performing a second convolution operation between the high-order bits input and the low-order bits kernel; performing a third convolution operation between the low-order bits input and the high-order bits kernel; and performing a fourth convolution operation between the low-order bits input and the low-order bits kernel, and the method further comprises generating an output by combining a result of the first convolution operation, a result of the second convolution operation, a result of the third convolution operation, and a result of the fourth convolution operation, as similarly taught by Lupon, in order to use known methods in the art for performing a multiplication operation between an input and a kernel.

As per claim 36, the limitations recited in this claim are substantially similar to claim 31. Therefore, claim 36 is rejected using the same reasons as claim 31. 

Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Jaderberg, further in view of Sakaguchi.

As per claim 41, Jaderberg does not explicitly teach wherein the first part of the input is high-order bits of the input; the first part of the kernel is high-order bits of the kernel; the second part of the input is low-order bits of the input; and the second part of the kernel is low-order bits of the kernel.
Sakaguchi teaches wherein the first part of the input is high-order bits of the input; the first part of the kernel is high-order bits of the kernel; the second part of the input is low-order bits of the input; and the second part of the kernel is low-order bits of the kernel. ([0050] In FIG.4, X data and Y data are used in arithmetic operations of the neural network and, among these pieces of data, for example, X can be assigned as input data and Y can be assigned as a weighting coefficient. And [0051] Here, X [14:0], which is 15-bit data, can be represented by X [14:9] of high-order six bits and X [8:0] of low-order nine bits. In addition, Y [14:0], which is 15-bit data, can be represented by Y [14:9] of high-order six bits and Y [8:0] of low-order nine bits).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jaderberg such that the first part of the input is high-order bits of the input; the first part of the kernel is high-order bits of the kernel; the second part of the input is low-order bits of the input; and the second part of the kernel is low-order bits of the kernel, as similarly taught by Sakaguchi, so that the convolution operations (multiplications between input and weight) of Jaderberg are carried out using the arithmetic operations taught by Sakaguchi.
Response to Arguments
Rejection of claims under §103: 
As per independent claim 37, Applicant argued that a first convolution operation is performed using an entire input Z, rather than a first part of the input Z, and a second convolution operation is performed using an entire result of the first convolution operation, rather than a second part of the input Z.
Examiner respectfully disagrees. Jaderberg teaches there are N filters operating on each input channel, and the filters can be approximated as a linear combination of a basis of separable filters (see at least page 4, scheme 1). Each input channel is part of an input. The separable filters operate using a first or second part of the input as they operate on an input channel.
Applicant further argued that the second convolution operation in Jaderberg is performed regardless of the result of the first convolution operation, rather than “in response to a result of the first convolution operation meeting a predetermined criterion.”  Applicant argued that the first criterion or the second criterion is used only to obtain the separable filters in a training process that is performed before the convolution operations in Fig. 1(b) are performed.
Examiner respectfully disagrees. Jaderberg teaches attaining an optimal separable basis representation by minimizing the reconstruction error of the filter output; the set of filters is learnt by minimizing reconstruction error (see at least pages 5-7, section 2.2). Minimizing reconstruction error is a predetermined criterion that is met when determining the optimal separable basis representation. The set of filters reconstructs the original filter. Thus, one filter being part of an optimal separable basis representation is a result of another filter in the set minimizing reconstruction error.

As per independent claims 1, 20, 31, 34, 35, and 36, Applicant’s arguments have been fully considered, but are moot in light of the new grounds of rejection.
 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655.  The examiner can normally be reached on M-F 9:30 am - 5:00pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached on 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Jue Louie/
Primary Examiner
Art Unit 2121