DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The following claim(s) is/are pending in this Office action: 1-20
The following claim(s) is/are amended: 1, 7, 8, 10, and 16
The following claim(s) is/are new: None
The following claim(s) is/are cancelled:
Claim(s) rejected: 1-20

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Sparse Convolutional Neural Networks”; hereafter “Liu”) in view of Ioannou et al. (“Training cnns with low-rank filters for efficient image classification”; hereafter “Ioannou”).

Regarding claim 1, Liu teaches a system, comprising: a memory that stores computer executable components; a processor that executes computer executable components stored in the memory, wherein the computer executable components comprise (Section 4.1: “We implemented our method on x86 64 CPU microarchitecture with the Advanced Vector Extension (AVX), which is available on both Intel and AMD’s CPUs after 2011, although we expect that this approach could be extended to GPU architectures.” Section 4.2 Para 2: “To maximally reduce the memory latency, the input matrices are first divided into blocks that can fit into the L2 cache of CPU.”)
an analysis component that analyzes an initial convolutional layer in a network architecture of a convolutional neural network and one or more subsequent convolutional layers in the network architecture (Section 6.1, para 1: “The model consists of 5 convolutional layers and two fully connected layers, interlaced with subsampling layers, local normalizing layers, max pooling layers, rectified linear unit layers and dropout layers. The first convolutional layer has relatively large 11 × 11 kernels and only 3 input channels; the second convolutional layer has 5 × 5 kernels; The third, fourth and fifth convolutional layers have very small 3 × 3 kernels. The difference of kernel sizes as well as the number of input kernels affects the possible sparsity that can be achieved.” Figure 1. In the model, more than one convolutional layer is used and analyzed. Figure 1 also shows five convolutional layers.)
a replacement component that replaces original convolutional kernels in the initial convolutional layer with initial sparse convolutional kernels, and replaces subsequent convolutional kernels in one or more subsequent convolutional layers with complementary sparse convolutional kernels (Figure 1: “Overview of our sparse convolutional neural network… We apply two stage decompositions over the channels and the convolutional kernels, obtaining a remarkably (more than 90%) sparse kernel matrix and converting the operation of convolutional layer to sparse matrix multiplication.” Convolutional kernels are converted to sparse kernels.).
But Liu does not explicitly teach wherein the complementary sparse convolutional kernels have a complementary pattern with respect to sparse kernels of a previous convolutional layer, wherein the complementary pattern is a pattern wherein a join of a one of the original convolutional kernels in the initial convolutional layer and a one of the complementary sparse convolutional kernels in the one or more subsequent convolutional layers in a spatial domain results in a full coverage of a receptive field of a kernel.
Ioannou, however, teaches wherein the complementary sparse convolutional kernels have a complementary pattern with respect to sparse kernels of a previous convolutional layer (“Our contributions include a novel method of learning a set of small basis filters that are combined to represent larger filters efficiently.” “Specifically, we show that by representing convolutional filters using a basis space comprising groups of filters (or kernels) of different spatial dimensions (examples shown in Fig. 1c and d).” Fig. 1c and 1d also show that a large filter is broken into a set of small filters to process an image. Both filters are connected to the same image, and the pattern of one filter depends on the filters used previously (i.e., 1x3 and then 3x1).), wherein the complementary pattern is a pattern wherein a join of a one of the original convolutional kernels in the initial convolutional layer and a one of the complementary sparse convolutional kernels in the one or more subsequent convolutional layers in a spatial domain results in a full coverage of a receptive field of a kernel (Section 2.3 Para 1: “This scheme uses composite layers comprising several sets of filters where the filters in each set have different spatial dimensions (see Fig. 5). The outputs of these basis filters may be combined in a subsequent layer containing filters with spatial dimensions 1x1. This is illustrated in Fig. 2c. Here, our composite layer contains horizontal wx1 and vertical 1xh filters, the outputs of which are concatenated in the channel dimension, resulting in an intermediate m-channel feature map. These filter responses are then linearly combined by the next layer of d 1x1 filters to give a d-channel output feature map.” Page 6 second last para: “To demonstrate the efficacy of the simple low rank filter representation illustrated in Fig. 2c, we created a new network architecture (vgg-gmp-lr-join) by replacing each of the convolutional layers in VGG-11 (original filter dimensions were 3x3) with a sequence of two layers. The first layer comprises half 1x3 filters and half 3x1 filters whilst the second layer comprises the same number of 1x1 filters.” The original filter/kernel is split into low-rank filters/kernels in the spatial domain or dimension which, when combined, give the same full coverage as the original filter, as shown in Fig. 2.).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the convolutional neural network of Liu with the sparse kernels or filters of Ioannou, as used in SCNN, to reduce the computational complexity of CNNs without compromising accuracy (Ioannou, Page 1 second last para).
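For illustration only, the "full coverage of a receptive field" language discussed above can be sketched as a check on the nonzero positions of two kernels. The sketch below is not taken from Liu, Ioannou, or the application; it assumes that a "join" means the union of the nonzero positions of two equally sized 2-D kernels, and the names support and covers_receptive_field, as well as the 3 x 3 patterns, are hypothetical.

```python
import numpy as np

def support(kernel):
    """Boolean mask of the nonzero positions of a 2-D kernel."""
    return np.asarray(kernel) != 0

def covers_receptive_field(kernel_a, kernel_b):
    """True if the union of the two supports leaves no position empty.

    Assumption: "join" is modeled as an element-wise OR of the supports.
    """
    return bool(np.all(support(kernel_a) | support(kernel_b)))

# Hypothetical 3x3 sparse patterns chosen only to exercise the check.
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])
corners = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 0, 1]])
horizontal = np.array([[0, 0, 0],
                       [1, 1, 1],
                       [0, 0, 0]])

print(covers_receptive_field(cross, corners))     # True
print(covers_receptive_field(cross, horizontal))  # False
```

Under this assumed reading, the first pair is complementary because every position of the 3 x 3 field is nonzero in at least one pattern, while the second pair leaves the corner positions uncovered.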

Regarding claim 2, Liu and Ioannou teach the system of claim 1.
(Section 1 Para 2: “Thus, as illustrated in Fig. 1, in comparison with fully connected network layers (Fig. 1a), convolutional layers have a much sparser connection structure and use fewer parameters (Fig. 1b). This leads to faster training and test, better generalization, and higher accuracy.” Page 7 Para 1: “To answer this question, we trained a network, vgg-gmp-lr-join-wfull, with a mixture of 25% 3x3 and 75% 1x3 and 3x1 filters, while preserving the total number of filters of the baseline network.” Network was trained using Sparse Kernels or filters. vgg-gmp-lr-join-wfull network was used to determine Sparse Kernels.).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.

Regarding claim 3, Liu and Ioannou teach the system of claim 2.
Ioannou also teaches wherein the analysis component determines the sparse kernels based on weight data (Page 5 Figure 3: “The cross-shaped filters (c) learned as weighted linear combination of (b) 1x3 and (c) 3x1 basis filters in the first convolutional layer of the ‘vgg-gmp-lr-join’ model trained using the ILSVRC dataset.” Section 2.2 Last Para: “In effect, we learn the separable basis filters and their combination weights simultaneously during network training.” Section 2.1 Para 1: “Each filter is represented by hwc independent weights.” Also, as can be seen in Fig. 2, each filter/kernel is represented by its independent weights (including sparse kernels or filters).).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.

Regarding claim 4, Liu and Ioannou teach the system of claim 2.
Liu also teaches wherein the analysis component determines the sparse kernels based on a similarity measure with respect to a most similar sparse pattern inherited from pre-trained weights relative to the original convolutional kernels, and the replacement component uses most similar sparse kernels to replace the original convolutional kernels in the initial convolutional layer (Section 6.1 para 1: “We trained our model on the ImageNet LSVRC 2012 dataset. We start from a pre-trained Caffe[13] reference CNN model…”. Section 6.5 para 2: “We compare the convolution kernels in the original model and the ones that are reconstructed from our fine-tuned sparse model in Figure 5. We also measure the average similarity between the original and reconstructed kernels by first deduct the mean values from both kernels, and then calculate the cosine similarity measurement.” Figure 5: Sparsification is applied to the pre-trained convolutional kernels, and the reconstructed sparse kernels are compared with the original convolutional kernels based on a similarity measurement. The convolutional layers can be replaced with a sparse kernel matrix as shown in Figure 1 in Liu.).
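For illustration only, the similarity measurement quoted from Liu Section 6.5 (deducting the mean from both kernels, then computing cosine similarity) can be sketched as follows. This is a minimal sketch and not Liu's implementation; the function name mean_centered_cosine and the choice to return 0.0 for a constant kernel are assumptions made here.

```python
import numpy as np

def mean_centered_cosine(original, reconstructed):
    """Cosine similarity of two kernels after subtracting each kernel's mean."""
    a = np.asarray(original, dtype=float).ravel()
    b = np.asarray(reconstructed, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: a constant kernel has no direction, so report 0.0.
        return 0.0
    return float(np.dot(a, b) / denom)

# Hypothetical kernels: the second differs from the first only by an offset.
original = np.arange(1.0, 10.0).reshape(3, 3)
shifted = original + 10.0

print(round(mean_centered_cosine(original, shifted), 6))    # 1.0
print(round(mean_centered_cosine(original, -original), 6))  # -1.0
```

Because the mean is removed first, a constant offset between the two kernels does not lower the score; only the shape of the weight pattern is compared.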

Regarding claim 5, Liu and Ioannou teach the system of claim 1.
Liu also teaches further comprising a training component that uses the initial sparse convolutional kernels and complementary convolutional kernels to train another convolutional neural network model (Section 3.3 para 1: “The parameters of Sparse Convolutional Neural Networks are learned in two phases: initial decomposition and fine-tuning… Sparse matrix decomposition algorithm is an intuitive choice for initialization.”…Page 807 para 1: “…by sparse decompositions of the convolutional kernels. As Figure 1 illustrate, two-stage decompositions are applied to explore the inter-channel and intra-channel redundancy of convolution kernels. We first perform an initial decomposition based on the reconstruction error of kernel weights, then fine-tune the network while imposing the sparsity constraint”. Section 3.3 para 2: “…In the fine-tuning phase, we impose sparsity constraints over the network parameters, while continuing to train the whole network.” Initial sparse kernel resulting from initial decomposition and subsequent (complementary) sparse kernels resulting from applying sparsity constraint are used to train and fine-tune the network.).

Regarding claim 6, Liu and Ioannou teach the system of claim 1.
Liu also teaches further comprising a tuning component that uses the initial sparse convolutional kernels and complementary convolutional kernels to tune a convolutional neural network model (Section 3.3 para 1: “The parameters of Sparse Convolutional Neural Networks are learned in two phases: initial decomposition and fine-tuning… Sparse matrix decomposition algorithm is an intuitive choice for initialization.”…Page 807 para 1: “…by sparse decompositions of the convolutional kernels. As Figure 1 illustrate, two-stage decompositions are applied to explore the inter-channel and intra-channel redundancy of convolution kernels. We first perform an initial decomposition based on the reconstruction error of kernel weights, then fine-tune the network while imposing the sparsity constraint.” Section 3.3 para 2: “…In the fine-tuning phase, we impose sparsity constraints over the network parameters, while continuing to train the whole network.” Initial sparse kernel resulting from initial decomposition and subsequent (complementary) sparse kernels resulting from applying sparsity constraint are used to fine tune the network.).

Regarding claim 7, Liu and Ioannou teach the system of claim 1.
Ioannou also teaches wherein relative to a previous convolutional layer of the one or more subsequent convolutional layers, for a kernel, the replacement component replaces the kernel with a complementary sparse kernel relative to the previous convolutional layer (Page 6 Section Separable Filters: “To evaluate the separable filter approach described in Section 2.2 (illustrated in Fig. 2b), we replaced each convolutional layer in VGG-11 with a sequence of two layers, the first containing horizontally oriented 1x3 filters and the second containing vertically oriented 3x1 filters (vgg-gmp-sf).” Each convolutional layer is replaced with two or more low-rank sparse filters or kernels.) or with another sparse kernel having a most similar sparse pattern (Section 4.2 Para 2: “For the googlenet-lr network, within only the inception modules we replaced each the 3x3 filters with low-rank 3x1 and 1x3 filters, and replaced the layer of 5x5 filters with a set of low-rank 5x1 and 1x5 filters.” The sparse layers that replaced the convolution layers have the most similar low-rank filter patterns (1x3 or 3x1 for a 3x3 convolution layer).).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.
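For illustration only, the separable-filter replacement quoted from Ioannou (a 1x3 filter followed by a 3x1 filter in place of a 3x3 filter) can be sketched numerically. The sketch below assumes a single channel, no nonlinearity between the two steps, and no padding; the helper correlate2d_valid and the filter values are hypothetical and are not drawn from either reference.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain 2-D cross-correlation with no padding ("valid" output size)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))

row = np.array([[1.0, 2.0, 1.0]])        # hypothetical 1x3 filter
col = np.array([[1.0], [0.0], [-1.0]])   # hypothetical 3x1 filter

# Two cheap passes (3 + 3 weights) ...
two_step = correlate2d_valid(correlate2d_valid(image, row), col)
# ... match one pass with the rank-1 3x3 kernel formed by their product.
one_step = correlate2d_valid(image, col @ row)

print(two_step.shape)                    # (4, 4)
print(np.allclose(two_step, one_step))   # True
```

The equality holds only for 3x3 kernels that can be written as such a product (rank-1 kernels) under the stated assumptions; a general 3x3 kernel needs a combination of several such pairs.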

Regarding claim 8, Liu and Ioannou teach the system of claim 1.
Ioannou also teaches wherein at least one of the sparse kernels comprises a 3 x 3 kernel size (Section 4.2 Para 2: “For the googlenet-lr network, within only the inception modules we replaced each the 3x3 filters with low-rank 3x1 and 1x3 filters, and replaced the layer of 5x5 filters with a set of low-rank 5x1 and 1x5 filters.”) and wherein a first center point of the one (As shown in Fig. 2b, c, and d, the center point of both the convolutional kernel and the complementary sparse kernel (second low-rank kernel) is nonzero.).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.

Regarding claim 9, Liu and Ioannou teach the system of claim 1.
Liu also teaches wherein at least one of the sparse kernels comprises a 5x5 kernel size (Section 6.1: “The first convolutional layer has relatively large 11 × 11 kernels and only 3 input channels; the second convolutional layer has 5 × 5 kernels; The third, fourth and fifth convolutional layers have very small 3 × 3 kernels. The difference of kernel sizes as well as the number of input kernels affects the possible sparsity that can be achieved.”).

Regarding claim 10, it is substantially similar to claim 1, and is rejected in the same manner, with the same art and reasoning applying.

Regarding claim 11, Liu and Ioannou teach the method of claim 10.
Liu further teaches wherein the analyzing further comprises analyzing a trained model to determine the original convolutional kernels (Page 29 para 3: “To measure the weight and activation sparsity, we used the Caffe framework [4] to prune and train the three networks listed in Table 1, using the pruning algorithm of [17]. We then instrumented the Caffe framework to inspect the activations between the convolutional layers. Figure 1 shows the weight and activation density (fraction of non-zeros or complement of sparsity) of the layers of the networks, referenced to the left-hand y-axes.” Convolutional layers are pruned to convert them into sparse layers and are also trained. Figure 1 shows the original sparse kernels used in training.).

Regarding claim 12, Liu and Ioannou teach the method of claim 11.
Liu also teaches wherein the analyzing determines the sparse kernels based on a similarity measure with respect to a most similar sparse pattern inherited from pre-trained weights relative to the original convolutional kernels (Section 6.1 para 1: “We trained our model on the ImageNet LSVRC 2012 dataset. We start from a pre-trained Caffe [13] reference CNN model…”. Section 6.5 para 2: “We compare the convolution kernels in the original model and the ones that are reconstructed from our fine-tuned sparse model in Figure 5. We also measure the average similarity between the original and reconstructed kernels by first deduct the mean values from both kernels, and then calculate the cosine similarity measurement.” Reconstructed sparse kernels are obtained by fine-tuning the original pre-trained model. The similarity between the convolution layers and the reconstructed sparse layers is analyzed. From Figure 5, it is evident that conv1 (convolution layer 1) has the highest similarity (most similar).).

Regarding claims 13 and 14, they are substantially similar to claims 5 and 6, and are rejected in the same manner, the same art, and reasoning applying.

Regarding claim 15, Liu and Ioannou teach the method of claim 10.
(Page 6 Section Separable Filters: “To evaluate the separable filter approach described in Section 2.2 (illustrated in Fig. 2b), we replaced each convolutional layer in VGG-11 with a sequence of two layers, the first containing horizontally oriented 1x3 filters and the second containing vertically oriented 3x1 filters (vgg-gmp-sf).” Each convolutional layer is replaced with two or more low-rank sparse filters or kernels.).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.

Regarding claim 16, it is substantially similar to claim 1, and is rejected in the same manner, with the same art and reasoning applying.

Regarding claim 17, Liu and Ioannou teach the method of claim 16.
Ioannou also teaches determine the original convolutional kernels and the sparse kernels to replace the original convolutional kernels based on weight data (Page 5 Figure 3: “The cross-shaped filters (c) learned as weighted linear combination of (b) 1x3 and (c) 3x1 basis filters in the first convolutional layer of the ‘vgg-gmp-lr-join’ model trained using the ILSVRC dataset.” Section 2.2 Last Para: “In effect, we learn the separable basis filters and their combination weights simultaneously during network training.” Section 2.1 Para 1: “Each filter is represented by hwc independent weights.” Page 6 Section Separable Filters: “To evaluate the separable filter approach described in Section 2.2 (illustrated in Fig. 2b), we replaced each convolutional layer in VGG-11 with a sequence of two layers, the first containing horizontally oriented 1x3 filters and the second containing vertically oriented 3x1 filters (vgg-gmp-sf).” Also, as can be seen in Fig. 2, each filter/kernel is represented by its independent weights (both original convolutional kernels and sparse kernels or filters).).
Same motivation to combine the teaching of Liu and Ioannou as claim 1.

Regarding claim 18, Liu and Ioannou teach the method of claim 16.
Liu also teaches determine the original convolutional kernels and the sparse kernels to replace the original convolutional kernels based on similarity data (Section 6.5 para 2: “We compare the convolution kernels in the original model and the ones that are reconstructed from our fine-tuned sparse model in Figure 5. We also measure the average similarity between the original and reconstructed kernels by first deduct the mean values from both kernels, and then calculate the cosine similarity measurement.” Figure 5. Figure 5 shows the original convolutional kernels and sparse Kernels based on similarity data. The convolutional kernels can be replaced with sparse kernel as shown in Figure 1 in Liu.).

Regarding claim 19, Liu and Ioannou teach the method of claim 16.
Liu also teaches tune a convolutional neural network model based on the complementary sparse convolutional kernels in the convolutional layers (Section 3.3 para 1: “The parameters of Sparse Convolutional Neural Networks are learned in two phases: initial decomposition and fine-tuning… Sparse matrix decomposition algorithm is an intuitive choice for initialization.”…Page 807 para 1: “…by sparse decompositions of the convolutional kernels. As Figure 1 illustrate, two-stage decompositions are applied to explore the inter-channel and intra-channel redundancy of convolution kernels. We first perform an initial decomposition based on the reconstruction error of kernel weights, then fine-tune the network while imposing the sparsity constraint”. Section 3.3 para 2: “…In the fine-tuning phase, we impose sparsity constraints over the network parameters, while continuing to train the whole network.” Initial sparse kernel resulting from initial decomposition and subsequent (complementary) sparse kernels resulting from applying sparsity constraint are used to fine tune the network.).

Regarding claim 20, Liu and Ioannou teach the method of claim 16.
Liu also teaches train a convolutional neural network model based on the complementary sparse convolutional kernels in the convolutional layers (Section 6.1: “We trained our model on the ImageNet LSVRC 2012 [2] dataset. We start from a pre-trained Caffe[13] reference CNN model, which is almost identical to the model described in [14]. The model consists of 5 convolutional layers and two fully connected layers, interlaced with subsampling layers, local normalizing layers, max pooling layers, rectified linear unit layers and dropout layers. The first convolutional layer has relatively large 11 × 11 kernels and only 3 input channels; the second convolutional layer has 5 × 5 kernels; The third, fourth and fifth convolutional layers have very small 3 × 3 kernels. The difference of kernel sizes as well as the number of input kernels affects the possible sparsity that can be achieved. All 5 convolutional layers are optimized simultaneously according to Equation 7 using stochastic gradient decent with momentum. The base learning rate is initially set to 0.001, while sparsifying the network parameters. To stabilize the training process, we adopt a thresholding function that sets parameters smaller than 1e−4 to zero during training.” Convolutional kernels in the convolutional layers are sparsified during training to form sparse convolutional kernels. In this way, the whole network is trained.).
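For illustration only, the thresholding step quoted from Liu Section 6.1 (“sets parameters smaller than 1e−4 to zero during training”) can be sketched as follows. This is not Liu's code; the sketch assumes that “smaller” refers to the magnitude of a parameter, and the names threshold_weights and sparsity and the sample weights are hypothetical.

```python
import numpy as np

def threshold_weights(weights, eps=1e-4):
    """Zero every weight whose magnitude is below eps.

    Assumption: "smaller than 1e-4" is read as |w| < 1e-4.
    """
    w = np.asarray(weights, dtype=float)
    return np.where(np.abs(w) < eps, 0.0, w)

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(np.asarray(weights) == 0))

# Hypothetical weights standing in for one kernel after an update step.
w = np.array([0.5, -3e-5, 2e-4, 9e-5, -0.02, 0.0])
pruned = threshold_weights(w)

print(pruned)
print(sparsity(pruned))  # 0.5
```

In a training loop, such a function would typically be applied to the kernel weights after each parameter update so that near-zero entries stay at zero.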

Response to Arguments
Applicant's arguments filed on 03/01/2021 with respect to the 35 U.S.C. 103 rejections have been fully considered. Claims 1, 7, 8, 10, and 16 have been amended by the applicant to address the 35 U.S.C. 103 rejections in the previous Office action. Applicant also argued that the references in the previous Office action do not teach the new amendments. Examiner agrees and has added a new reference, Ioannou et al., to teach the amendments added to these claims. They are addressed in the 35 U.S.C. 103 rejections section of this Office action. All claims remain rejected.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
An inquiry concerning this communication or earlier communication from the examiner should be directed to QAMAR IQBAL whose telephone number is (571)272-2563. The examiner can normally be reached on M-F 10-6pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

/Q.I/ 
Examiner 
Art unit 2123
04/11/2021

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123