DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status
Claims 1-3, 6-13, and 16-20 are pending. Claims 4-5 and 14-15 are cancelled. 
Claims 1, 9-11, 19, & 20 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan, Vardan & Romano, Yaniv & Elad, Michael (2016), Convolutional Neural Networks Analyzed via Convolutional Sparse Coding, Journal of Machine Learning Research 18 (hereinafter Papyan), in view of Erdem et al. (WO2019199244A1) (hereinafter Erdem), further in view of Li Yuan, Wei Liu, and Yang Li (2016), Non-negative dictionary based sparse representation classification for ear recognition with occlusion, Neurocomput. 171, January 2016 (hereinafter Yuan), further in view of Chen, Jingbo & Wang, Chengyi & Zhong, Ma & Chen, Jiansheng & He, Dongxu & Ackland, Stephen (2018), Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters, Remote Sensing 10, 290, 10.3390/rs10020290 (hereinafter Chen), and further in view of Zhang, Ruijie & Shen, Jian & Wei, Fushan & Li, Xiong & Sangaiah, Arun (2017), Medical image classification based on multi-scale non-negative sparse coding, Artificial Intelligence in Medicine 83, 10.1016/j.artmed.2017.05.006 (hereinafter Zhang).
Claims 2 & 12 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, and further in view of Wang et al. (US20190339359A1) (hereinafter Wang).
Claims 3 & 13 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, and further in view of Syed Zubair, Fei Yan, Wenwu Wang, Dictionary learning based sparse coefficients for audio classification with max and average pooling, Digital Signal Processing, Volume 23, Issue 3, 2013 (hereinafter Zubair).
Claims 6 & 16 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, and further in view of Mounir, Hammouche & Ghorbel, Enjie & Fleury, Anthony & Ambellouis, Sebastien (2016), Toward a Real Time View-invariant 3D Action Recognition, 10.5220/0005843607450754 (hereinafter Mounir).
Claims 7, 8, 17 & 18 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, Mounir, and further in view of S. D. S. Al-Shaikhli, M. Y. Yang and B. Rosenhahn, "Brain tumor classification using sparse coding and dictionary learning," 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 2774-2778, doi: 10.1109/ICIP.2014.7025561 (hereinafter Al-Shaikhli).

Response to Amendment
The amended claims 1 and 11 overcome the rejection under 35 USC 101; the rejection of claims 1-3, 9-13, and 19-20 under 35 USC 101 is withdrawn. In response to the amendment, the grounds of rejection under 35 USC 103 have been updated.

Response to Arguments
Applicant's arguments filed 09 August 2022 have been fully considered but they are not persuasive. The arguments are substantially the same as those filed 06 April 2022 and are responded to in much the same way.
The applicant argues:
In other words, Papyan does not disclose the feature of “sparse coding layer”. … However, Applicant thinks that Papyan does not disclose the feature of “sparse coding layer” and also does not teach the feature “ the sparse coding layer uses a dictionary atom to reconstruct a signal on a projection of the normalized input signal passing through the convolutional layer”. … However, Papyan does not disclose sparse coding layer.
	The argument is not persuasive. Papyan does teach a sparse coding layer as shown in the clarified 35 USC 103 rejection below.  Especially note: [Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.” Within the novel multi-layer model, convolutional sparse coding (CSC) layers exist.
The dependent claims are argued to be allowable because the independent claims are supposedly allowable.  However, the independent claims are not allowable; therefore, the dependent claims are not allowable since they do not add any further allowable limitations.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9-11, 19, & 20 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan, Vardan & Romano, Yaniv & Elad, Michael (2016), Convolutional Neural Networks Analyzed via Convolutional Sparse Coding, Journal of Machine Learning Research 18 (hereinafter Papyan), in view of Erdem et al. (WO2019199244A1) (hereinafter Erdem), further in view of Li Yuan, Wei Liu, and Yang Li (2016), Non-negative dictionary based sparse representation classification for ear recognition with occlusion, Neurocomput. 171, January 2016 (hereinafter Yuan), further in view of Chen, Jingbo & Wang, Chengyi & Zhong, Ma & Chen, Jiansheng & He, Dongxu & Ackland, Stephen (2018), Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters, Remote Sensing 10, 290, 10.3390/rs10020290 (hereinafter Chen), and further in view of Zhang, Ruijie & Shen, Jian & Wei, Fushan & Li, Xiong & Sangaiah, Arun (2017), Medical image classification based on multi-scale non-negative sparse coding, Artificial Intelligence in Medicine 83, 10.1016/j.artmed.2017.05.006 (hereinafter Zhang).

Regarding claim 1: 
Papyan teaches:
“and adding a sparse coding layer after the convolutional layer,” ([Page 2] “The first convolves the input with a set of learned filters, resulting in a set of feature (or kernel) maps. These then undergo a point wise non-linear function, in a second step, often resulting in a sparse outcome (Glorot et al., 2011). A third (and optional) down-sampling step, termed pooling, is then applied on the result in order to reduce its dimensions. The output of this layer is then fed into another one, thus forming the multi-layer structure, often termed forward pass” The convolving of the input is passing the input through a convolutional layer; the second step, which results in a sparse representation, is the sparse coding layer. The sparse representation can also be called a projection of the input signal. [Page 12] “3. From Atoms to Molecules: Multi-Layer Convolutional Sparse Model” The model has multiple layers and is sparse. [Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.” Within the novel multi-layer model, there exist convolutional sparse coding (CSC) layers. [Page 14] “Relying on this, we now extend the dictionary learning problem, as presented in Section 2.2.3, to the multi-layer convolutional sparse representation setting.”)
“wherein the sparse coding layer uses a dictionary atom to reconstruct a signal on a projection of the normalized input signal passing through the convolutional layer,”  ([Page 2]“In this framework, one assumes that a signal can be represented as a linear combination of a few columns (called atoms) from a matrix termed a dictionary. Put differently, the signal is equal to a multiplication of a dictionary by a sparse vector. The task of retrieving the sparsest representation of a signal over a dictionary is called sparse coding” The sparse vector is a projection of the input signal) 
“and the sparse coding layer receives a mini-batch input to refresh the dictionary atom.” ([Page 9 section 2.2.3] “Although the sparse coding problem under these can be done very efficiently, over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 6] “assume that the parameters of the CNN model are pre-trained and fixed. These, for example, could have been obtained by minimizing the above objective via the backpropagation algorithm and the stochastic gradient descent, as in the VGG network” when training the dictionary via stochastic gradient descent the updating of the dictionary via the mini-batch input is inherent to the process.) 
“wherein when the convolutional neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”) is a single-channel ([Page 20, Table 1] m_i is the number of channels and can equal 1: “Notice that m_0 = 1.”) sparse coding convolutional ([Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.”) neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”),”
“the sparse coding layer is located after a fully connected layer,” ([Page 2] “neural networks were proven to preserve the metric structure of the input data as it propagates through the layers of the network. This, in turn, was shown to allow a stable recovery of the data from the features obtained from the network.”, [Page 23 section 5.5] “One should note that the convolutional structure imposed on the dictionaries in our model could be removed, and the theoretical guarantees we have provided above would still hold. The reason being is that the unconstrained dictionary can be regarded as a convolutional one, constructed from a single shift of a local matrix with no circular boundary. In the context of CNN, this is analogous to a fully connected layer. As such, the theoretical analysis provided here sheds light on both convolutional and fully connected networks. A different point of view on the same matter can also be proposed; fully connected layers can be viewed as convolutional ones with filters that cover their entire input” Since sparse coding is the reconstruction of a signal from the sparse representation of that signal, it must occur after the convolutional network, because the sparse signal is propagated to the end of the convolutional neural network. It is also noted that the structure of the convolutional neural network used in the art is equivalent to a fully connected network, such that the reference effectively teaches the use of a sparse coding layer after a fully connected layer.)
“wherein when the convolutional neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”) is a multi-channel ([Page 20, Table 1] m_i is the number of channels; m_i can take values other than 1, making the network multi-channel.) sparse coding convolutional ([Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.”) neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”),”
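For illustration of the sparse coding relationship quoted from Papyan at page 2 (a signal equal to a dictionary multiplied by a sparse vector), a minimal sketch follows. It is not taken from Papyan; the dictionary values, the coefficient vector, and the function names reconstruct and residual_norm are assumptions chosen only to show the arithmetic.

```python
# Illustrative sketch only; D, alpha, and the function names are assumptions.

def reconstruct(dictionary, coefficients):
    """Return D * alpha, where the columns of `dictionary` are the atoms."""
    return [
        sum(row[k] * coefficients[k] for k in range(len(coefficients)))
        for row in dictionary
    ]

def residual_norm(signal, dictionary, coefficients):
    """Euclidean norm of (signal - D * alpha)."""
    approx = reconstruct(dictionary, coefficients)
    return sum((s - a) ** 2 for s, a in zip(signal, approx)) ** 0.5

# Hypothetical dictionary: 3 rows (signal length) by 4 columns (atoms).
D = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
alpha = [0.0, 0.0, 2.0, 3.0]  # sparse vector: only two atoms are active
x = reconstruct(D, alpha)     # linear combination of the two active atoms
```

In this sketch the signal x is built from only the third and fourth atoms, so the residual between x and its reconstruction is zero.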

Papyan does not distinctly disclose:
“receiving an input signal and performing normalization on the input signal;”
“transmitting the normalized input signal to a convolutional layer;”
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,”
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient,”
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However, 
Erdem teaches:
“receiving an input signal and performing normalization on the input signal;” ([claim 1] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” The input patches are an input signal. [Page 5 ln 10] “The NCC layer is a variation of a convolutional layer of a neural network, with the exception that the input is normalized prior to being convolved with the filters (or kernels) of that layer.”)
“transmitting the normalized input signal to a convolutional layer;” ([Page 4 ln 11] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” The normalized input is clearly described as being the input to a convolutional neural network.)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan with the normalized convolutional layer of Erdem to improve the handling of infrared signals. ([Page 3 ln 33] “Thus, the NCC layer, when infrared detection and recognition tasks are considered, has more generalization power compared to a convolutional layer”)
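As an illustrative sketch of the ordering described in the quoted passages of Erdem (input patches and filter coefficients are normalized before the convolution is applied), the following may be considered. The quoted passages do not state the exact form of the normalization; zero mean and unit Euclidean norm are assumed here, and the function names normalize and normalized_conv1d are hypothetical.

```python
import math

# Illustrative sketch only; the zero-mean, unit-norm normalization is an assumption.

def normalize(values):
    """Shift values to zero mean and scale them to unit Euclidean norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return [0.0] * len(values)  # a constant patch carries no pattern
    return [c / norm for c in centered]

def normalized_conv1d(signal, kernel):
    """Normalize the kernel and each input patch, then take their inner product."""
    k = normalize(kernel)
    width = len(kernel)
    responses = []
    for start in range(len(signal) - width + 1):
        patch = normalize(signal[start:start + width])
        responses.append(sum(p * w for p, w in zip(patch, k)))
    return responses

responses = normalized_conv1d([1.0, 2.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

Because both operands are normalized first, each response in this sketch lies between -1 and 1 regardless of the overall intensity of the input.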

Erdem does not distinctly disclose:
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,”
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient”
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”

Yuan teaches:
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,” ([Page 540-541] “In the SRC model, the dictionary is consisted of two parts: the feature dictionary and the occlusion dictionary. The feature dictionary is usually constructed with feature vectors extracted from the source images” The sparse representation coding model is clearly described as using feature vectors to train a dictionary.)
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient.” ([Page 542 left column] “Then this non-negative dictionary is applied for solving the non-negative sparse representation classification model.”, [figure 1], [Page 541 right column] “A desirable solution to α will be that all the coefficients […] are nearly zero and only the coefficients in α have significant values.”, [Page 545 right column] “the horizontal axis represents the class label of the atoms in the feature dictionary, the vertical axis represents the average sparse coefficients for each class” The atom is associated with the coefficient. Thus classifying the coefficient is classifying the atom. The non-negative sparse coefficient is the input to the classification module. The classification module uses the coefficient to perform the classification. It is also shown that the coefficients are minimized (nearly 0) or maximized (have significant value).)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem with the non-negative sparse coding taught by Yuan for the purpose of making the dictionary atoms more meaningful ([Page 541 left column] “The motivation of using non-negative dictionary is to make the basis atoms in the dictionary more visual and physical meaningful”)
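For illustration of classification by minimum residual or maximum coefficient over labeled dictionary atoms, a minimal sketch follows. It is not reproduced from Yuan; it assumes the conventional sparse representation classification decision rule (keep only the coefficients of one class and compare reconstruction residuals), and the atoms, labels, coefficients, and function names are hypothetical.

```python
# Illustrative sketch only; all data and names below are assumptions.

def class_residual(signal, atoms, labels, coeffs, target):
    """Residual norm when only the coefficients of class `target` are kept."""
    approx = [0.0] * len(signal)
    for atom, label, c in zip(atoms, labels, coeffs):
        if label == target:
            for i, a in enumerate(atom):
                approx[i] += c * a
    return sum((s - a) ** 2 for s, a in zip(signal, approx)) ** 0.5

def classify_min_residual(signal, atoms, labels, coeffs):
    """Pick the class whose atoms alone reconstruct the signal best."""
    classes = sorted(set(labels))
    return min(classes, key=lambda t: class_residual(signal, atoms, labels, coeffs, t))

def classify_max_coefficient(labels, coeffs):
    """Pick the class of the atom carrying the largest coefficient."""
    best = max(range(len(coeffs)), key=lambda k: coeffs[k])
    return labels[best]

atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # each atom is one dictionary column
labels = ["A", "B", "B"]                       # class label of each atom
coeffs = [0.1, 0.0, 0.9]                       # non-negative sparse coefficients
signal = [1.0, 0.9]

by_residual = classify_min_residual(signal, atoms, labels, coeffs)
by_coefficient = classify_max_coefficient(labels, coeffs)
```

In this sketch both rules select class "B", because the atom with the dominant coefficient belongs to that class and reconstructs the signal with the smaller residual.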

Yuan does not distinctly disclose:
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However, Chen teaches:
“the sparse coding layer is located before a fully connected layer”, ([Page 15] “The discriminative features learnt by sparse filters under the guidance of saliency are used to initialize the convolutional kernels of a CNN”, [Figure 3] Figure 3 clearly shows the use of a convolutional neural network (CNN) containing a fully connected layer. Since the sparse filters are before the CNN, the sparse coding layer must be before the fully connected layer.)

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the non-negative sparse coding taught by Yuan with the sparse coding layer placed before the fully connected layer of Chen for the purpose of using unsupervised training. ([Page 3 section 1.3] “in this paper we propose a land-use scene classification method based on a CNN pre-trained using unsupervised attention-guided sparse filters. With the aim of being totally unsupervised in the process of feature learning”)

Chen does not distinctly disclose:
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However,
Zhang teaches:
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,” ([Page 2 right column] “In this paper, we employ the Gaussian function to perform medical images’ multi-scale transformation. Firstly, utilize Gaussian filters to smooth images, assuming that I(x,y) denotes the input image, the smooth images L(x, y, σ) is obtained by the convolution of input image I(x, y) and Gaussian function”, [Page 4 right column] “Finally, sparse representation coefficients X can be generated through training and will be employed to train the SVM classifier.”, [Page 5 right column] “We randomly select 10,000 SIFT descriptors to train the dictionary. Due to the support vector machine (SVM)[36–38] has been successfully used in image classification fields and has an excellent performance [39]. We also employ SVM as our classifier.” Convolutional layers are just filters, and the Gaussian filters recited in Zhang perform a convolutional function on the input, which is then used to train the classifier, which produces the dictionary.)

“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.” ([Page 48 left column] “Finally, integrate all the sparse feature vectors {α_i} through average fusion method and the sparse representation vector P of the medical image will be obtained. For each medical image type, L SVM models are trained. Thus each test medical image will get L classification results. We employ a max-voting strategy [33, 34] to integrate L results and the image type with maximum votes is viewed as the final classification result”. Zhang clearly shows the use of a sparse coding algorithm to classify dictionary atoms; a max-voting strategy involves finding the channel with the most votes.)
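The max-voting step quoted from Zhang (L classification results integrated so that the type with the most votes is the final result) can be sketched as follows. This is an illustration only; it assumes one predicted label per scale or channel, and the function name max_vote and the example labels are hypothetical.

```python
from collections import Counter

# Illustrative sketch only; labels and the function name are assumptions.

def max_vote(per_channel_labels):
    """Return the label predicted by the largest number of channels."""
    counts = Counter(per_channel_labels)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical results from L = 5 per-channel classifiers for one test image.
votes = ["type_a", "type_b", "type_a", "type_c", "type_a"]
final_type = max_vote(votes)
```

Here three of the five hypothetical classifiers agree, so their label is taken as the final classification; how ties are broken is not addressed in the quoted passage and is left to Counter's ordering in this sketch.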

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the non-negative sparse coding taught by Yuan further modified by the sparse and fully connected layers of Chen with the sparse coding dictionary training and classification of Zhang for the purpose of making the system suitable for images ([Page 3 left column]“The conventional dictionary learning algorithm is based on minimizing the signal reconstruction error. However, this is not suitable for image classification tasks. In this paper, introducing multi-scale image decomposition and combing the non-negative locality sparse coding algorithm, we propose a multi-scale sparse encoding model”)

Regarding claim 9:
Papyan, Erdem, Yuan, Chen, and Zhang teach the method of claim 1 as above,
Papyan further teaches:
“wherein the sparse coding layer comprises a dictionary learning portion and a reconstruction portion,” ([Page 9 section 2.2.3] “over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 2] “In this framework, one assumes that a signal can be represented as a linear combination of a few columns (called atoms) from a matrix termed a dictionary. Put differently, the signal is equal to a multiplication of a dictionary by a sparse vector. The task of retrieving the sparsest representation of a signal over a dictionary is called sparse coding” Papyan clearly teaches the use of a sparse coding system that uses a dictionary learning portion and a reconstruction portion)
“when a residual of the dictionary learning portion is smaller than a threshold value, the reconstruction portion uses a product of the dictionary and a coefficient corresponding to the dictionary atom to output a reconstructed data.” ([Page 10] “The above formulation is an unsupervised learning procedure, and it was later extended to a supervised setting. In this context, given a set of signals {X_j}_j, one attempts to predict their corresponding labels {h(X_j)}_j. A common approach for tackling this is first solving a pursuit problem for each signal X_j over a dictionary D, […] and then feeding these sparse representations into a simple classifier, defined by the parameters.”, [Page 10] “The loss function ℓ in the above objective penalizes the estimated label if it is different from the true h(X_j), similar to what we have seen in Section 2.1. The above formulation contains in it the unsupervised option as a special case, in which U is of no importance, and the loss function is the representation error” The normal operation of the system taught by Papyan will, after training, use a dictionary and an atom to output reconstructed data. The penalizing of the objective based on the estimated label is a training function that must meet a threshold value of accuracy in order to not be penalized and ensure the normal operation of the system qua outputting reconstructed data.)

Regarding claim 10:
Papyan, Erdem, Yuan, Chen, and Zhang teach the method of claim 1 as above,
Papyan further teaches:
“wherein the sparse coding layer refreshes the dictionary atom according a feature of the mini-batch.” ([Page 9 section 2.2.3] “Although the sparse coding problem under these can be done very efficiently, over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 6] “ assume that the parameters of the CNN model are pre-trained and fixed. These, for example, could have been obtained by minimizing the above objective via the backpropagation algorithm and the stochastic gradient descent, as in the VGG network” when training the dictionary via stochastic gradient descent the updating of the dictionary via the mini-batch input is inherent to the process.) 


Regarding claim 11:
Papyan teaches:
“and adding a sparse coding layer after the convolutional layer,” ([Page 2] “The first convolves the input with a set of learned filters, resulting in a set of feature (or kernel) maps. These then undergo a point wise non-linear function, in a second step, often resulting in a sparse outcome (Glorot et al., 2011). A third (and optional) down-sampling step, termed pooling, is then applied on the result in order to reduce its dimensions. The output of this layer is then fed into another one, thus forming the multi-layer structure, often termed forward pass” The convolving of the input is passing the input through a convolutional layer; the second step, which results in a sparse representation, is the sparse coding layer. The sparse representation can also be called a projection of the input signal. [Page 12] “3. From Atoms to Molecules: Multi-Layer Convolutional Sparse Model” The model has multiple layers and is sparse. [Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.” Within the novel multi-layer model, there exist convolutional sparse coding (CSC) layers. [Page 14] “Relying on this, we now extend the dictionary learning problem, as presented in Section 2.2.3, to the multi-layer convolutional sparse representation setting.”)
“wherein the sparse coding layer uses a dictionary atom to reconstruct a signal on a projection of the normalized input signal passing through the convolutional layer,”  ([Page 2]“In this framework, one assumes that a signal can be represented as a linear combination of a few columns (called atoms) from a matrix termed a dictionary. Put differently, the signal is equal to a multiplication of a dictionary by a sparse vector. The task of retrieving the sparsest representation of a signal over a dictionary is called sparse coding” The sparse vector is a projection of the input signal) 
“and the sparse coding layer receives a mini-batch input to refresh the dictionary atom.” ([Page 9 section 2.2.3] “Although the sparse coding problem under these can be done very efficiently, over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 6] “assume that the parameters of the CNN model are pre-trained and fixed. These, for example, could have been obtained by minimizing the above objective via the backpropagation algorithm and the stochastic gradient descent, as in the VGG network” when training the dictionary via stochastic gradient descent the updating of the dictionary via the mini-batch input is inherent to the process.) 
“wherein when the convolutional neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”) is a single-channel ([Page 20, Table 1] m_i is the number of channels and can equal 1: “Notice that m_0 = 1.”) sparse coding convolutional ([Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.”) neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”),”
“the sparse coding layer is located after a fully connected layer,” ([Page 2] “neural networks were proven to preserve the metric structure of the input data as it propagates through the layers of the network. This, in turn, was shown to allow a stable recovery of the data from the features obtained from the network.”, [Page 23 section 5.5] “One should note that the convolutional structure imposed on the dictionaries in our model could be removed, and the theoretical guarantees we have provided above would still hold. The reason being is that the unconstrained dictionary can be regarded as a convolutional one, constructed from a single shift of a local matrix with no circular boundary. In the context of CNN, this is analogous to a fully connected layer. As such, the theoretical analysis provided here sheds light on both convolutional and fully connected networks. A different point of view on the same matter can also be proposed; fully connected layers can be viewed as convolutional ones with filters that cover their entire input” Since sparse coding is the reconstruction of a signal from the sparse representation of that signal, it must occur after the convolutional network, because the sparse signal is propagated to the end of the convolutional neural network. It is also noted that the structure of the convolutional neural network used in the art is equivalent to a fully connected network, such that the reference effectively teaches the use of a sparse coding layer after a fully connected layer.)
“wherein when the convolutional neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”) is a multi-channel ([Page 20, Table 1] m_i is the number of channels; m_i can take values other than 1, making the network multi-channel.) sparse coding convolutional ([Abstract] “In parallel, within the wide field of sparse approximation, Convolutional Sparse Coding (CSC) has gained increasing attention in recent years. […] Herein, we propose a novel multi-layer model, ML-CSC, in which signals are assumed to emerge from a cascade of CSC layers.”) neural network ([Abstract] “Convolutional neural networks (CNN) have led to many state-of-the-art results spanning through various fields.”),”

Papyan does not distinctly disclose:
“A machine learning device, comprising: a processor; and a memory, coupled to the processor,”
“wherein the processor receives an input signal and performs normalization on the input signal;”
“transmitting the normalized input signal to a convolutional layer;”
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,”
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient,”
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However, 
Erdem teaches:
“A machine learning device, comprising: a processor, configured to integrate a convolutional neural network and a sparse coding algorithm; and a memory, coupled to the processor,” ([Page 6 ln 3] “The reason we chose to separate these two formulas is practical. Extremely fast GPU-based solutions exist for forward and backward convolution operations in CNNs. Thus, instead of constructing the function for this new layer from scratch, it is practically much more convenient to detach two operations, derive functions for normalization only, append these functions to a convolutional layer of an existing CNN library (such as MatConvNet [24])” A GPU is in essence a processor connected to memory. The use of the MatConvNet library also implies the use of a computer consisting in essence of a processor connected to memory. MatConvNet is a CNN library that configures the processor.)

“receiving an input signal and performing normalization on the input signal;” ([claim 1] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” the input patches are an input signal, [Page 5 ln 10] “The NCC layer is a variation of a convolutional layer of a neural network, with the exception that the input is normalized prior to being convolved with the filters (or kernels) of that layer.”)
“transmitting the normalized input signal to a convolutional layer;” ([Page 4 ln 11] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” The normalized input is clearly described as being the input to a convolutional neural network.)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan with the normalized convolutional layer of Erdem to improve the handling of infrared signals. ([Page 3 ln 33] “Thus, the NCC layer, when infrared detection and recognition tasks are considered, has more generalization power compared to a convolutional layer”)
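For illustration only, and not as a reproduction of Erdem's formulas, the "normalize, then convolve" order relied upon above can be sketched as follows. The zero-mean, unit-norm normalization and the function names normalize and normalized_conv1d are assumptions made for this sketch; the sliding product is written as cross-correlation, which is what CNN libraries conventionally call convolution.

```python
import math

def normalize(values):
    """Shift to zero mean and scale to unit Euclidean norm (assumed scheme)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # A constant patch carries no structure; map it to all zeros.
        return [0.0] * len(values)
    return [c / norm for c in centered]

def normalized_conv1d(signal, kernel):
    """Normalize each input patch and the kernel, then take their dot product."""
    k = normalize(kernel)
    width = len(kernel)
    out = []
    for start in range(len(signal) - width + 1):
        patch = normalize(signal[start:start + width])
        out.append(sum(p * w for p, w in zip(patch, k)))
    return out
```

Under these assumptions the response depends only on the shape of each patch, so scaling the input by a constant leaves the output unchanged.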

Erdem does not distinctly disclose:
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,”
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient”
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”

However, Yuan teaches:
“the sparse coding layer performs a sparse non-negative coding with respect to the dictionary trained through a feature vector set,”  ([Page 540-541]“In the SRC model, the dictionary is consisted of two parts: the feature dictionary and the occlusion dictionary. The feature dictionary is usually constructed with feature vectors extracted from the source images” The sparse representation coding model is clearly described as using feature vectors to train a dictionary)
“and performs classification on the type of the dictionary atom having a minimum residual or a maximum coefficient.” ([Page 542 left column] “Then this non-negative dictionary is applied for solving the non-negative sparse representation classification model.”, [Figure 1], [Page 541 right column] “A desirable solution to α will be that all the coefficients […] are nearly zero and only the coefficients in α have significant values.”, [Page 545 right column] “the horizontal axis represents the class label of the atoms in the feature dictionary, the vertical axis represents the average sparse coefficients for each class” The atom is associated with the coefficient. Thus, classifying the coefficient is classifying the atom. The non-negative sparse coefficient is the input to the classification module. The classification module uses the coefficient to perform the classification. It is also shown that the coefficients are minimized (nearly 0) or maximized (have significant value).)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem with the non-negative sparse coding taught by Yuan for the purpose of making the dictionary atoms more meaningful ([Page 541 left column] “The motivation of using non-negative dictionary is to make the basis atoms in the dictionary more visual and physical meaningful”)
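For illustration only, the two decision rules recited in the limitation (minimum residual or maximum coefficient) can be sketched as follows. The sketch assumes the non-negative sparse coefficients have already been computed by some solver; the function names and the toy dictionary in the usage note are hypothetical and are not taken from Yuan.

```python
import math

def classify_min_residual(signal, atoms, labels, coeffs):
    """Return the class whose atoms alone reconstruct the signal best."""
    best_label, best_residual = None, math.inf
    for label in sorted(set(labels)):
        # Rebuild the signal using only the atoms belonging to this class.
        recon = [0.0] * len(signal)
        for atom, atom_label, c in zip(atoms, labels, coeffs):
            if atom_label == label:
                for i, a in enumerate(atom):
                    recon[i] += c * a
        residual = math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, recon)))
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label

def classify_max_coefficient(labels, coeffs):
    """Return the class of the atom carrying the largest coefficient."""
    idx = max(range(len(coeffs)), key=lambda i: coeffs[i])
    return labels[idx]
```

With atoms [[1, 0], [0, 1]] labelled "A" and "B", signal [0.9, 0.1], and coefficients [0.9, 0.1], both rules select class "A".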

Yuan does not distinctly disclose:
“the sparse coding layer is located before the fully connected layer,”
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However, Chen teaches:
“the sparse coding layer is located before a fully connected layer”, ([Page 15] “The discriminative features learnt by sparse filters under the guidance of saliency are used to initialize the convolutional kernels of a CNN”, [Figure 3] Figure 3 clearly shows the use of a convolutional neural network (CNN) containing a fully connected layer. Since the sparse filters are before the CNN, the sparse coding layer must be before the fully connected layer.)

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the non-negative sparse coding taught by Yuan with the sparse coding layer placed before the fully connected layer of Chen for the purpose of using unsupervised training. ([Page 3 section 1.3] “in this paper we propose a land-use scene classification method based on a CNN pre-trained using unsupervised attention-guided sparse filters. With the aim of being totally unsupervised in the process of feature learning”)

Chen does not distinctly disclose:
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,”
“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.”
However,
Zhang teaches:
“the sparse coding layer respectively trains the dictionary with respect to a plurality of convolutional diagrams of a plurality of channels output by the convolutional layer,” ([Page 2 right column] “In this paper, we employ the Gaussian function to perform medical images’ multi-scale transformation. Firstly, utilize Gaussian filters to smooth images, assuming that I(x,y) denotes the input image, the smooth images L(x, y, σ) is obtained by the convolution of input image I(x, y) and Gaussian function”, [Page 4 right column] “Finally, sparse representation coefficients X can be generated through training and will be employed to train the SVM classifier.”, [Page 5 right column] “We randomly select 10,000 SIFT descriptors to train the dictionary. Due to the support vector machine (SVM)[36–38] has been successfully used in image classification fields and has an excellent performance [39]. We also employ SVM as our classifier.” Convolutional layers are just filters, and the Gaussian filters recited in Zhang perform a convolutional function on the input, which is then used to train the classifier, which produces the dictionary.)

“uses a sparse non-negative coding algorithm to obtain a coefficient corresponding to the dictionary atom, and performs classification on the type of the dictionary atom through a channel-wise voting.” ([Page 48 left column] “Finally, integrate all the sparse feature vectors {α_i} through average fusion method and the sparse representation vector P of the medical image will be obtained. For each medical image type, L SVM models are trained. Thus each test medical image will get L classification results. We employ a max-voting strategy [33, 34] to integrate L results and the image type with maximum votes is viewed as the final classification result”. Zhang clearly shows the use of a sparse coding algorithm to classify dictionary atoms; a max-voting strategy involves finding the channel with the most votes.)

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the non-negative sparse coding taught by Yuan further modified by the sparse and fully connected layers of Chen with the sparse coding dictionary training and classification of Zhang for the purpose of making the system suitable for images ([Page 3 left column]“The conventional dictionary learning algorithm is based on minimizing the signal reconstruction error. However, this is not suitable for image classification tasks. In this paper, introducing multi-scale image decomposition and combing the non-negative locality sparse coding algorithm, we propose a multi-scale sparse encoding model”)
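For illustration only, a max-voting strategy of the kind quoted from Zhang, in which L separate classification results are integrated and the type with the most votes is taken as final, can be sketched as follows. The function name max_vote and the tie-breaking rule (first occurrence wins) are assumptions of this sketch and are not drawn from the reference.

```python
from collections import Counter

def max_vote(predictions):
    """Return the label predicted most often; ties go to the earliest label."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in input order so that ties are resolved deterministically.
    for label in predictions:
        if counts[label] == top:
            return label
```

For example, the per-channel results ["tumor", "normal", "tumor"] yield "tumor" as the final classification.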

Regarding claim 19:
Papyan, Erdem, Yuan, Chen, and Zhang teach the machine learning device of claim 11,
Papyan further teaches:
 “wherein the sparse coding layer comprises a dictionary learning portion and a reconstruction portion,” ([Page 9, section 2.2.3] “over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 2] “In this framework, one assumes that a signal can be represented as a linear combination of a few columns (called atoms) from a matrix termed a dictionary. Put differently, the signal is equal to a multiplication of a dictionary by a sparse vector. The task of retrieving the sparsest representation of a signal over a dictionary is called sparse coding” Papyan clearly teaches the use of a sparse coding system that uses a dictionary learning portion and a reconstruction portion)
“when a residual of the dictionary learning portion is smaller than a threshold value, the reconstruction portion uses a product of the dictionary and a coefficient corresponding to the dictionary atom to output a reconstructed data.” ([Page 10] “The above formulation is an unsupervised learning procedure, and it was later extended to a supervised setting. In this context, given a set of signals {Xj}j, one attempts to predict their corresponding labels {h(Xj)}j. A common approach for tackling this is first solving a pursuit problem for each signal Xj over a dictionary D, […] and then feeding these sparse representations into a simple classifier, defined by the parameters.”, [Page 10] “The loss function ℓ in the above objective penalizes the estimated label if it is different from the true h(Xj), similar to what we have seen in Section 2.1. The above formulation contains in it the unsupervised option as a special case, in which U is of no importance, and the loss function is the representation error” The normal operation of the system taught by Papyan will, after training, use a dictionary and an atom to output reconstructed data. The penalizing of the objective based on the estimated label is a training function that must meet a threshold value of accuracy in order to not be penalized and ensure the normal operation of the system qua outputting reconstructed data.)

Regarding claim 20:
Papyan, Erdem, Yuan, Chen, and Zhang teach the machine learning device of claim 11,
Papyan further teaches:
“wherein the sparse coding layer refreshes the dictionary atom according a feature of the mini-batch.” ([Page 9 section 2.2.3] “Although the sparse coding problem under these can be done very efficiently, over the years many have shifted to a data driven approach - adapting the dictionary D to a set of training signals at hand via some learning procedure.”, [Page 6] “assume that the parameters of the CNN model are pre-trained and fixed. These, for example, could have been obtained by minimizing the above objective via the backpropagation algorithm and the stochastic gradient descent, as in the VGG network” When training the dictionary via stochastic gradient descent, the updating of the dictionary via the mini-batch input is inherent to the process.)

Claims 2 & 12 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang and further in view of Wang et al. (US20190339359A1) hereinafter Wang.

Regarding claim 2:
Papyan, Erdem, Yuan, Chen, and Zhang teach the method of claim 1 as above,
Erdem further teaches:
“receiving an input signal and performing normalization on the input signal;” ([claim 1] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” the input patches are an input signal, [Page 5 ln 10] “The NCC layer is a variation of a convolutional layer of a neural network, with the exception that the input is normalized prior to being convolved with the filters (or kernels) of that layer.” Convolving an input samples the input. A normalized input being convolved is an input being sampled.) 
The motivation for combination is substantially the same as in claim 1.
Papyan, Erdem, Yuan, Chen, and Zhang do not distinctly disclose: 
“converting the input signal into a time-frequency diagram;”  
“using a polynomial to perform a fitting of a frequency-wise strength on the time-frequency diagram;”
However,
Wang teaches:
“converting the input signal into a time-frequency diagram;”  ([0003]“Since the frequency of the beat signal is proportional to the distance of object, a standard fast Fourier transform (FFT) of the beat signal can be used to identify peaks and estimate the distance” Converting a signal into a time frequency diagram is inherent to performing an FFT)
“using a polynomial to perform a fitting of a frequency-wise strength on the time-frequency diagram;” ([0017] “Similarly, another embodiment approximates the non-linearity function of the modulation in the phase domain using a polynomial basis function. This approximation allows to decompose general smooth non-linearity function by a few number of unknown coefficients within a small approximation error and, hence, recovers the unknown non-linearity function with fewer samples of the beat signal.” The modulation in the phase domain is the conversion to a time-frequency diagram. The spectrogram in figure 15 shows that the fitting is being performed on the frequency. All polynomial functions are nonlinear and the terms are interchangeable.)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem with the transformation and fitting of Wang for the purpose of recovering the signal with fewer samples. ([0017] “This approximation allows to decompose general smooth non-linearity function by a few number of unknown coefficients within a small approximation error and, hence, recovers the unknown non-linearity function with fewer samples of the beat signal”)

Regarding claim 12:
Papyan, Erdem, Yuan, Chen, and Zhang teach the machine learning device of claim 11,
Erdem further teaches:
“receiving an input signal and performing normalization on the input signal;” ([claim 1] “Normalizing input patches and filter coefficients of convolutional neural network layers for providing faster convergence in limited database,” the input patches are an input signal, [Page 5 ln 10] “The NCC layer is a variation of a convolutional layer of a neural network, with the exception that the input is normalized prior to being convolved with the filters (or kernels) of that layer.” Convolving an input samples the input. A normalized input being convolved is an input being sampled.) 
The motivation for combination is substantially the same as in claim 1.
Papyan, Erdem, Yuan, Chen, and Zhang do not distinctly disclose: 
“converting the input signal into a time-frequency diagram;”  
“using a polynomial to perform a fitting of a frequency-wise strength on the time-frequency diagram;”
However,
Wang teaches:
“converting the input signal into a time-frequency diagram;”  ([0003]“Since the frequency of the beat signal is proportional to the distance of object, a standard fast Fourier transform (FFT) of the beat signal can be used to identify peaks and estimate the distance” Converting a signal into a time frequency diagram is inherent to performing an FFT)
“using a polynomial to perform a fitting of a frequency-wise strength on the time-frequency diagram;” ([0017] “Similarly, another embodiment approximates the non-linearity function of the modulation in the phase domain using a polynomial basis function. This approximation allows to decompose general smooth non-linearity function by a few number of unknown coefficients within a small approximation error and, hence, recovers the unknown non-linearity function with fewer samples of the beat signal.” The modulation in the phase domain is the conversion to a time-frequency diagram. The spectrogram in figure 15 shows that the fitting is being performed on the frequency. All polynomial functions are nonlinear and the terms are interchangeable.)
The motivation for combination is substantially the same as in claim 2.

Claims 3 & 13 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, and further in view of  (Syed Zubair, Fei Yan, Wenwu Wang, Dictionary learning based sparse coefficients for audio classification with max and average pooling, Digital Signal Processing, Volume 23, Issue 3, 2013) herein after Zubair.

Regarding claim 3:
Papyan, Erdem, Yuan, Chen, and Zhang teach the method of claim 1 as above,
Papyan, Erdem, Yuan, Chen, and Zhang do not distinctly disclose:
“wherein a coefficient corresponding to the dictionary atom is a real number ranging from -1 to 1.”
However,
Zubair does teach: 
“wherein a coefficient corresponding to the dictionary atom is a real number ranging from -1 to 1.” ([Page 3 right column] “To calculate sparse coefficients of an input signal with a given dictionary, the OMP algorithm [29] projects the input signal on the subspace spanned by the dictionary atoms. The atom which strongly correlates with the signal or its residual is selected and used for calculation of the coefficients. The whole algorithm works as follows: Initialize the residual r0 be the input signal vector Yq and coefficient vector X0 to zero” The coefficient can be zero. Zero is a number between -1 and 1. The coefficient can then be a number between -1 and 1.)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem with the limiting of coefficients taught by Zubair for the purpose of better performance when processing noisy data. (“The experimental results show that the sparse (max-pooled and average-pooled) coefficients perform better than the classical MFCCs features, in particular, for noisy audio data”)
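For illustration only, the greedy procedure quoted from Zubair (residual initialized to the input signal, coefficient vector initialized to zero, then repeated selection of the atom most strongly correlated with the residual) can be sketched as follows. This sketch is plain matching pursuit, a simplified relative of OMP that omits the orthogonal least-squares re-fit; it assumes unit-norm atoms, and the function name matching_pursuit is hypothetical.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse coding over unit-norm atoms (plain MP, not full OMP)."""
    residual = list(signal)          # r0 is the input signal
    coeffs = [0.0] * len(atoms)      # X0 is the zero vector
    for _ in range(n_iter):
        # Correlate every atom with the current residual.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if corr[k] == 0.0:
            break                    # nothing left to explain
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual
```

With zero iterations every coefficient remains at its initial value of zero; with the standard basis as dictionary and the signal [0.5, -0.25], two iterations return the coefficients [0.5, -0.25].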

Regarding claim 13:
Papyan, Erdem, Yuan, Chen, and Zhang teach the machine learning device of claim 11,
Papyan, Erdem, Yuan, Chen, and Zhang do not distinctly disclose:
“wherein a coefficient corresponding to the dictionary atom is a real number ranging from -1 to 1.”
However,
Zubair does teach: 
“wherein a coefficient corresponding to the dictionary atom is a real number ranging from -1 to 1.” ([Page 3 right column] “To calculate sparse coefficients of an input signal with a given dictionary, the OMP algorithm [29] projects the input signal on the subspace spanned by the dictionary atoms. The atom which strongly correlates with the signal or its residual is selected and used for calculation of the coefficients. The whole algorithm works as follows: Initialize the residual r0 be the input signal vector Yq and coefficient vector X0 to zero” The coefficient can be zero. Zero is a number between -1 and 1. The coefficient can then be a number between -1 and 1.)
The motivation for combination is substantially the same as in claim 3.

Claims 6 & 16 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Yuan, Chen, Zhang, and further in view of (Mounir, Hammouche & Ghorbel, Enjie & Fleury, Anthony & Ambellouis, Sebastien. (2016). Toward a Real Time View-invariant 3D Action Recognition. 10.5220/0005843607450754.) hereinafter Mounir.

Regarding claim 6:
Papyan and Erdem and Chen and Zhang teach the method of claim 1 as above,
Papyan and Erdem and Chen and Zhang do not distinctly disclose:
“Wherein the sparse coding layer uses a membership function to calculate a trust level of the different channels with respect to the type of the different dictionary atoms”
“and performs the channel-wise voting according to the trust level.”
However
Yang teaches:
“Wherein the sparse coding layer uses a membership function to calculate a trust level of the different channels with respect to the type of the different dictionary atoms” ([Page 3 left column] “This function can be generalized by assigning the values to a specified range and indicate the membership degree of these elements in the set. Such a function is called a membership function,” [Page 4 left column] “In this paper, the FLS is used in the similarity measure between two mutually selected image blocks.” The similarity measure between two image blocks is a trust measure. The fuzzy logic system (FLS) uses this measurement to determine if the image blocks are members of the same set using a membership function.)
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the sparse and fully connected layers of Chen further modified by the sparse coding dictionary training and classification of Zhang with the membership function of Yang for the purpose of removing noise from the input. ([Abstract] “An evolutionary fuzzy block-matching-based image denoising algorithm is proposed to remove noise from a camera raw image.”)
Yang does not distinctly disclose:
“and performs the channel-wise voting according to the trust level.” 
However,
Mounir teaches: 
“and performs the channel-wise voting according to the trust level.” ([Page 8 right column] “The aggregation of the three depth-based classifiers is based on the class likelihood provided by each base-classifier (SVMs). The class likelihood information are obtained using libsvm. After training the three base-classifiers separately, we consider two linear combination techniques: the Majority Voting and the Investment.”, [Page 9 left column] “The depth-based classifiers trustworthiness is computed by the sum of the confidence in their decisions, weighted by the ratio of trust previously contributed to each (relative to the other base classifiers s 0 ), as described in (eq 8).” The use of multiple classifiers is because there are multiple channels of data whose combination is arrived at via voting based on the likelihood provided by each classifier which incorporates how accurate the classifier thinks its prediction is or how much it trusts its prediction.)

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the sparse and fully connected layers of Chen further modified by the sparse coding dictionary training and classification of Zhang  further modified by the membership function of Yang with the channel wise voting of Mounir for the purpose of improving human action recognition ([Abstract] “The experimental results show that an efficient combination strategy of base classifiers improves the accuracy and the computational efficiency for human action recognition.”)
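For illustration only, voting in which each channel's vote is weighted by a trust level can be sketched as follows. This is a generic trust-weighted vote, not the Investment technique described in Mounir; the function name trust_weighted_vote and the (label, trust) input format are assumptions of this sketch.

```python
from collections import defaultdict

def trust_weighted_vote(votes):
    """votes holds one (label, trust) pair per channel; highest total wins."""
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for label, trust in votes:
        totals[label] += trust   # accumulate trust per candidate label
    return max(totals, key=totals.get)
```

For example, the votes [("walk", 0.9), ("run", 0.4), ("run", 0.4)] select "walk", whereas an unweighted majority vote over the same channels would select "run".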

Regarding claim 16:
Papyan and Erdem and Chen and Zhang teach the machine learning device of claim 11 as above,
Papyan and Erdem and Chen and Zhang do not distinctly disclose:
“Wherein the sparse coding layer uses a membership function to calculate a trust level of the different channels with respect to the type of the different dictionary atoms”
“and performs the channel-wise voting according to the trust level.”
However
Yang teaches:
“Wherein the sparse coding layer uses a membership function to calculate a trust level of the different channels with respect to the type of the different dictionary atoms” ([Page 3 left column] “This function can be generalized by assigning the values to a specified range and indicate the membership degree of these elements in the set. Such a function is called a membership function,” [Page 4 left column] “In this paper, the FLS is used in the similarity measure between two mutually selected image blocks.” The similarity measure between two image blocks is a trust measure. The fuzzy logic system (FLS) uses this measurement to determine if the image blocks are members of the same set using a membership function.)
The motivation for combination is substantially the same as in claim 6.
Yang does not distinctly disclose:
“and performs the channel-wise voting according to the trust level.” 
However,
Mounir teaches: 
“and performs the channel-wise voting according to the trust level.” ([Page 8 right column] “The aggregation of the three depth-based classifiers is based on the class likelihood provided by each base-classifier (SVMs). The class likelihood information are obtained using libsvm. After training the three base-classifiers separately, we consider two linear combination techniques: the Majority Voting and the Investment.”, [Page 9 left column] “The depth-based classifiers trustworthiness is computed by the sum of the confidence in their decisions, weighted by the ratio of trust previously contributed to each (relative to the other base classifiers s 0 ), as described in (eq 8).” The use of multiple classifiers is because there are multiple channels of data whose combination is arrived at via voting based on the likelihood provided by each classifier which incorporates how accurate the classifier thinks its prediction is or how much it trusts its prediction.)
The motivation for combination is substantially the same as in claim 6.

Claims 7, 8, 17 & 18 are rejected under 35 U.S.C. 103 as being unpatentable over Papyan in view of Erdem, Chen, Zhang, Yang, Mounir and further in view of (S. D. S. Al-Shaikhli, M. Y. Yang and B. Rosenhahn, "Brain tumor classification using sparse coding and dictionary learning," 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 2774-2778, doi: 10.1109/ICIP.2014.7025561.) hereinafter Al-Shaikhli.

Regarding claim 7:
Papyan, Erdem, Chen, Zhang, Yang and Mounir teach the method of claim 6 as above,
Papyan, Erdem, Chen, Zhang, Yang and Mounir do not distinctly disclose:
“wherein the membership function comprises a true positive parameter and a true negative parameter”
However, 
Al-Shaikhli teaches:
“wherein the membership function comprises a true positive parameter and a true negative parameter” ([page 4 left column] “The performance for multi-class classification (Recall, Precision, Average Accuracy (AA)) are computed by computing the True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN) using the algorithm [21]” The quote clearly shows the use of a true positive and a true negative parameter.)

Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine Convolutional sparse coding of Papyan modified by the normalized convolutional layer of Erdem further modified by the sparse and fully connected layers of Chen further modified by the sparse coding dictionary training and classification of Zhang further modified by the membership function of Yang further modified by the channel wise voting of Mounir with the true positive and true negative parameters of Al-Shaikhli for the purpose of using both topological and texture features when building a dictionary. ([Page 1 abstract] “We propose an individual (per-class) dictionary learning and sparse coding classification using K-SVD algorithm. This approach combines topological and texture features to build and learn a dictionary.”)
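For illustration only, the relationship between the true positive, true negative, false positive, and false negative counts and the recall, precision, and accuracy measures named in the quoted passage can be sketched as follows. These are the standard textbook definitions and are not asserted to be the algorithm [21] cited by Al-Shaikhli; the function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy
```

For example, 8 true positives, 88 true negatives, 2 false positives, and 2 false negatives give a precision of 0.8, a recall of 0.8, and an accuracy of 0.96.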

Regarding claim 8:
Papyan, Erdem, Chen, Zhang, Yang and Mounir teach the method of claim 6 as above.
Papyan, Erdem, Chen, Zhang, Yang and Mounir do not distinctly disclose:
“wherein the membership function comprises a precision parameter and a recall parameter.”
However, 
Al-Shaikhli teaches:
“wherein the membership function comprises a precision parameter and a recall parameter.” ([Page 4 left column] “The performance for multi-class classification (Recall, Precision, Average Accuracy (AA)) are computed by computing the True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN) using the algorithm [21]” The quote clearly shows the use of a precision parameter and a recall parameter.)
The motivation for combination is substantially the same as in claim 7.

Regarding claim 17:
Papyan, Erdem, Chen, Zhang, Yang and Mounir teach the machine learning device of claim 16 as above,
Papyan, Erdem, Chen, Zhang, Yang and Mounir do not distinctly disclose:
“wherein the membership function comprises a true positive parameter and a true negative parameter”
However, 
Al-Shaikhli teaches:
“wherein the membership function comprises a true positive parameter and a true negative parameter” ([page 4 left column] “The performance for multi-class classification (Recall, Precision, Average Accuracy (AA)) are computed by computing the True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN) using the algorithm [21]” The quote clearly shows the use of a true positive and a true negative parameter.)
The motivation for combination is substantially the same as in claim 7.

Regarding claim 18:
Papyan, Erdem, Chen, Zhang, Yang and Mounir teach the machine learning device of claim 16 as above,
Papyan, Erdem, Chen, Zhang, Yang and Mounir do not distinctly disclose:
“wherein the membership function comprises a precision parameter and a recall parameter.”
However, 
Al-Shaikhli teaches:
“wherein the membership function comprises a precision parameter and a recall parameter.” ([Page 4 left column] “The performance for multi-class classification (Recall, Precision, Average Accuracy (AA)) are computed by computing the True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN) using the algorithm [21]” The quote clearly shows the use of a precision parameter and a recall parameter.)
The motivation for combination is substantially the same as in claim 7.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TROY A MAUST whose telephone number is (571)272-1931. The examiner can normally be reached Monday-Friday 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rehana Perveen can be reached on (571) 272-3676. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/T.A.M./Examiner, Art Unit 2148
/REHANA PERVEEN/Supervisory Patent Examiner, Art Unit 2148