DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application is 06/03/2019.
This action is in response to the amendments and/or arguments filed on 07/05/2022. Claims 1, 12-16 and 18-19 have been amended, and claims 10 and 17 have been cancelled. Claims 1-9, 11-16 and 18-20 are currently pending and have been examined. 
In response to the amendments and/or remarks filed on 02/28/2022, the 35 U.S.C. 112(f) claim interpretations made in the previous Office Action have been withdrawn. 
In response to the amendments and/or remarks filed on 02/28/2022, the 35 U.S.C. 112(b) rejections made in the previous Office Action have been withdrawn. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/05/2022 has been entered.
 

Response to Arguments
Applicant's arguments filed on 07/05/2022 have been fully considered but they are not persuasive. 
Applicant asserts that “The Office Action concedes that Sevakula, Yumer, and Doctor, alone or in any combination, fail to teach or suggest at least a neural network that comprises a classification layer. The Office Action then attempts to rely on Petridis as allegedly teaching a neural network comprising a classification layer. Petridis states at that "[i]n the second stage, we remove the decoding layers and add two classification layers followed by an output softmax layer." Petridis further states that "[f]irst, a deep autoencoder is trained (a) in order to compress the high dimensional image to a low dimensional representation which is the output of the bottleneck layer. Then, the decoding layers are replaced by classification layers (b) aiming to make the bottleneck features more discriminative. Optionally, the DCT features can be appended in the bottleneck layer in order to make the bottleneck features complementary to DCT features. Finally, both types of features are augmented with their first and second derivatives, concatenated and fed to an LSTM-RNM classifier (c) which models the temporal dynamics." However, Petridis is entirely silent as to the classification layer back propagating an existing error between an estimated output from the neural network and actual output from labels associated with labeled data. The fact that Petridis employs "classification layers" to "make the bottleneck features more discriminative" is not analogous to Applicant's claims, which include the classification layer back propagating an existing error between an estimated output from the neural network and actual output from labels associated with labeled data. Support for these amendments may be found in at least paragraphs [0035] of Applicant's specification.”

Examiner’s response: 
	The Examiner respectfully disagrees. First, Doctor teaches a neural network that includes a plurality of layers (input, hidden and output), as evidenced by the abstract. Under the broadest reasonable interpretation (BRI), the output layer is interpreted as a classification layer because it classifies the input data and outputs a possibility distribution of the control actions from the fuzzy inference engine, as evidenced by FIG. 3, FIG. 6 and col. 9-10, lines 6-11.  
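	Regarding the argument that “Petridis is entirely silent as to the classification layer back propagating an existing error between an estimated output from the neural network and actual output from labels associated with labeled data,” the Doctor reference is now relied upon to reject the above claim limitations on pg. 9. Doctor teaches a feedforward backpropagation neural network for each given output decision, classification or value S, with the output designated by defined linguistic labels, as evidenced by col. 14, lines 1-5. A feedforward backpropagation network is trained by propagating the error between the network's estimated output and the desired (labeled) output backward through its layers, beginning at the output layer, which is interpreted as the classification layer. 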



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 12-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because, under their broadest reasonable interpretation, they encompass software per se.
Claim 12 recites "A system comprising: a computing device configured to…". Paragraph [0058] of the present specification states that the systems can be implemented in software, so the broadest reasonable interpretation of the claimed "computing device" covers software per se. When the broadest reasonable interpretation of a claim covers software per se, the claim must be rejected under 35 U.S.C. § 101. See MPEP 2106.03. The Examiner suggests amending the claim to recite "a computing device comprising a processor and memory…". 
Dependent claims 13-16 are rejected based on their dependency from claim 12. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-8, 11-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sevakula et al. (“Data Preprocessing methods for Sparse Auto-encoder based Fuzzy Rule Classifier”, hereinafter: Sevakula) in view of Doctor et al. (US Pat. No. 8515884 B2).
Regarding claim 1 
Sevakula teaches a method for reducing dimensionality and improving neural network operation in light of uncertainty or noise, (abstract “This paper proposes data preprocessing methods to improve the performance of SAs during fuzzy rule reduction. The proposed approach enables the SA based fuzzy rule classifier to work on both real, as well as categorical attribute type data sets, and also with improved performance.”)
…
raw data comprising a plurality of samples, (pg. 2 right col section A second paragraph “The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.” Also see FIG. 2 where there are plurality of samples real data features and categorical data features)
wherein each sample comprises a plurality of input features; (Examiner notes that Fig. 2 shows a plurality of input features, for both real data features and categorical data features, which are preprocessed into new features; this corresponds to generating fuzzy data based on the input features)
generating, at an activation level component of the neural network, (see pg. 4 left col section A “We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner”)
fuzzy data based on the raw data; (Examiner notes that Fig. 2 shows that the input features [corresponding to the raw data], for both real data features and categorical data features, are preprocessed into new features, which corresponds to generating fuzzy data based on the raw data, and that the preprocessing steps include clustering and the use of fuzzy membership functions; see pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner. To account for this case, we plan to use clustering methods to naturally recognize the segregated groups and accordingly find their parameters for normalization. The algorithm for the entire method is given below.”)
…
the raw data and the fuzzy data into an input layer of a neural network autoencoder. (Examiner notes that the input features [corresponding to the raw data] and the preprocessed new features are fed into the autoencoder neural network model, as shown in Fig. 2 “Stacked Sparse Auto-Encoder Model with pre-processing”)
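For illustration only, the arrangement relied on above (crisp input features and fuzzy membership degrees presented together to the autoencoder input layer, Sevakula Fig. 2) can be sketched in Python as follows. The Gaussian membership form, the (center, sigma) parameter layout and the function names are assumptions; they are not taken from Sevakula or from Applicant's specification.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of crisp value x (assumed MF form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def build_autoencoder_input(sample, mf_params):
    """Concatenate a sample's crisp features with its fuzzy membership degrees.

    sample:    crisp (raw) feature values, one per feature.
    mf_params: hypothetical layout; mf_params[j] lists (center, sigma) pairs,
               one per cluster found for feature j.
    Returns one flat vector: the crisp features first (first set of input
    nodes), then every membership degree (second set of input nodes).
    """
    fuzzy = [gaussian_mf(x, c, s)
             for x, params in zip(sample, mf_params)
             for c, s in params]
    return list(sample) + fuzzy

# Two features; the first has two clusters, the second has one.
vector = build_autoencoder_input([0.2, 5.0], [[(0.0, 0.5), (1.0, 0.5)], [(5.0, 1.0)]])
print(len(vector))  # 2 crisp values + 3 membership degrees = 5
```

The input layer therefore grows by one node per membership function, which is why the crisp and fuzzy parts can be addressed as separate sets of input nodes.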
Sevakula does not teach the method comprising: receiving, at a training data component of a neural network processing component, 
…
wherein the neural network further comprises a classification layer, and wherein the classification layer provides an output indicating a classification for crisp input of a sample, and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data; and inputting, at a crisp input component and a fuzzy input component of the neural network processing component. 
Doctor teaches the method comprising: receiving, at a training data component of a neural network processing component, (abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
wherein the neural network further comprises a classification layer, (abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
and wherein the classification layer provides an output indicating a classification for crisp input of a sample, (Examiner notes that the crisp input is fed into the fuzzy neural network (with hidden and output layers, the output layer corresponding to the classification layer), as evidenced by FIG. 3, FIG. 6 and col. 9-10, lines 6-11 “The outputs of the fuzzy inference engine are fuzzy sets that specify a possibility distribution of the control actions. These fuzzy outputs need to be converted to nonfuzzy (crisp) control values that can then be used to operate the various …The type-1 FLC can be completely described using a mathematical formula that maps a crisp input vector X into a crisp output y=f(x). Such a formula can be obtained by following the signal x through the fuzzifier, where it is converted into the fuzzy set A, into the inference block where it is converted into the fuzzy set B”)
and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data; (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor, which back propagates an existing error between an estimated output from a neural network and the actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 
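For context on the limitation argued by Applicant, a classification layer trained by backpropagation computes, for each labeled sample, the error between the estimated output and the actual (labeled) output and passes that error back to the preceding layer. A minimal Python sketch follows, assuming a single softmax output layer with cross-entropy loss; the function names, loss choice and learning rate are illustrative assumptions and are not drawn from Doctor or from the specification.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classification_step(W, b, h, label, lr=0.1):
    """One backpropagation step for a softmax classification layer.

    W[k][i]: weight from incoming unit i to class k; b[k]: bias of class k.
    h:       activations arriving from the preceding layer.
    label:   index of the true class taken from the labeled data.
    Updates W and b in place; returns (loss, delta_h), where delta_h is the
    error passed back to the preceding layer.
    """
    logits = [sum(w * x for w, x in zip(row, h)) + bk for row, bk in zip(W, b)]
    p = softmax(logits)                                      # estimated output
    y = [1.0 if k == label else 0.0 for k in range(len(p))]  # actual (one-hot) output
    err = [pk - yk for pk, yk in zip(p, y)]                  # dLoss/dlogit for cross-entropy
    loss = -math.log(p[label])
    # Error sent back to the preceding layer, computed with the pre-update weights.
    delta_h = [sum(W[k][i] * err[k] for k in range(len(W))) for i in range(len(h))]
    for k, row in enumerate(W):
        for i in range(len(row)):
            row[i] -= lr * err[k] * h[i]
        b[k] -= lr * err[k]
    return loss, delta_h

W, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
loss1, _ = classification_step(W, b, [1.0, 0.5], label=1)
loss2, _ = classification_step(W, b, [1.0, 0.5], label=1)
print(loss2 < loss1)  # True: the error shrinks after one update
```

With softmax and cross-entropy, the error at the output is simply the estimated probabilities minus the one-hot label, which keeps the backward pass short.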

Regarding claim 2
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches wherein generating the fuzzy data comprises determining a plurality of clusters based on a body of training data comprising a plurality of samples. (pg. 4 left col step 2 “Using LLoyd’s k-means clustering algorithm, the data is clustered into K groups. The value for K is either given by user or chosen using Silhouette index [20] or set as a default value.”)
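The clustering step relied on for claim 2 (Sevakula, pg. 4, step 2: Lloyd's k-means with K groups) can be sketched as follows for a single feature. This is a minimal illustration assuming one-dimensional data, a user-supplied K and random initialization from the data; the function name and parameters are hypothetical.

```python
import random

def lloyd_kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's k-means on a single feature; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initial centers drawn from the data
    assign = None
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        new_assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_assign == assign:
            break  # assignments unchanged: converged
        assign = new_assign
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return centers, assign

centers, assign = lloyd_kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], k=2)
print(sorted(round(c, 6) for c in centers))  # [1.0, 10.0]
```

Sevakula also allows K to be chosen with the Silhouette index; that selection step is not shown here.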


Regarding claim 3
Sevakula in view of Doctor teaches the method of claim 2. 
Sevakula further teaches wherein generating the fuzzy data further comprises generating a plurality of membership functions, wherein the plurality of membership functions comprises a membership function for each of the plurality of clusters. (Steps 2-3 on pg. 4 left col “Using LLoyd’s k-means clustering algorithm, the data is clustered into K groups. The value for K is either given by user or chosen using Silhouette index [20] or set as a default value. Step 3 Gaussian Bell membership function is defined for each cluster, and a new feature is made for each of the K clusters.”)
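Steps 2-3 of Sevakula (one Gaussian bell membership function per cluster, with parameters derived from the cluster's minimum and maximum values) can be sketched as below. Sevakula's exact formula is not reproduced in this action, so the midpoint center, the width rule and the function names are assumptions made for illustration.

```python
import math

def cluster_membership_functions(values, assign, k, width_factor=0.5):
    """One Gaussian membership function per cluster of a single feature.

    Assumed parameterization (hypothetical): center = midpoint of the cluster's
    min and max; sigma = width_factor * half the cluster's range, floored at
    1e-6 so a single-point cluster still yields a usable function.
    Every cluster index in range(k) is assumed to have at least one member.
    """
    mfs = []
    for c in range(k):
        members = [v for v, a in zip(values, assign) if a == c]
        lo, hi = min(members), max(members)
        center = (lo + hi) / 2.0
        sigma = max(width_factor * (hi - lo) / 2.0, 1e-6)
        mfs.append((center, sigma))
    return mfs

def membership_degrees(x, mfs):
    """Degree of activation of each cluster's membership function for value x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * s ** 2)) for c, s in mfs]

mfs = cluster_membership_functions([0.0, 2.0, 10.0, 14.0], [0, 0, 1, 1], k=2)
print(mfs)                               # [(1.0, 0.5), (12.0, 1.0)]
print(membership_degrees(12.0, mfs)[1])  # 1.0
```

Each membership degree becomes one new feature, so K clusters yield K new features for that attribute.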

Regarding claim 4 (Previously Presented)
Sevakula in view of Doctor teaches the method of claim 3. 
Sevakula further teaches wherein generating the fuzzy data comprises calculating a degree of activation for one or more of the plurality of membership functions for a specific sample, (see training Phase step 2-3 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.”) 
wherein the specific sample comprises a training sample or a real-world sample. (pg. 2 right col second paragraph “The cost function for an auto-encoder is given by (5). W and b refers to weight matrix and bias vector respectively. The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.”)
Doctor further teaches and wherein the degree of activation is determined based at least in part on a plurality of specified rules. (Col 9 lines 57-65 “The fuzzy inference engine then maps input fuzzy sets into output fuzzy sets and handles the way in which rules are activated and combined. Rules are activated or fired if there is a non-zero degree of similarity between the system inputs and the antecedents of the rules. The results of such rule firing are outputs that have a non-Zero degree of similarity to the rule's consequents. The outputs of the fuzzy inference engine are fuzzy sets that specify a possibility distribution of the control actions.” also see col 20 lines 48-57 “Whenever the input conditions change, a Snapshot of the state of the current inputs is recorded and passed to the rule adaptation routine. Each input parameter in the input vector X is compared to each of the antecedent sets A' of a given rule in the rule base to determine its membership value. The weight of the rule is then calculated to determine if the product of the input membership function (degree of firing of the rule) in Equation (19) wo, meaning that the rule fired, and would therefore have contributed to the overall control response generated by the FLC.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sevakula to incorporate the teaching of Doctor to include a method or system for multipurpose intelligent data analysis and decision support systems.
One of ordinary skill in the art would have been motivated to make this modification in order to develop a “controller that can suggest optimised energy set points that will help to reduce the energy consumption in homes and offices” as disclosed by Doctor (col 2 lines 8-15 “In the area of energy management. There is a need to identify the relations between the various factors (such as inside/outside temperature, activity, cloud cover, wind speed, number of occupants, etc) and energy consumption to create an accurate model of the system. This model could be used afterwards to develop a controller that can suggest optimised energy set points that will help to reduce the energy consumption in homes and offices.”). 
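Doctor's rule activation (col. 20: each input is compared with the antecedent sets of a rule, and the product of the input membership values gives the degree of firing) can be sketched as follows. The triangular membership shape and all names are assumptions for illustration; Doctor's Equation (19) is not reproduced here.

```python
def triangular(a, b, c):
    """Hypothetical triangular membership function with feet a, c and peak b."""
    def mf(v):
        if v <= a or v >= c:
            return 0.0
        return (v - a) / (b - a) if v <= b else (c - v) / (c - b)
    return mf

def firing_strength(x, antecedents):
    """Degree of activation (firing strength) of one fuzzy rule.

    x:           crisp input vector.
    antecedents: one membership function per input (the rule's antecedent sets).
    The membership degrees are combined with the product t-norm; a result
    greater than zero means the rule fired.
    """
    strength = 1.0
    for xi, mf in zip(x, antecedents):
        strength *= mf(xi)
    return strength

rule = [triangular(0, 5, 10), triangular(10, 20, 30)]
print(firing_strength([2.5, 15], rule))  # 0.25
print(firing_strength([12, 20], rule))   # 0.0 -> rule did not fire
```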

Regarding claim 5 (Original)
Sevakula in view of Doctor teaches the method of claim 4. 
Sevakula further teaches wherein inputting the fuzzy data comprises inputting the degree of activation for one or more of the plurality of membership functions into one or more input nodes in an input layer of the autoencoder. (See training Phase step 2-3 on pg. 4 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.” Examiner notes that both the input features and the preprocessed new features are input into the autoencoder neural network model in FIG. 2; also see pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner”)

Regarding claim 6 (Original)
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches wherein generating the fuzzy data comprises calculating a degree of activation for one or more membership functions determined based on training data, (see training Phase step 2-3 on pg. 4 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.”)
wherein the specific sample comprises a training sample or a real-world sample. (Pg. 2 right col second paragraph “The cost function for an auto-encoder is given by (5). W and b refers to weight matrix and bias vector respectively. The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.”)
Doctor further teaches and wherein the degree of activation is determined based at least in part on a plurality of specified rules. (Col 9 lines 57-65 “The fuzzy inference engine then maps input fuzzy sets into output fuzzy sets and handles the way in which rules are activated and combined. Rules are activated or fired if there is a non-zero degree of similarity between the system inputs and the antecedents of the rules. The results of such rule firing are outputs that have a non-Zero degree of similarity to the rule's consequents. The outputs of the fuzzy inference engine are fuzzy sets that specify a possibility distribution of the control actions.” also see col 20 lines 48-57 “Whenever the input conditions change, a Snapshot of the state of the current inputs is recorded and passed to the rule adaptation routine. Each input parameter in the input vector X is compared to each of the antecedent sets A' of a given rule in the rule base to determine its membership value. The weight of the rule is then calculated to determine if the product of the input membership function (degree of firing of the rule) in Equation (19) wo, meaning that the rule fired, and would therefore have contributed to the overall control response generated by the FLC.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor, which back propagates an existing error between an estimated output from a neural network and the actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 

Regarding claim 7
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches wherein inputting the fuzzy data comprises inputting the degree of activation for one or more of the plurality of membership functions into one or more input nodes in an input layer of the autoencoder. (See training Phase step 2-3 on pg. 4 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.” Also see pg. 4 left col section A “We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner”)

Regarding claim 8 
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches wherein inputting the raw data and the fuzzy data comprises inputting during training of autoencoder. (Examiner notes that both the input features and the preprocessed new features are input into the autoencoder neural network model in FIG. 2; also see pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner”)

Regarding claim 11
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches the method further comprising stacking one or more autoencoder layers during training to create a deep stack of auto encoders. (FIGS. 1-2 show stacked autoencoders with multiple layers; see pg. 3 and 4 “Fig. 2 Stacked Sparse Auto-Encoder Model with pre-processing”) 
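Stacked autoencoders such as those in Sevakula FIGS. 1-2 are commonly built by greedy layer-wise training, in which each new autoencoder is trained on the codes produced by the layer below it, reducing dimensionality at each stage. The sketch below assumes plain autoencoders (sigmoid encoder, linear decoder, per-sample gradient descent); the sparsity constraint of a sparse autoencoder and the regularization term of Sevakula's Eq. (5) are omitted, and all names and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, We, be):
    """Sigmoid encoder: maps an input vector to its hidden code."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(We, be)]

def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one autoencoder layer (sigmoid encoder, linear decoder, squared
    reconstruction error) by per-sample gradient descent; return the encoder."""
    rng = random.Random(seed)
    n_in = len(data[0])
    We = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    be = [0.0] * n_hidden
    Wd = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    bd = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode(x, We, be)
            r = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(Wd, bd)]
            err = [ri - xi for ri, xi in zip(r, x)]  # gradient of 0.5*||r - x||^2
            # Back-propagate the reconstruction error to the hidden layer
            # (uses the decoder weights before this step's update).
            dh = [sum(Wd[i][j] * err[i] for i in range(n_in)) * h[j] * (1 - h[j])
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    Wd[i][j] -= lr * err[i] * h[j]
                bd[i] -= lr * err[i]
            for j in range(n_hidden):
                for m in range(n_in):
                    We[j][m] -= lr * dh[j] * x[m]
                be[j] -= lr * dh[j]
    return We, be

def stack_autoencoders(data, layer_sizes):
    """Greedy layer-wise stacking: each new layer is trained on the codes
    produced by the encoders already in the stack."""
    encoders, codes = [], data
    for size in layer_sizes:
        We, be = train_autoencoder(codes, size)
        encoders.append((We, be))
        codes = [encode(x, We, be) for x in codes]
    return encoders, codes

data = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
encoders, codes = stack_autoencoders(data, [3, 2])
print(len(encoders), len(codes[0]))  # 2 2: two stacked layers, 4 inputs reduced to 2
```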

Regarding claim 12 (Currently Amended)
Sevakula teaches a system comprising: a computing device configured to obtain raw data comprising a plurality of training samples; (pg. 2 right col section A second paragraph “The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.” Also see FIG. 2 where there are plurality of samples real data features and categorical data features)
identify a plurality of groups or clusters within the raw data; (pg. 4 left col section A “Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner. To account for this case, we plan to use clustering methods to naturally recognize the segregated groups and accordingly find their parameters for normalization.”)
determine a plurality of membership functions, (pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization…”)
wherein the plurality of membership functions comprise a membership function for each of the plurality of groups or clusters; (pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization… Gaussian Bell membership function is defined for each cluster, and a new feature is made for each of the K clusters. The parameters for the membership function is defined based on the data values found in that cluster. Gaussian Bell membership function is given by… Consider j th cluster from i th feature data. The minimum and maximum values of the cluster data are found, and are denoted by minij and maxij.”)
determine an activation level for at least one membership function based on features of a sample; (See training Phase step 2-3 on pg. 2 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.” Also see pg. 4 left col section A “We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner”)
input features of the sample into a first set of input nodes of an autoencoder; (Examiner notes that the input features and the preprocessed new features are fed into the autoencoder neural network model, as shown in Fig. 2 “Stacked Sparse Auto-Encoder Model with pre-processing”)
input the activation level into a second set of input nodes of the autoencoder. (Examiner notes that the input features and the preprocessed new features are fed into the autoencoder neural network model, as shown in Fig. 2 “Stacked Sparse Auto-Encoder Model with pre-processing”; see Training Phase steps 2-3 on pg. 4 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.”)
Sevakula does not teach a neural network comprising a classification layer, wherein the classification layer provides an output indicating a classification for crisp input of a sample, and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data.
Doctor teaches wherein a neural network further comprises a classification layer, (abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
and wherein the classification layer provides an output indicating a classification for crisp input of a sample, (Examiner notes that the crisp input is fed into the fuzzy neural network (with hidden/output layers [corresponding to the classification layer]), as evidenced by FIG. 3 and FIG. 6; col 9-10 lines 6-11 “The outputs of the fuzzy inference engine are fuzzy sets that specify a possibility distribution of the control actions. These fuzzy outputs need to be converted to nonfuzzy (crisp) control values that can then be used to operate the various …The type-1 FLC can be completely described using a mathematical formula that maps a crisp input vector X into a crisp output y=f(x). Such a formula can be obtained by following the signal x through the fuzzifier, where it is converted into the fuzzy set A, into the inference block where it is converted into the fuzzy set B”)
and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data; (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor that back propagates an existing error between an estimated output from a neural network and actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 
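For illustration, the Sevakula preprocessing relied upon above (k-means clustering of each feature, one Gaussian Bell membership function per cluster parameterized from the cluster's minimum and maximum, and the resulting activation levels presented to the autoencoder alongside the crisp features) can be sketched in Python. This is a minimal sketch under stated assumptions, not Sevakula's implementation: the reference's membership-function formula is elided in this action, so the Gaussian form exp(-(x - c)^2 / (2σ^2)), the choice of c as the cluster midpoint and σ as half the cluster range, the deterministic k-means initialization, and all function names and toy data are assumptions.

```python
import math

def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on one feature column. Deterministic spread-out init (assumption)."""
    s = sorted(values)
    centers = [s[(len(s) - 1) * i // max(k - 1, 1)] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[j].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups

def cluster_membership_functions(values, k):
    """One Gaussian MF per cluster; center/width from cluster min and max (assumed form)."""
    mfs = []
    for g in kmeans_1d(values, k):
        if not g:
            continue
        lo, hi = min(g), max(g)
        sigma = max((hi - lo) / 2.0, 1e-6)  # guard against single-valued clusters
        mfs.append(((lo + hi) / 2.0, sigma))
    return sorted(mfs)

def gaussian_mf(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def build_input_vector(sample, mfs_per_feature):
    """Crisp features -> first set of input nodes; MF activation levels -> second set."""
    crisp = list(sample)
    fuzzy = [gaussian_mf(x, c, s) for x, mfs in zip(sample, mfs_per_feature) for c, s in mfs]
    return crisp + fuzzy

# Hypothetical toy training data: two features, each naturally split into two groups.
train = [[0.1, 10.0], [0.2, 11.0], [0.15, 10.5], [5.0, 50.0], [5.2, 52.0], [4.9, 51.0]]
mfs_per_feature = [cluster_membership_functions(col, k=2) for col in zip(*train)]
x = build_input_vector([0.12, 10.2], mfs_per_feature)
print(len(x))  # 6 = 2 crisp features + 2 features x 2 clusters
```

Consistent with the quoted statement that "the number of features after pre-processing would increase," the input layer in this sketch grows from n crisp nodes to n plus one activation node per cluster per feature.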

Regarding claim 13 (Currently Amended)
Sevakula in view of Doctor teaches the system of claim 12. 
Sevakula further teaches wherein the sample comprises a training sample of the plurality of training samples, (FIG. 1 shows receiving a plurality of raw input features; also see pg. 2 right col “The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.”)
the computing device configured to operate on the training samples during training of one or more autoencoder levels. (Examiner notes that the input features and the preprocessed new features [corresponding to fuzzy data] are fed into the autoencoder neural network model, as shown in Fig. 2 “Stacked Sparse Auto-Encoder Model with pre-processing”)

Regarding claim 14 (Currently Amended)
Sevakula in view of Doctor teaches the system of claim 12. 
Sevakula further teaches wherein the output is a first output, (“The time complexity in finding the classification output using fuzzy rule based models is O(nd), where n is the number of rules and d is the dimension of the data samples.”)
and the computing device is further configured to process a second output from an auto encoder layer, (pg. 3 left col “First is the architecture of SA which includes the number of hidden layers and number of neurons in each layer. Additionally, two more parameters β and sparsity parameter ρ also need to be decided” also see pg. 5 left col “In the results tables, SA∗ signifies the SA based fuzzy classifier model with preprocessing, and SA signifies SA based fuzzy classifier model without any preprocessing. The term in the bracket next to SA/SA* signifies number of hidden layers in the Stacked Sparse Auto-encoder.”)
Doctor further teaches wherein the classification layer comprises two or more nodes. (Col 15 lines 63-67 “The summation covers all possible forward paths from input node m to the output node. The rational for Equation (14) is that if a feature is important, it will have more influence on the output node by propagating forward through the hidden nodes.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor that back propagates an existing error between an estimated output from a neural network and actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 

Regarding claim 15 (Currently Amended)
Sevakula in view of Doctor teaches the system of claim 12. 
Sevakula further teaches wherein the output is a first output, (“The time complexity in finding the classification output using fuzzy rule based models is O(nd), where n is the number of rules and d is the dimension of the data samples.”) and the computing device is further configured to process …output …from an auto encoder layer, (Fig. 2 shows a stacked sparse auto-encoder model with pre-processing having multiple layers)
Doctor further teaches wherein the output is a first output, and the computing device is further configured to process a second output (col 20 lines 31-37 “Step 2 is simply expanded to allow rules to have multiple outputs where the calculations in Equations (21) and (23) are repeated for each output value. Once the membership functions and the set of rules have been extracted from the input/ output data, the FLC has been formed. The learnt FLC can be used to provide output control responses to users based on different input conditions or end user queries.”)
and wherein the classification layer comprises two or more nodes. (Col 15 lines 63-67 “The summation covers all possible forward paths from input node m to the output node. The rational for Equation (14) is that if a feature is important, it will have more influence on the output node by propagating forward through the hidden nodes.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor that back propagates an existing error between an estimated output from a neural network and actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 

Regarding claim 16 (Currently Amended)
Sevakula in view of Doctor teaches the system of claim 12. 
Sevakula further teaches wherein the computing device is further configured to output to an input layer of the neural network, the neural network comprising a plurality of auto-encoder layers. (FIGS. 1-2 show stacked autoencoders with multiple layers; see pg. 3 and 4 “Fig. 2 Stacked Sparse Auto-Encoder Model with pre-processing”)

Regarding claim 18 (Currently Amended)
Sevakula teaches determine, …an activation level based on a sample for at least one membership function, (See training Phase step 2-3 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.” Also see pg. 2 right col “A hidden neuron is considered active if its activation (output) is greater than 0. It is observed from the visual cortex that at a time only few neurons are active and not all. Taking this idea closely, SA is different from regular autoencoders because it ensures that hidden neurons are not active most of the time. In other words, at a time, only few neurons in a layer are active. The goal of training in this manner is that unique representations are found for each class of training samples and thus be robust to noise [18], [19]. These kind of representations are found by defining sparsity parameter ρ, which refers to the average activation energy of neurons. Applying sparsity constraint to the autoencoder ensures that average activation of neurons is close to the desired ρ.”)
wherein the membership function corresponds to a group or cluster determined based on training data; (pg. 4 left col “The division of input feature into k fuzzy regions provide Sparse Auto-encoder extra freedom to learn non-linear relationships and may result in more useful features after SA learning. The use of k-means clustering allows similar feature values to be grouped together and increases the significance of membership function and regions. It may be noted that in the process, the number of features after pre-processing would increase.”)
input, …features for a sample into a first set of input nodes of a neural network, (Examiner notes that Fig. 2 shows that input features, both real-valued and categorical, are preprocessed into new features, which corresponds to generating fuzzy data based on the input features)
wherein the neural network comprises one or more autoencoder layers and an input layer comprising the first set of input nodes and a second set of input nodes; (FIG. 2 shows receiving a plurality of input features at the nodes of the autoencoder neural network layers)
and input, …the activation level into the second set of input nodes of the neural network. (See training Phase step 2-3 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.”)
Sevakula does not teach a non-transitory computer
…
wherein the neural network further comprises a classification layer, and wherein the classification layer provides an output indicating a classification for crisp input of a sample, and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data; and input, at a fuzzy input component of the neural network processing component.
Doctor teaches a non-transitory computer (Examiner notes that it is well understood in the art for a neural network to be implemented on a computer with a processor, and Doctor teaches training a neural network; see abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
wherein the neural network further comprises a classification layer, (abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
and wherein the classification layer provides an output indicating a classification for crisp input of a sample, (Examiner notes that the crisp input is fed into the fuzzy neural network (with hidden/output layers [corresponding to the classification layer]), as evidenced by FIG. 3 and FIG. 6; col 9-10 lines 6-11 “The outputs of the fuzzy inference engine are fuzzy sets that specify a possibility distribution of the control actions. These fuzzy outputs need to be converted to nonfuzzy (crisp) control values that can then be used to operate the various …The type-1 FLC can be completely described using a mathematical formula that maps a crisp input vector X into a crisp output y=f(x). Such a formula can be obtained by following the signal x through the fuzzifier, where it is converted into the fuzzy set A, into the inference block where it is converted into the fuzzy set B”)
and wherein the classification layer back propagates an existing error between an estimated output from the neural network and actual output from labels associated with labeled data; (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”)
and input, at a fuzzy input component of the neural network processing component. (Col 9 lines 55-59 “The type-1 FLC works as follows: The fuzzifier is responsible for mapping the crisp sensory inputs into input fuzzy sets which in turn activate the rules. The fuzzy inference engine then maps input fuzzy sets into output fuzzy sets and handles the way in which rules are activated and combined.”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sparse auto-encoder based fuzzy rule classifier of Sevakula to include the classification layer of Doctor that back propagates an existing error between an estimated output from a neural network and actual output from labels, in order to improve the accuracy of prediction and assign labels correctly, as disclosed by Doctor (col 14-15 lines 1-5 “The system employs a Multilayer Preceptron feedforward backpropagation Neural Network net for each given output decision, classification or value S that is designated by the defined linguistic labels s. This Artificial Neural Networks (ANN) model was chosen as it has been shown to be a universal approximator and thus this ANN can find the mapping from the input values to the given output values using only one hidden layer. In addition, this ANN is relatively easy to be rapidly trained. Once, trained, this ANN model can very rapidly map inputs to outputs.”). 
Regarding claim 19 (Currently Amended)
Sevakula in view of Doctor teaches the computer readable storage media of claim 18. 
Sevakula further teaches …to determine a plurality of groups or clusters based on the training data, (pg. 4 left col “To account for this case, we plan to use clustering methods to naturally recognize the segregated groups and accordingly find their parameters for normalization. The algorithm for the entire method is given below… Using LLoyd’s k-means clustering algorithm, the data is clustered into K groups. The value for K is either given by user or chosen using Silhouette index [20] or set as a default value. Step 3 Gaussian Bell membership function is defined for each cluster, and a new feature is made for each of the K clusters.”)
wherein the plurality of groups or clusters comprise the group or cluster. (Pg. 4 left col “The division of input feature into k fuzzy regions provide Sparse Auto-encoder extra freedom to learn non-linear relationships and may result in more useful features after SA learning. The use of k-means clustering allows similar feature values to be grouped together and increases the significance of membership function and regions. It may be noted that in the process, the number of features after pre-processing would increase.”)
Doctor further teaches wherein the computer-executable instructions further cause the one or more processors (Examiner notes that it is well understood in the art for a neural network to be implemented on a computer with a processor; see abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sevakula to incorporate the teaching of Doctor to include a neuro type-2 fuzzy based method for decision making. 
One of ordinary skill in the art would have been motivated to make this modification in order to provide a neuro type-2 fuzzy based neural network and leverage “large amounts of vague or complex information requiring analysis for making operational and cost effective decisions” as disclosed by Doctor (col 1 lines 6-13 “…methods and controllers for multi purpose intelligent data analysis and decision support systems in a variety of industrial and commercial applications which are characterised by large amounts of vague or complex information requiring analysis for making operational and cost effective decisions. The methods incorporate a group decision making process based on type-2 fuzzy systems.”). 
Regarding claim 20 (Previously Presented)
Sevakula in view of Doctor teaches the non-transitory computer-readable storage medium of claim 19.  
Sevakula further teaches …to generate a plurality of membership functions for the plurality of groups or clusters, (see training Phase step 2-3 on pg. 4 “Given a sample, the fuzzy regions where the input and output vector lie, i.e. where their membership value is maximum, those regions are recognized and a fuzzy rule is formulated. In this manner all fuzzy rules are formulated. Steps 3 and 4 are followed for resolving conflicts amongst rules. Step 3 Calculate βP where P is the class label and varies from {1,2,3,....}. In (1), k ∈ P refers to all samples for a given rule that has P as the output class label and j refers to dimension of input data… where µkj (xkj ) is activation of max membership function for feature j of an input x having n features.”)
wherein the plurality of membership functions comprise the membership function. (Pg. 4 left col “Our goal for preprocessing is to enable the SA model to work for all data sets. With real data, two cases are possible. First case is when numerical range of values is different for different attributes, which is treated by some kind of data normalization. We plan to use fuzzy membership functions (MFs) for this process as using them is a form of non-linear normalization. Second case which happens often is that data may be naturally segregated in groups, with gaps in between. Therefore normalizing the entire data with a single set of parameters may not be best; instead normalizing each group of data with its own set of parameters might show the variety in data in a much better manner. To account for this case, we plan to use clustering methods to naturally recognize the segregated groups and accordingly find their parameters for normalization. The algorithm for the entire method is given below.”)
Doctor further teaches wherein the computer-executable instructions further cause the one or more processors. (Examiner notes that it is well understood in the art for a neural network to be implemented on a computer with a processor; see abstract “the neural network including a plurality of layers having at least an input layer, one or more hidden layers and an output layer, each layer comprising a plurality of interconnected neurons, the number of hidden neurons utilized being adaptive, the ANN determining the most important input data and defining therefrom a second ANN,”)
Sevakula and Doctor are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sevakula to incorporate the teaching of Doctor to include a neuro type-2 fuzzy based method for decision making. 
One of ordinary skill in the art would have been motivated to make this modification in order to provide a neuro type-2 fuzzy based neural network and leverage “large amounts of vague or complex information requiring analysis for making operational and cost effective decisions” as disclosed by Doctor (col 1 lines 6-13 “…methods and controllers for multi purpose intelligent data analysis and decision support systems in a variety of industrial and commercial applications which are characterised by large amounts of vague or complex information requiring analysis for making operational and cost effective decisions. The methods incorporate a group decision making process based on type-2 fuzzy systems.”). 

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Sevakula et al. (“Data Preprocessing methods for Sparse Auto-encoder based Fuzzy Rule Classifier”, hereinafter: Sevakula) in view of Doctor et al. and further in view of Petridis et al. (“Deep Complementary Bottleneck Features for visual Speech Recognition”, hereinafter: Petridis). 
Regarding claim 9 (Previously Presented)
Sevakula in view of Doctor teaches the method of claim 1. 
Sevakula further teaches and training remaining autoencoder layers and the one or more additional neural network layers for a output; (Pg. 2 right col section B “For finding the class of a test input vector x, calculate αP for each class. The class receiving the maximum α is selected as the output class i.e predicted class.” Also see Section A “The cost function for an auto-encoder is given by (5). W and b refers to weight matrix and bias vector respectively. The second term in (5) is used as a regularization term and the term λ controls its importance in the equation. m refers to the number of training samples, y refers to desired output, nl refers to number of neurons in hidden layer l while sl refers to number of hidden layers.”)
Sevakula in view of Doctor does not teach the method further comprising: removing an output layer of the autoencoder and adding one or more additional neural network layers.
Petridis teaches the method further comprising: removing an output layer of the autoencoder (pg. 2305 left col first paragraph “Then the decoding layers of the autoencoder are removed and replaced with classification layers and a softmax output layer in order to make the low dimensional representation more discriminative for visual speech recognition.”)
and adding one or more additional neural network layers; (pg. 2305 left col “In the second stage, we remove the decoding layers and add two classification layers followed by an output softmax layer.”)
Sevakula, Doctor and Petridis are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sevakula in view of Doctor to incorporate the teaching of Petridis to include a deep autoencoder with deep complementary bottleneck features for visual speech recognition.
One of ordinary skill in the art would have been motivated to make this modification in order to provide an improved deep autoencoder “with a bottleneck layer in order to reduce the dimensionality of the image” as disclosed by Petridis (abstract “To the best of our knowledge, this is the first work that extracts DBNFs for visual speech recognition directly from pixels. We first train a deep autoencoder with a bottleneck layer in order to reduce the dimensionality of the image. Then the autoencoder’s decoding layers are replaced by classification layers which make the bottleneck features more discriminative. Discrete Cosine Transform (DCT) features are also appended in the bottleneck layer during training in order to make the bottleneck features complementary to DCT features.”). 
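For illustration, the combination relied upon for claim 9 (removing the decoding/output layer of a trained autoencoder, adding a softmax classification layer, and back propagating the error between the estimated output and the labels) can be sketched in pure Python. This is a minimal sketch, not Petridis's or Doctor's implementation: the encoder weights stand in for a hypothetical pretrained autoencoder whose decoder has already been discarded, only one sigmoid encoder layer and one softmax classification layer are used, and the class name, learning rate, and toy data are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class EncoderClassifier:
    """Encoder kept from a (hypothetical) pretrained autoencoder; decoder removed;
    a softmax classification layer added on top of the encoder output."""

    def __init__(self, enc_w, enc_b, n_classes, seed=0):
        rng = random.Random(seed)
        self.enc_w, self.enc_b = enc_w, enc_b  # enc_w is n_hidden x n_inputs
        n_hidden = len(enc_w)
        self.cls_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]
        self.cls_b = [0.0] * n_classes

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.enc_w, self.enc_b)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.cls_w, self.cls_b)]
        return h, softmax(z)

    def train_step(self, x, label, lr=0.5):
        h, p = self.forward(x)
        # Error between estimated output p and the one-hot label
        # (gradient of softmax + cross-entropy w.r.t. the logits).
        delta_out = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Back propagate that error into the encoder layer (before updating cls_w).
        delta_h = [sum(self.cls_w[k][j] * delta_out[k] for k in range(len(p))) * h[j] * (1.0 - h[j])
                   for j in range(len(h))]
        for k, d in enumerate(delta_out):
            for j in range(len(h)):
                self.cls_w[k][j] -= lr * d * h[j]
            self.cls_b[k] -= lr * d
        for j, d in enumerate(delta_h):
            for i in range(len(x)):
                self.enc_w[j][i] -= lr * d * x[i]
            self.enc_b[j] -= lr * d
        return -math.log(max(p[label], 1e-12))  # cross-entropy loss for this sample

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

# Hypothetical encoder weights and toy labeled data.
enc_w = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
enc_b = [0.0, 0.0, 0.0]
model = EncoderClassifier(enc_w, enc_b, n_classes=2)
data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
epoch_losses = []
for epoch in range(200):
    epoch_losses.append(sum(model.train_step(x, y) for x, y in data))
print([model.predict(x) for x, _ in data])  # expected: [0, 0, 1, 1]
```

In this sketch the retained encoder and the added classification layer are trained together ("training remaining autoencoder layers and the one or more additional neural network layers"), because the label error computed at the classification layer is propagated back into the encoder weights.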

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Nelwamondo et al. (“FUZZY ARTMAP AND NEURAL NETWORK APPROACH TO ONLINE PROCESSING OF INPUTS WITH MISSING VALUES”) teaches that an ensemble of Fuzzy-ARTMAPs is used for classification, whereas an ensemble of multi-layer perceptrons is used for the regression problem. 
Yan et al. (“Correcting Instrumental Variation and Time-Varying Drift: A Transfer Learning Approach With Autoencoders”) teaches a drift correction autoencoder (DCAE) that learns to model and correct the influential factors explicitly with the help of transfer samples.
Kumar et al. (“An Approach for Intrusion Detection Using Fuzzy Feature Clustering”) teaches an efficient fuzzy feature clustering method that uses different text processing techniques and suitable data mining techniques. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.M./Examiner, Art Unit 2126
/BRIAN M SMITH/Primary Examiner, Art Unit 2122