DETAILED ACTION
This action is in response to communications filed on 10/27/2020 in which claims 1-4, 15, 17-18, 29, and 31 are amended; claims 13 and 27 are cancelled; and claims 1-12, 14-26, and 28-34 are still pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/17/2020 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/20/2020 is being considered by the examiner.

Response to Arguments
Applicant’s arguments and amendments submitted 10/27/2020 have been fully considered.

In response to applicant’s arguments and amendments, see pgs. 10-11 of filed response with respect to claims rejected under 35 USC §112 have been fully considered; and upon further 

Applicant’s arguments, see pgs. 11-13, with respect to the claims rejected under 35 USC § 103 have been fully considered and are not persuasive. Therefore, the rejection has been maintained.
First, applicant argues that the teachings of the cited prior art fail to disclose the claim elements directed to the use of a first classifier for assigning input data into a plurality of classes and producing a first output comprising at least two scores. The examiner notes that claims are interpreted under the broadest reasonable interpretation in light of applicant’s specification, see MPEP 2111. In the instant case, the claimed elements are recited at a broad level related to operations for classifying input data features as claimed by applicant’s limitations. The Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) reference discloses the input data that are extracted as feature vectors having dimensions and where the input is processed by the layers of the probabilistic neural network as depicted in Fig. 1. Specifically, the Yadav reference discloses the plurality of classifiers as the output layer nodes associated with the first classifier, that is, the pattern layer having a plurality of classifier nodes used to assign output to the output layer, see Yadav: pgs. 2-5: Sec. 2. The claimed output comprises two scores that are used to compute a summation; these scores are considered indicative for determining where the input belongs, as they are used to assign a corresponding class among the plurality of classes in the output layer, see Yadav: pgs. 2-5: Sec. 2. This recitation is within the scope of applicant’s claim limitation. The applicant should consider further limiting the claim scope to help distinguish the scope of the claim from the teachings of the Yadav reference.

Lastly, applicant’s argument, merely noting that none of the cited references teach the additional claim limitations with respect to claim 10, amounts to a mere allegation of patentability. The applicant has not provided a reason as to how the cited references are distinct in scope from the claimed invention. Applicant's arguments, thus, fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Regarding independent claims 24, 30 and 32, the claims recite limitations similar to those discussed for claim 10 above, and the rejection of the claims has been maintained.
The 35 USC § 103 rejection of claims 1-12, 14-26, and 28-34 has been updated below to address the claim amendments.


 

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

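The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.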

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification, as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are listed below:
Claim 31:
means for generating a first classifier for …
means for receiving, at a second classifier the output of the first classifier …
means for assigning, at the second classifier, the input data to a third class …
means for determining at the first classifier, whether to assign the input data…
means for classifying, at the first classifier, the input data to one of the first plurality of classes…
Claim 32:
means for obtaining …
means for synthetically generating …
means for training a binary classifier of ….
The specification does disclose a general-purpose processor, in paragraph [0064], as the structure for performing the recited functions in the limitations; and Figs. 10-12 provide flow charts for performing the claimed functions.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) in view of Steenhoek et al. (NPL: “Probabilistic Neural Networks for Segmentation of Features in Corn Kernel Images”, hereinafter ‘Steen’).

Regarding independent claim 1 limitations, Yadav teaches a method performed by an artificial neural network, comprising:
receiving input data; (depicted in Fig. 1: x-input data:

    [Image: Fig. 1 of Yadav, architecture of the probabilistic neural network]


Input layer for receiving input data in pg. 3: Sec. 2.1: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data [claimed received input data]…)
generating, at a first classifier, for assigning the input data to one of a first plurality of classes, a first output based on features extracted from the input data, the first output generated prior to assigning the input data to one of the first plurality of classes, and the first output comprising at least two scores; (Yadav teaches a first classifier, as the pattern layer, for assigning the x input data set to a first plurality of classes at the output nodes, using the Probabilistic Neural Network (PNN) algorithm and generating output as the set of weight connection values that are at least two feature value scores, considered an output based on features extracted from the input data, as depicted in Fig. 1, in pg. 3: Sec. 2.1: 1st ¶: …Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer [claimed first classifier] consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes [claimed first plurality of classes]. Each input node in the input layer is connected to each of the N pattern node in the pattern layer and weights of connections is feature values of these training patterns…and as depicted in Figure 1…







Where the weight values are assigned, prior to assigning the input data, a score value of 1 as the confidence that the input is assigned to a pattern node in the same class, in pg. 3: Sec. 2.1: 1st para.: … Only those pattern node belonging to the same class are connected to the output node corresponding to that class. All these connections carry a weight value 1 [claimed extracted features from the input data]. Therefore, each node in output layer simply adds the output values [claimed output based on extracted features comprising claimed at least two scores for generating a summation value] of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [additionally an output comprising claimed two scores including the ajk and output values for computing sums]. where i = 1, 2, .., d and j = 1, 2, .., N, k = 1, 2, .., C….)
receiving, at a second classifier, the first output of the first classifier, (receiving output as depicted in Fig. 1 at the output layer, in pg. 3: Sec. 2.1: 1st para.: … Only those pattern node belonging to the same class are connected to the output node corresponding to that class. All these connections carry a weight value 1 [claimed extracted features from the input data]. Therefore, each node in output layer [claimed second classifier] simply adds the output values [claimed output based on extracted features comprising claimed at least two scores for generating a summation value] of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [additionally an output comprising claimed two scores including the ajk and output values for computing sums]. where i = 1, 2, .., d and j = 1, 2, .., N, k = 1, 2, .., C….)
the second classifier being trained based on a feature vector from the first classifier, (training based on input vector and weight vector, as the d-dimension data and in pg. 3: Right Col.: 1st - Last para.: …To train the PNN, first we normalize each pattern xj of the training set to have unit length i.e. $\sum_{i=1}^{d} x_{ji}^{2} = 1$. We set wj, weights linking input layer nodes to jth pattern node, such that wj = xj … where the symbol x denotes the pattern to be classified, xj is the jth training pattern and s is a smoothing parameter. Since weight vector wj = xj, we can rewrite equation (1) …)
and each score of the at least two scores indicating a confidence for whether the input data belongs to one of the first plurality of classes; (scores are used as claimed indicators of confidence, in pg. 3: Sec. 2.1: 1st para.: … Only those pattern node belonging to the same class are connected to the output node corresponding to that class. All these connections carry a weight value 1. Therefore, each node in output layer simply adds the output values of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [claimed scores indicating data belongs to the claimed plurality of classes]. where i = 1, 2, .., d and j = 1, 2, .., N, k = 1, 2, .., C….)
assigning, at the second classifier, the input data to a first class or a second class based on a distribution of the at least two scores, the input data assigned to the first class when the at least two scores correspond to a first distribution, and in which the first class and the second class are distinct from the first plurality of classes; (Yadav teaches assigning, at the second classifier that is the output layer, input data to third or second class n= 2,3 plurality of N classes in the pattern layer as depicted in Fig. 1; where the third or second classes are in the output layer and are distinct from the first plurality of N pattern classes in the output layer, as depicted in Fig. 1, in pg. 3: Sec. 2.1: 1st para: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes... … Only those pattern node [claimed third or second class associated with a pattern node within n-pattern classes] belonging to the same class are connected to the output node [claimed assigning at second classifier] corresponding to that class. All these connections carry a weight value 1 [claimed determination of distinct class nodes assigned at the second classifier]. Therefore, each node in output layer simply adds the output values [claimed output based on extracted features comprising claimed at least two scores for generating a summation value] of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [using weight score to perform claimed assignment process].
where i = 1, 2, .., d and j = 1, 2, .., N, k = 1, 2, .., C…; where the feature scores correspond to a first distribution as the respective Gaussian kernel, in pg. 4: Left Col.: 2nd full para.: From above description it is clear that classification using probabilistic neural network (PNN) is somewhat reminiscent of bayesian classification using Gaussian mixture model (GMM) in the sense that both estimate a probability measure of class membership for a given pattern using mixture of Gaussians and select the class for which this measure is highest. There are obvious differences. GMM uses fewer Gaussian kernels than the number of patterns in the training data set while in PNN, each training pattern corresponds to one Gaussian kernel…)
determining, at the first classifier, whether to assign the input data to one of the first plurality of classes based on the assignment of the input data by the second classifier; (as depicted in Fig. 1, in pg. 3: Sec. 2.1: 1st para: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes... … Only those pattern node [claimed determining process at the claimed first classifier] belonging to the same class are connected to the output node [claimed determination based on assignment at second classifier of the input data] corresponding to that class. All these connections carry a weight value 1. Therefore, each node in output layer simply adds the output values [claimed output based on extracted features comprising claimed at least two scores for generating a summation value] of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [claimed determination based on the assignment of the second classifier])
and classifying, at the first classifier, the input data to one of the first plurality of classes when the input data is assigned to the first class. (as depicted in Fig. 1, in pg. 3: Sec. 2.1: 1st para: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes... … Only those pattern node belonging to the same class are connected to the output node corresponding to that class [claimed classifying at first classifier when the data is assigned claimed first class of the pattern node, that is the claimed first class of the first classifier]. All these connections carry a weight value 1. Therefore, each node in output layer simply adds the output values of pattern nodes of its class and this sum becomes its output value... If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk [claimed classifying at first classifier when the data is assigned claimed first class of the pattern node, that is the claimed first class of the first classifier]… And classifying new data inputs based on assignment of classifiers, in pg. 4: Left Col.: …x, first, it is normalized and then placed at input nodes. At pattern layer, each pattern node j computes inner product of weight vector wj and x, wT x. Then this inner product is passed to a non-linear activation function shown in equation (2). At output layer, each output node k receives an output from pattern nodes [claimed classifying at first classifier when the data is assigned claimed first class of the pattern node, that is the claimed first class of the first classifier] associated with class k…)
While Yadav discloses the use of a neural network trained to classify input data based on the use of sequential classifiers having a pattern classification layer including classifiers and assigning input data to the plurality of classes in the output layer, as noted above, Yadav does not expressly disclose examples of pattern class nodes as pattern categories. Examiner notes that the pattern layer is a classification of the feature disclosed as a pattern node.
nd para.:  Software was developed to extract red, green, and blue pixel values from regions representing color pattern features from each of the collected images. Recorded information for each selected pixel included the following: x and y pixel position, image file name, image kernel code, color pattern category….
The Yadav and Steen references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method for training classifier layers of a neural network comprising a plurality of class elements for pattern recognition using pattern categories as disclosed by Steen with the method for information processing and pattern recognition as disclosed by Yadav.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to process information as segmented features using a probabilistic neural network (Steen, Abstract). Doing so would help enhance the accuracy when processing image features as artifacts in segmented images (Steen, Abstract).
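
For illustration only, the pattern-layer and output-layer operations quoted from Yadav in the claim 1 mapping can be sketched as follows. This is a minimal sketch and not code from Yadav: the function names, the default `sigma`, and the exponential activation exp((net - 1) / sigma^2) are assumptions, because equation (2) of Yadav is not reproduced in this action. The sketch normalizes each pattern to unit length, sets each pattern-node weight vector equal to its training pattern, sums the pattern-node outputs per class, and selects the class with the highest sum.

```python
import math


def normalize(vec):
    """Scale a vector to unit length, as in the quoted training step."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def pnn_output_scores(train_patterns, train_labels, x, sigma=0.5):
    """Return one summed score per class (one value per output node)."""
    x = normalize(x)
    scores = {}
    for pattern, label in zip(train_patterns, train_labels):
        w = normalize(pattern)  # weight vector w_j = x_j
        net = sum(wi * xi for wi, xi in zip(w, x))  # inner product w_j^T x
        # Assumed Gaussian-style activation; equation (2) is not quoted here.
        activation = math.exp((net - 1.0) / sigma ** 2)
        # Pattern-to-output connections carry weight 1, so outputs just add.
        scores[label] = scores.get(label, 0.0) + activation
    return scores


def pnn_classify(train_patterns, train_labels, x, sigma=0.5):
    """Pick the class whose output node has the highest summed score."""
    scores = pnn_output_scores(train_patterns, train_labels, x, sigma)
    return max(scores, key=scores.get)
```

In this sketch, the dictionary returned by `pnn_output_scores` holds one score per class, which is the kind of per-class value the mapping above refers to when it discusses at least two scores.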

Regarding claim 2, the rejection of claim 1 is incorporated and Yadav in combination with Steen further teaches the method of claim 1:
further comprising assigning, at the second classifier, the input data to the second class when the at least two scores correspond to a second distribution. (Yadav teaches classifying the input data, that is x, to the second class in the C number of classes as depicted in Fig. 1, in pg. 3: Right Col.: 1st and 2nd full para.: To train the PNN, first we normalize each pattern xj of the training set to have unit length … We set wj, weights linking input layer nodes to jth pattern node, such that wj = xj. If we represent ith component of jth training pattern as xji, weight from ith input node to jth pattern node as wij and weight from jth pattern node to kth output node as ajk. where i = 1, 2, .., d and j = 1, 2, .., N, k = 1, 2, .., C…; where the feature scores correspond to a second distribution as the respective Gaussian kernel corresponding to the jth training pattern class for classifying the input data xj to a second class k in C classes in the output classes, in pg. 4: Left Col.: 2nd full para.: …Gaussian mixture model (GMM) in the sense that both estimate a probability measure of class membership for a given pattern using mixture of Gaussians and select the class for which this measure is highest. There are obvious differences. GMM uses fewer Gaussian kernels than the number of patterns in the training data set while in PNN, each training pattern corresponds to one Gaussian kernel. …)

Regarding claim 3, the rejection of claim 1 is incorporated and Yadav in combination with Steen further teaches the method of claim 1:
in which the second classifier is trained with examples of data belonging to the first plurality of classes and the at least two scores. (Yadav teaches the trained classifier with data belonging to the first plurality of classes, that is, the plurality of C trained classes with the PNN classifier including pattern layers associated with a class model as depicted in Fig. 1, and in pg. 3: Sec. 2.1: 1st para: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes... … Only those pattern node belonging to the same class are connected to the output node corresponding to that class. All these connections carry a weight value 1. Therefore, each node in output layer simply adds the output values [claimed second classifier with data comprising claimed at least two scores for generating a summation value] of pattern nodes of its class and this sum becomes its output value...)

Regarding claim 4, the rejection of claim 3 is incorporated and Yadav in combination with Steen further teaches the method of claim 3:
in which the second classifier is trained based on data not belonging to the first plurality of classes, the data comprising synthetically generated negative training data. (Yadav teaches that the data sampled for training the PNN is artificially generated negative training examples used to create the decision boundary for the target classes, associated with a novel class, not belonging to the first plurality of classes, and creating a decision boundary for the synthetically (i.e. artificially) generated data, in pg. 4: Sec. 2.2: 1st ¶: In this approach, we artificially generate negative examples and train the PNN with these negative examples to create closed decision boundary around target classes [claimed generation of synthetic negative training data based on claimed data]. Effectiveness of this approach depends on how we artificially generate negative examples around target classes so that close decision boundary is created for target classes and open decision boundary is created for novel class…)

Regarding claim 5, the rejection of claim 4 is incorporated and Yadav in combination with Steen further teaches the method of claim 4:
in which the synthetically generated training negative data is a function of known training data from the first plurality of classes. (Yadav teaches generating the data for the novel class from the artificially generated negative data, that is, synthetically generated negative data, in pg. 4: Sec. 2.2: 1st ¶: To generate negative examples [generated training negative data], we take two farthest pattern from the training data [generated training negative data is a function of known training data from the first plurality of known classes]. Then we find the distance between them, d, and create a hypersphere of radius R = d + s, centered at the mean of those two farthest points, where s is a small constant [generated training negative data is a function of known training data from the first plurality of known classes]. This ensures that this hypersphere encloses all the training patterns. Now generate enough number of points in the hypersphere uniformly…)
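
For illustration only, the hypersphere construction quoted from Yadav for claim 5 can be sketched as follows. This is a hedged sketch under stated assumptions, not Yadav's implementation: the function names, the default value of `s`, the fixed random seed, and the sampling method (a random Gaussian direction scaled by a radius drawn as R * u^(1/dim)) are all assumptions, since the quoted passage says only that points are generated uniformly in the hypersphere. Any later step that selects negative examples from these points is not quoted in this action and is not shown.

```python
import math
import random


def farthest_pair(points):
    """Return (distance, p, q) for the two farthest training patterns."""
    best = (0.0, points[0], points[0])
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            dist = math.dist(p, q)
            if dist > best[0]:
                best = (dist, p, q)
    return best


def sample_in_hypersphere(points, n_samples, s=0.01, seed=0):
    """Sample candidate points uniformly inside the enclosing hypersphere."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    d, p, q = farthest_pair(points)
    center = [(a + b) / 2.0 for a, b in zip(p, q)]  # mean of the farthest pair
    radius = d + s  # R = d + s, with s a small constant
    dim = len(center)
    samples = []
    for _ in range(n_samples):
        # Random direction from an isotropic Gaussian, then a radius that
        # makes the resulting points uniform over the ball's volume.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        r = radius * rng.random() ** (1.0 / dim)
        samples.append([c + r * v / norm for c, v in zip(center, direction)])
    return center, radius, samples
```

Because every training pattern lies within distance d of both endpoints of the farthest pair, it also lies within distance d of their midpoint, which is consistent with the quoted statement that the hypersphere encloses all the training patterns.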

Regarding claim 6, the rejection of claim 3 is incorporated and Yadav in combination with Steen further teaches the method of claim 3:
further comprising modifying a boundary of at least one of the first plurality of classes, the second class, or a combination thereof based on the data not belonging to the first plurality of classes. (Yadav teaches modifying the threshold, T, that is the decision boundary, for each class of the first plurality of C classes based on data Ok NovelClass_data, that is data not belonging to the first of the plurality of classes, in pg. 5: Col. 1: 1st Full ¶ & 2nd Full ¶: To find a threshold for a given class k, we generate negative examples around this class data using the same approach discussed in Section 2.2. Then we use kth class training data and novel class data as test data [a combination thereof based on the data not belonging to the first plurality of classes]. We apply this test data on the Probabilistic Neural Network with an initial threshold T k and we consider only output value of kth output node while finding threshold for kth class. Procedure to find threshold for each class is given in Algorithm 2.)

Regarding claim 7, the rejection of claim 1 is incorporated and Yadav in combination with Steen further teaches the method of claim 1:
in which the first plurality of classes are a plurality of known classes. (Yadav teaches the first plurality of classes of input classes associated with target classes in a closed decision boundary that are the first plurality of classes of known classes, in pg. 4: Sec. 2.2: 1st ¶: In this approach, we artificially generate negative examples and train the PNN with these negative examples to create closed decision boundary around target classes. Effectiveness of this approach depends on how we artificially generate negative examples around target classes so that close decision boundary is created for target classes and open decision boundary is created for novel class…)

Regarding claim 8, the rejection of claim 1 is incorporated and Yadav in combination with Steen further teaches the method of claim 1:
in which the second class comprises an unknown class or a class that is different from the first plurality of classes. (Yadav teaches the novel class that is trained using novel class data, that comprises an unknown class by classifying the novel pattern into a novel class that is different from the first plurality of classes, in pg. 4: Sec. 2.2.2: After getting novel class data, we train the Probabilistic Neural Network with novel class data. While generating novel class data, we do not assume any known distribution of the data. Since the above approach generates novel examples strictly around the region of the training data irrespective of any distribution of the training data, closed decision boundaries are created around classes and novel patterns are classified into novel class.)

Regarding claim 33, the rejection of claim 1 is incorporated and Yadav in combination with Steen further teaches the method of claim 1:
in which the at least two scores comprise un-normalized scores. (Yadav teaches the set of weight connection values that are at least two feature value scores, considered un-normalized scores, in pg. 3: Sec. 2.1: 1st ¶: Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes. Each input node in the input layer is connected to each of the N pattern node in the pattern layer and weights of connections is feature values of these training patterns. Pattern layer is not completely connected to the output layer as shown in Figure 1…)

Claims 9-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) in view of Steenhoek et al. (NPL: “Probabilistic Neural Networks for Segmentation of Features in Corn Kernel Images”, hereinafter ‘Steen’) and in further view of Skabar (NPL: “Single-class classifier learning using neural networks: An application to the prediction of mineral deposits”, hereinafter ‘Sbar’).


in which the binary classifier is linear or non-linear. (Yadav teaches the binary classifier as a non-linear classifier, noted as the (C+1)th class that represents a novel class generated by training the PNN classifier with multiple pattern layers associated with each class pattern as depicted in Fig. 1, that is, a non-linear neural network classifier, in pg. 4: 1st partial ¶: “we find negative class examples…..”, where the classifiers are assigned binary numbers 1 and -1 for classifying the target and novel class as noted in Algorithm 2.)
Yadav and Steen do not expressly disclose claim 9 limitation:
… binary classifier …
Dina teaches claim 9 limitation:
… binary classifier … (Dina teaches the binary classifier for classifying elements into classes using 0 and 1, in pg. 5: Left Col: last ¶: …The PNN network consists of three layers respectively: input layer, pattern layer and competitive layer as shown in Fig 8. It is presumed that there are Q input vector/target vector pairs (number of neurons in layer 1) where each target vector has number of classes K (number of neurons in layer 2). One of these elements is 1 and the rest are 0…)
Sbar teaches claim 9 limitation:
… binary classifier … (Sbar teaches the binary classifier, in pg. 2129: Sec. 3: 1st para. … Suppose that the training examples can be represented by some unknown binary function f(x) that maps x onto its target value d, where d is 0 or 1. Assume that all known (i.e., labeled) examples of the target class have target value 1 and that all other examples (i.e., unlabeled positive and unlabeled negative examples) have target 0. The objective is to learn a function h: X → [0, 1] such that h(x) = P(f(x) = 1). Thus, h(x) is a probabilistic function whose output is the probability that f(x) = 1. The function h(x) can be modelled using a feedforward neural network [training a binary classifier of a neural network] with a single output neuron. Because the network output is to represent a probability, the output values should be bounded between 0 and 1, and this can be achieved using a sigmoidal activation function.)
The Yadav, Steen, and Sbar references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method for training a binary neural network classifier to identify an input as belonging to a known class or an unknown class as disclosed by Sbar with the method for information processing and pattern recognition as collectively disclosed by Yadav and Dina.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to enable feedforward neural networks that can be used to learn a classifier from a dataset consisting of labeled examples of the target class (positive examples) together with a corpus of unlabeled (positive and negative) examples (Sbar, Abstract). Doing so would allow for learning techniques applicable to a broad range of classification and pattern recognition problems where the labeling of the data makes it difficult to supply labeled counter-examples (Sbar, Introduction).

Claims 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) in view of Skabar (NPL: “Single-class classifier learning using neural networks: An application to the prediction of mineral deposits”, hereinafter ‘Sbar’).

Regarding independent claim 10 limitations, Yadav teaches a method performed by an artificial neural network, comprising:
obtaining positive training data assigned to one of a plurality of classes; (Yadav teaches obtaining data belonging to the target classes, that is, known positive training data assigned to one of a plurality of classes and used to create a decision boundary, in pg. 4: Sec. 2.2 ... we artificially generate negative examples and train the PNN with these negative examples to create closed decision boundary around target classes [obtaining positive training data assigned to one of a plurality of classes]. Effectiveness of this approach depends on how we artificially generate negative examples around target classes so that close decision boundary is created for target classes [obtaining positive training data assigned to one of a plurality of classes] and open decision boundary is created for novel class…)
synthetically generating negative training data as a function of the positive training data; and (Yadav teaches generating the data for the novel class from the artificially generated negative data, that is, synthetically generated negative data, in pg. 4: Sec. 2.2: 1st ¶, where the function is defined by the “Function classify” that involves the output_value, that is, known data of the C plurality of classes of the PNN, to classify a dataset input x into the novel class, in pg. 5: Col. 1: 2nd full ¶: “Some small modifications can be made to …..End”; where the synthetic negative training data is generated as a function of the positive training data … we artificially generate negative examples [synthetically generating negative training data] and train the PNN with these negative examples to create closed decision boundary around target classes [as a function of the known positive training data]. Effectiveness of this approach depends on how we artificially generate negative examples around target classes [synthetic negative training data is generated as a function of the positive training data] so that close decision boundary is created for target classes and open decision boundary is created for novel class…)
training a binary classifier of the artificial neural network to identify an input as belonging to a known class or an unknown class based on a distribution of probability scores generated by a trained classifier based on features of the positive training data and the negative training data. (Yadav teaches training the PNN, including a binary classifier that is assigned binary numbers 1 and -1 for classifying the target class as input belonging to a known class and the novel class as input belonging to an unknown class as noted in Algorithm 2, and the artificial neural network as the PNN network used to decide whether the input data, that is x, belongs to the known C class set of patterns or to a novel class, in pg. 5: Col. 1: 2nd full ¶: “Some small modifications can be made to …..End”, using any distribution of the training data that can comprise the generated negative examples, as the negative data, as noted in Algorithm 1, in pg. 4: Sec. 2.2.2; Yadav teaches training a neural network classifier based on training data comprising the known data and the negative data, in pg. 4: Sec. 2.2 ... we artificially generate negative examples and train the PNN [training a neural network classifier] with these negative examples to create closed decision boundary around target classes [training based on features of the positive training data and the negative training data]. Effectiveness of this approach depends on how we artificially generate negative examples around target classes so that close decision boundary is created for target classes [obtaining known data from a plurality of classes] and open decision boundary is created for novel class…; where identifying the classification of the input is based on a distribution of probability scores generated by a trained classifier based on features of the positive training data and the negative training data, as the Gaussian kernel distribution for determining class membership of an input to an identified class membership, in pg. 4: Left Col.: 2nd full para.: From above description it is clear that classification using probabilistic neural network (PNN) is somewhat reminiscent of bayesian classification using Gaussian mixture model (GMM) in the sense that both estimate a probability measure of class membership for a given pattern using mixture of Gaussians and select the class for which this measure is highest. There are obvious differences. GMM uses fewer Gaussian kernels than the number of patterns in the training data set while in PNN, each training pattern corresponds to one Gaussian kernel [based on a distribution of probability scores generated by a trained classifier based on features of the positive training data and the negative training data] …)
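For illustration only, the class-membership computation described in the passage quoted above (one Gaussian kernel per training pattern, with the class having the highest measure selected) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Yadav: the names pnn_scores and pnn_classify, the smoothing parameter sigma, and the sample data are hypothetical, and the per-class score is assumed to be an un-normalized sum of kernel outputs.

```python
import math

def pnn_scores(x, patterns, labels, sigma=0.5):
    """Return one summed Gaussian-kernel score per class for input x.

    Each training pattern acts as one kernel (one pattern node); sigma is an
    assumed smoothing parameter, not a value taken from the cited reference.
    """
    scores = {}
    for pattern, label in zip(patterns, labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[label] = scores.get(label, 0.0) + kernel
    return scores

def pnn_classify(x, patterns, labels, sigma=0.5):
    """Select the class whose summed kernel score is highest."""
    scores = pnn_scores(x, patterns, labels, sigma)
    return max(scores, key=scores.get)

# Hypothetical two-class training set used only to exercise the sketch.
patterns = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["A", "A", "B", "B"]
```

Under these assumptions, pnn_scores returns at least two scores (one per class) and pnn_classify assigns the input to the class with the largest score.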
Yadav does not expressly teach claim 10 limitation:
training a binary classifier of the artificial neural network…
Sbar teaches claim 10 limitation:
training a binary classifier of the artificial neural network… (Sbar teaches in pg. 2127: 2nd Col: 2nd Full para. … This paper describes how neural networks can be used to learn a classifier [training a binary classifier of a neural network] from a dataset consisting of (labeled) examples belonging to the target class (i.e., positive examples) together with a corpus of unlabeled (positive and negative) examples. The technique is applicable to a broad range of classification and pattern recognition problems in which either the nature of the problem domain, or the expense of labeling the data, makes it difficult to supply labeled counter-examples….; where the classifier is a binary classifier of a neural network … in pg. 2129: Sec. 3: 1st para. … Suppose that the training examples can be represented by some unknown binary function f(x) that maps x onto its target value d, where d is 0 or 1. Assume that all known (i.e., labeled) examples of the target class have target value 1 and that all other examples (i.e., unlabeled positive and unlabeled negative examples) have target 0. The objective is to learn a function h: X → [0, 1] such that h(x) = P(f(x) = 1). Thus, h(x) is a probabilistic function whose output is the probability that f(x) = 1. The function h(x) can be modelled using a feedforward neural network [training a binary classifier of a neural network] with a single output neuron. Because the network output is to represent a probability, the output values should be bounded between 0 and 1, and this can be achieved using a sigmoidal activation function.)
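For illustration only, the training setup described in the quoted Sbar passage (target 1 for labeled examples of the target class, target 0 for all other examples, and a single sigmoidal output bounded between 0 and 1) can be sketched as follows. This is a simplified sketch under stated assumptions and is not Sbar's implementation: it uses a single sigmoid unit with no hidden layer, and the names train_single_output and predict, the learning rate, the epoch count, and the sample data are hypothetical.

```python
import math

def sigmoid(z):
    """Sigmoidal activation; bounds the output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_single_output(examples, targets, epochs=500, lr=0.5):
    """Fit one sigmoid output unit by stochastic gradient descent on log loss.

    Assumption: a single unit stands in for the feedforward network with a
    single output neuron described in the quoted passage.
    """
    w = [0.0] * len(examples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(examples, targets):
            h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = h - t  # gradient of the log loss with respect to the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return h(x), read as the estimated probability that f(x) = 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical data: labeled target-class examples get target 1,
# all other (unlabeled) examples get target 0.
labeled = [(2.0, 2.0), (2.2, 1.8)]
unlabeled = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)]
examples = labeled + unlabeled
targets = [1] * len(labeled) + [0] * len(unlabeled)
w, b = train_single_output(examples, targets)
```

The output of predict stays in the open interval (0, 1) by construction, which is the property the quoted passage attributes to the sigmoidal activation function.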
The Yadav and Sbar references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate the method for training a binary neural network classifier to identify an input as belonging to a known class or an unknown class as disclosed by Sbar with the method for information processing and pattern recognition as disclosed by Yadav.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to enable feedforward neural networks that can be used to learn a classifier from a dataset consisting of labeled examples of the target class (positive examples) together with a corpus of unlabeled (positive and negative) examples (Sbar, Abstract). Doing so would allow for learning techniques applicable to a broad range of classification and pattern recognition problems where the labeling of the data makes it difficult to supply labeled counter-examples (Sbar, Introduction).

Regarding claim 14, the rejection of claim 10 is incorporated and Yadav in combination with Sbar further teaches the method of claim 10:
further comprising modifying a boundary of at least an existing known class, an existing unknown class, or a combination thereof based at least in part on the negative training data. (Yadav teaches modifying the threshold, T, that is the decision boundary, for each class of the first plurality of C classes based on data Ok NovelClass_data, that is data not belonging to the first of the plurality of classes as the negative training data, in pg. 5: Col. 1: 1st Full ¶ & 2nd Full ¶.)

Claims 11-12, 24-26, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) in view of Skabar (NPL: “Single-class classifier learning using neural networks: An application to the prediction of mineral deposits”, hereinafter ‘Sbar’) and in further view of Masud et al. (US Patent Application Publication No. 2012/0054184, hereinafter ‘Masud’).

Regarding claim 11, the rejection of claim 10 is incorporated and Yadav in combination with Sbar further teaches the method of claim 10:
in which synthetically generating the negative training data comprises: computing a first vector between each data point in a cluster of positive data and a centroid of the cluster; and (Yadav teaches using the k-nearest neighbor and the center of the hypersphere of points from the training data, that is, the cluster of known data, in pg. 4: Sec. 2.2.1 … To generate negative examples, we take two farthest pattern from the training data. Then we find the distance between them, d, [computing a first vector between each data point of the known positive data and a centroid …] and create a hypersphere of radius R = d + s, centered at the mean [a centroid of the cluster] of those two farthest points [a first vector between each data point in a cluster of known positive data and a centroid …], where s is a small constant. This ensures that this hypersphere encloses all the training patterns [data point in a cluster of known data and a centroid of the cluster]. Now generate enough number of points in the hypersphere uniformly…)
computing a second vector between a centroid of class specific clusters and a centroid of all known data points independent of class. (Yadav teaches using the k-nearest neighbor and the center of the hypersphere of points from the training data, that is, the cluster of known data, in pg. 4: Sec. 2.2.1 … To generate negative examples, we take two farthest pattern from the training data. Then we find the distance between them, d, and create a hypersphere of radius R = d + s, centered at the mean [a centroid of the cluster] of those two farthest points where s is a small constant. This ensures that this hypersphere encloses all the training patterns [computing a … vector between a centroid of class specific clusters and a centroid of all known data points independent of class enclosing all the training patterns]. Now generate enough number of points in the hypersphere uniformly…)
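For illustration only, the steps quoted from Yadav Sec. 2.2.1 (finding the two farthest training patterns, taking their distance d, forming a hypersphere of radius R = d + s centered at their mean, and generating points uniformly inside it) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Yadav's code: the names farthest_pair and sample_hypersphere, the value of s, the random seed, and the sample data are hypothetical, and any later step elided from the quotation is not reproduced.

```python
import math
import random

def farthest_pair(points):
    """Return (distance, p, q) for the two farthest training patterns."""
    best = (0.0, points[0], points[0])
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            dist = math.dist(p, q)
            if dist > best[0]:
                best = (dist, p, q)
    return best

def sample_hypersphere(points, n_samples, s=0.1, seed=0):
    """Generate points uniformly inside the enclosing hypersphere.

    Radius R = d + s, centered at the mean of the two farthest points;
    s and seed are assumed values chosen only for this sketch.
    """
    rng = random.Random(seed)
    d, p, q = farthest_pair(points)
    radius = d + s
    center = [(a + b) / 2.0 for a, b in zip(p, q)]
    dim = len(center)
    samples = []
    for _ in range(n_samples):
        # Random direction from an isotropic Gaussian, then a radius drawn
        # so that the points are uniform over the volume of the ball.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        r = radius * rng.random() ** (1.0 / dim)
        samples.append([c + r * v / norm for c, v in zip(center, direction)])
    return center, radius, samples

# Hypothetical training patterns used only to exercise the sketch.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
center, radius, negatives = sample_hypersphere(training, n_samples=50)
```

Because every training pattern lies within distance d of both farthest points, it lies within distance d of their mean, so the hypersphere of radius d + s encloses all training patterns, as the quoted passage states.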
Yadav and Sbar do not expressly teach claim 11 limitations:
computing a first vector between each data point in a cluster of …. data ….; and
computing a second vector between a centroid of class specific clusters and a centroid of all data points independent of class.
Masud does teach claim 11 limitations:
computing a first vector between each data point in a cluster of …. data ….; and (Masud teaches computing a vector associated with the input data points x, in [0069] and using the K-nearest neighbor classifier to cluster the training data by computing the centroid of the data points belonging to each class in the cluster, in [0094].)
computing a second vector between a centroid of class specific clusters and a centroid of all data points independent of class. (Masud teaches, as depicted in Fig. 2, computing a second vector as the input vector of the F-outliers that are between points of their own class, all known F-outlier points that are independent of a class that fall outside the decision boundary and apart from the data points of other classes, and the second vector between a centroid of class specific clusters and a centroid of all data points independent of class, in [0061]: In one embodiment, the novel class determination engine 108 in FIG. 1 identifies any F-outliers from the data stream. In this example, data points 208 may be considered F-outliers as they fall outside of the predetermined decision boundary. Following a property that states a data point should be closer to the data points of its own class (cohesion) and farther apart from the data points of other classes (separation), the novel class determination engine 108 may measure the cohesion (e.g., 210) among each of the F-outliers 208 in the buffer, and the separation (e.g., 212) of each of the F-outliers 208 from the existing class instances by computing a unified measure of cohesion and separation, which may be called q-Neighborhood Silhouette Coefficient (q-NSC). The expression q-NSC yields a value between -1 and +1. A positive value indicates that a particular data point is closer to the other F-outlier instances (more cohesion) and farther away from existing class instances 202, 204, 206 (more separation), and vice versa. In an example, based on the positive cohesion and separation measurement, F-outliers 208 may be declared a novel class 214. The q-NSC value of an F-outlier may be computed separately for each classification model. A novel class is declared if there are at least q' (>q) F-outliers having a positive q-NSC for all the classification models.; where µi is the centroid of data points of F-outlier cluster φi, that is, the centroid of all known data points independent of class, and the centroid of an existing class is µj, used to compute the second vector of F-outlier points, in [0124]: Without loss of generality, in one example, let φi be an Fpseudopoint having weight q1 and φj be an existing class pseudopoint having weight q2, which is the closest existing class pseudopoint from φi (FIG. 7). FIG. 7 illustrates an example of the computation of deviation. In FIG. 7, φi may be an Fpseudopoint, i.e., a cluster of F-outliers, and φj may be an existing class pseudopoint, i.e., a cluster of existing class instances. In the non-limiting example of FIG. 7, all instances in φi may belong to a novel class. q-NSC'(φi), the approximate q-NSC of φi, may be computed using the following formula:… Where µi is the centroid of φi, µj is the centroid of φj, and Di is the mean distance from centroid µi to the instances in φi. In one example, the exact value of q-NSC may result from… Where λc_out,q(x) is the set of q nearest neighbors of x within Fpseudopoint φi, and λc_min,q(x) is the set of q nearest neighbors of x within pseudopoint φj, for some x ∈ φi…; and the x data points are captured as vectors, in [0069]: The data stream may be a continuous sequence of data points: {x1, ..., xnow}, where each xi is a d-dimensional feature vector, x1 is the very first data point in the stream, and xnow is the latest data point that has just arrived.)
The Yadav, Sbar, and Masud references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for training classification models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method for using input data as feature vectors and centroids of clustered data as disclosed by Masud with the method for training classifiers as collectively disclosed by Yadav and Sbar.


Regarding claim 12, the rejection of claim 11 is incorporated and Masud in combination with Yadav and Sbar further teaches the method of claim 11:
further comprising generating the negative data from the second vector or a negative vector of the first vector. (Masud teaches generating unlabeled data x from the input stream T that is associated with an unknown class, that is generating negative data, in [0072] where the input data are a sequence of feature vectors, that is negative data from data comprising the second vector or first vector input in sequence x in [0069].)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Sbar, and Masud for the same reasons disclosed above.

Regarding independent claim 24 limitations, Yadav in combination with Sbar teaches an artificial neural network, comprising:
claim 24 limitations are similar to claim 10 limitations and are rejected under the same rationale.
Yadav and Sbar do not expressly disclose claim 24 limitations:
at least one memory unit; and at least one processor coupled to the memory unit, the at least one processor configured:
Masud does expressly teach claim 24 limitations:
at least one memory unit; and at least one processor coupled to the memory unit, the at least one processor configured: (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
The Yadav, Sbar, and Masud references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit and processor as disclosed by Masud with the method for processing information collectively disclosed by Yadav and Sbar.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to classify data concepts that evolve over time using classification models (Masud, [0004]); doing so would help improve automatic detection of novel data instances for training classification models that may go undetected in a distributed computing environment (Masud, [0004] & [0262]).

Regarding claim 25, the rejection of claim 24 is incorporated and Yadav in combination with Sbar and Masud further teaches the artificial neural network of claim 24:
claim 25 limitations are similar to claim 11 limitations and are rejected under the same rationale.

Regarding claim 26, the rejection of claim 25 is incorporated and Masud in combination with Yadav and Sbar further teaches the artificial neural network of claim 25:
in which the at least one processor is further configured: (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
claim 26 limitations are similar to claim 12 limitations and are rejected under the same rationale.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Sbar, and Masud for the same reasons disclosed above.

Regarding claim 28, the rejection of claim 24 is incorporated and Yadav further teaches the artificial neural network of claim 24:
claim 28 limitations are similar to claim 14 limitations and are rejected under the same rationale.
Yadav and Sbar do not expressly disclose:
in which the at least one processor is further configured
Masud teaches the claim limitation: in which the at least one processor is further configured (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Sbar, and Masud for the same reasons disclosed above.


Claims 15-22, 31, and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’), in view of Steenhoek et al. (NPL: “Probabilistic Neural Networks for Segmentation of Features in Corn Kernel Images”, hereinafter ‘Steen’), and in further view of Masud et al. (US Patent Application Publication No. 2012/0054184, hereinafter ‘Masud’).

Regarding independent claim 15 limitations, Yadav in combination with Steen teaches an artificial neural network, comprising:
Claim 15 recites limitations similar to claim 1 limitations that are rejected under the same rationale.
Yadav and Steen do not expressly teach claim 15 limitations:
at least one memory unit; and at least one processor coupled to the memory unit, the at least one processor configured:
Masud does expressly teach claim 15 limitations:
at least one memory unit; and at least one processor coupled to the memory unit, the at least one processor configured: (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
The Yadav, Steen, and Masud references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit and processor as disclosed by Masud with the method for processing information as collectively disclosed by Yadav and Steen.


Regarding claim 16, the rejection of claim 15 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 15:
claim 16 limitation is similar to claim 2 limitation and is rejected under the same rationale.
Yadav and Steen do not expressly disclose:
in which the at least one processor is further configured
Masud teaches the claim limitation: in which the at least one processor is further configured (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Steen, and Masud for the same reasons disclosed above.

Regarding claim 17, the rejection of claim 15 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 15:
claim 17 limitation is similar to claim 3 limitation and is rejected under the same rationale.
Yadav and Steen do not expressly disclose:
in which the at least one processor is further configured
Masud teaches the claim limitation: in which the at least one processor is further configured (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Steen, and Masud for the same reasons disclosed above.

Regarding claim 18, the rejection of claim 17 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 17:
claim 18 limitation is similar to claim 4 limitation and is rejected under the same rationale.

Regarding claim 19, the rejection of claim 18 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 18:
claim 19 limitation is similar to claim 5 limitation and is rejected under the same rationale.

Regarding claim 20, the rejection of claim 17 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 17:
claim 20 limitation is similar to claim 6 limitation and is rejected under the same rationale.
Yadav and Steen do not expressly disclose:
in which the at least one processor is further configured
Masud teaches the claim limitation: in which the at least one processor is further configured (Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Yadav, Steen,  and Masud for the same reasons disclosed above.

Regarding claim 21, the rejection of claim 15 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 15:
claim 21 limitation is similar to claim 7 limitation and is rejected under the same rationale.

Regarding claim 22, the rejection of claim 15 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 15:
claim 22 limitation is similar to claim 8 limitation and is rejected under the same rationale.

Regarding claim 23, the rejection of claim 15 is incorporated and Yadav in combination with Steen and Masud further teaches the artificial neural network of claim 15:
in which the second classifier is linear or non-linear. (Yadav teaches use of the second classifier as a non-linear classifier for determining patterns in the data set associated with novel classes, noted as the (C+1)th class that represents a novel class generated by training the PNN classifier with multiple pattern layers associated with each class pattern as depicted in Fig. 1, that is, a non-linear neural network classifier, in pg. 4: 1st partial ¶: “we find negative class examples…..”, where the classifiers are assigned binary numbers 1 and -1 for classifying the target and novel class as noted in Algorithm 2.)

Regarding independent claim 31 limitations, Yadav in combination with Steen teaches an apparatus, comprising:
claim 31 limitation is similar to claim 1 limitation and is rejected under the same rationale.
Yadav and Steen do not expressly teach a means for.
Masud does teach the means for as the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].
The Yadav, Steen, and Masud references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit and processor as disclosed by Masud with the method for processing information as collectively disclosed by Yadav and Steen.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to classify data concepts that evolve over time using classification models (Masud, [0004]); doing so would help improve automatic detection of novel data instances for training classification models that may go undetected in a distributed computing environment (Masud, [0004] & [0262]).

in which the at least two scores comprise un-normalized scores. (Yadav teaches the set of weight connection values that are at least two feature value scores, considered un-normalized scores, in pg. 3: Sec. 2.1: 1st ¶: “Architecture of probabilistic neural network, as described in [7], has been shown in the Figure 1. It consists of an input layer, pattern layer and output layer. Input layer consists of d input nodes, where d is the dimension of the data. Pattern layer consists of N pattern nodes, where N is the number of training examples. Output layer consists of C nodes, where C is the number of classes. Each input node in the input layer is connected to each of the N pattern node in the pattern layer and weights of connections is feature values of these training patterns. Pattern layer is not completely connected to the output layer as shown in Figure 1…”)
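
For illustration only, the layer structure quoted from Yadav above can be sketched in a few lines of Python. This is a minimal sketch and not Yadav's implementation: the Gaussian kernel, the smoothing parameter `sigma`, the function name `pnn_scores`, and the sample data are all assumptions introduced here. The quoted passage states only that the connection weights are the feature values of the training patterns and that each pattern node feeds a single output node.

```python
import math

def pnn_scores(x, patterns, labels, num_classes, sigma=1.0):
    """Return one un-normalized score per class for the input vector x.

    patterns: N training vectors of dimension d (the pattern-layer weights).
    labels:   class index (0..num_classes-1) of each training vector.
    sigma:    kernel width; an assumption, not taken from the quoted passage.
    """
    scores = [0.0] * num_classes
    for w, c in zip(patterns, labels):
        # Pattern node: compare the input with the stored training pattern.
        sq_dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))  # assumed Gaussian kernel
        # Each pattern node connects only to the output node of its own class.
        scores[c] += activation
    return scores  # plain per-class sums; no division is applied

# Hypothetical data: d = 2, N = 3, C = 2.
patterns = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
labels = [0, 0, 1]
scores = pnn_scores([0.0, 0.5], patterns, labels, num_classes=2)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the per-class sums are returned as computed, so they do not sum to one, and the class with the largest sum is taken as the assigned class.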

Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’) in view of Masud et al. (US Patent Application Publication No. 2012/0054184, hereinafter ‘Masud’), in further view of Skabar (NPL: “Single-class classifier learning using neural networks: An application to the prediction of mineral deposits”, hereinafter ‘Sbar’).

Regarding independent claim 32 limitations, Yadav in combination with Sbar teaches an apparatus, comprising:
The claim 32 limitations are similar to the claim 10 limitations and are rejected under the same rationale.
Yadav and Sbar do not expressly disclose the "means for" elements as a processor with memory.
Masud teaches the memory unit in Fig. 23 for executing instructions by the processor, in [0261], which may be configured based on the particular implementation for executing instructions for software loaded in the memory, in [0257].
The Yadav, Sbar, and Masud references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing methods and systems for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit and processor as disclosed by Masud with the method for processing information collectively disclosed by Yadav and Sbar.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to classify data concepts that evolve over time using classification models (Masud, [0004]); doing so would help improve automatic detection of novel data instances for training classification models that may otherwise go undetected in a distributed computing environment (Masud, [0004] & [0262]).

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’), in view of Steenhoek et al. (NPL: “Probabilistic Neural Networks for Segmentation of Features in Corn Kernel Images”, hereinafter ‘Steen’), and in further view of Hunzinger et al. (US Pub. No. 2013/0117210, hereinafter ‘Hun’).

Regarding independent claim 29 limitations, Yadav and Steen teach:
The claim 29 limitations are similar to the claim 1 limitations and are rejected under the same rationale.
Yadav and Steen do not expressly teach claim 29 limitations:
non-transitory computer-readable medium having program code recorded thereon, the program code being executed by a processor of an artificial neural network and comprising: 
Hun does expressly teach claim 29 limitations:
a non-transitory computer-readable medium having program code recorded thereon, the program code being executed by a processor and comprising: (Hun teaches the system using neural processors for implementing learning operations, in [0279]: FIG. 56 illustrates an example implementation 5600 of the aforementioned methods for neural component replay, learning refinement, memory transfer, associative learning, pattern comparison, pattern completion, pattern separation, pattern generalization, pattern sequence completion with a hierarchy, and pattern hierarchical replay, where a memory 5602 can be interfaced via an interconnection network 5604 with individual (distributed) processing units (neural processors) 5606 of a computational network (neural network) in accordance with certain aspects of the present disclosure; where the system includes memory considered a non-transitory computer readable memory, in [0279]: … One or more weights and delays associated with one or more connections (synapses) of the computational network (neural network) may be loaded from the memory 5602 via connection(s) of the interconnection network 5604 into each processing unit (neural processor) 5606; for performing operations as hardware and software modules including memory for executing program code on processors, in [0291] & [0294]-[0297].)
program code... (Hun teaches the system using neural processors for implementing learning operations, in [0279]: FIG. 56 illustrates an example implementation 5600 of the aforementioned methods for neural component replay, learning refinement, memory transfer, associative learning, pattern comparison, pattern completion, pattern separation, pattern generalization, pattern sequence completion with a hierarchy, and pattern hierarchical replay, where a memory 5602 can be interfaced via an interconnection network 5604 with individual (distributed) processing units (neural processors) 5606 of a computational network (neural network) in accordance with certain aspects of the present disclosure; where the system includes memory considered a non-transitory computer readable memory, in [0279] & [0297]: … One or more weights and delays associated with one or more connections (synapses) of the computational network (neural network) may be loaded from the memory 5602 via connection(s) of the interconnection network 5604 into each processing unit (neural processor) 5606; for performing operations as hardware and software modules including memory for executing program code on processors, in [0291] & [0297]-[0298].)
Yadav, Steen, and Hun are analogous art because they are directed to the method and system for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit, processor, and software as disclosed by Hun with the method for processing information collectively disclosed by Yadav and Steen.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to classify data based on patterns being processed using a hierarchical machine learning process (Hun, [0240]-[0241]); doing so improves the process of pattern separation by increasing the difference between patterns and improving distinction in recognizing separate patterns during an automated learning process (Hun, [0240]-[0241]).

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Yadav et al. (Non-Patent Literature Publication: “Novelty detection applied to the classification problem using probabilistic neural network”, hereinafter ‘Yadav’), in view of Skabar (NPL: “Single-class classifier learning using neural networks: An application to the prediction of mineral deposits”, hereinafter ‘Sbar’), and in further view of Hunzinger et al. (US Pub. No. 2013/0117210, hereinafter ‘Hun’).

Regarding independent claim 30 limitations, Yadav in combination with Sbar teaches:
claim 30 limitations are similar to claim 10 limitations and are rejected under the same rationale.
Yadav and Sbar do not expressly teach claim 30 limitations:
non-transitory computer-readable medium having program code recorded thereon, the program code being executed by a processor of an artificial neural network and comprising: 
 program code…
Hun does expressly teach claim 30 limitations:
non-transitory computer-readable medium having program code recorded thereon, the program code being executed by a processor of an artificial neural network and comprising: … (Hun teaches the system using neural processors for implementing learning operations, in [0279]: FIG. 56 illustrates an example implementation 5600 of the aforementioned methods for neural component replay, learning refinement, memory transfer, associative learning, pattern comparison, pattern completion, pattern separation, pattern generalization, pattern sequence completion with a hierarchy, and pattern hierarchical replay, where a memory 5602 can be interfaced via an interconnection network 5604 with individual (distributed) processing units (neural processors) 5606 of a computational network (neural network) in accordance with certain aspects of the present disclosure; where the system includes memory considered a non-transitory computer readable memory, in [0279]: … One or more weights and delays associated with one or more connections (synapses) of the computational network (neural network) may be loaded from the memory 5602 via connection(s) of the interconnection network 5604 into each processing unit (neural processor) 5606; for performing operations as hardware and software modules including memory for executing program code on processors, in [0291] & [0294]-[0297].)
program code... (Hun teaches the system using neural processors for implementing learning operations, in [0279]: FIG. 56 illustrates an example implementation 5600 of the aforementioned methods for neural component replay, learning refinement, memory transfer, associative learning, pattern comparison, pattern completion, pattern separation, pattern generalization, pattern sequence completion with a hierarchy, and pattern hierarchical replay, where a memory 5602 can be interfaced via an interconnection network 5604 with individual (distributed) processing units (neural processors) 5606 of a computational network (neural network) in accordance with certain aspects of the present disclosure; where the system includes memory considered a non-transitory computer readable memory, in [0279] & [0297]: … One or more weights and delays associated with one or more connections (synapses) of the computational network (neural network) may be loaded from the memory 5602 via connection(s) of the interconnection network 5604 into each processing unit (neural processor) 5606; for performing operations as hardware and software modules including memory for executing program code on processors, in [0291] & [0297]-[0298].)
Yadav, Sbar, and Hun are analogous art because they are directed to the method and system for information processing and pattern recognition.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate a memory unit, processor, and software as disclosed by Hun with the method for processing information collectively disclosed by Yadav and Sbar.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to classify data based on patterns being processed using a hierarchical machine learning process (Hun, [0240]-[0241]); doing so improves the process of pattern separation by increasing the difference between patterns and improving distinction in recognizing separate patterns during an automated learning process (Hun, [0240]-[0241]).
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Zhou (US Pub No. 2009/0060340): teaches the use of back-propagation neural networks for classification of features using sequential classes. 
Matsugu et al. (US Pub. No. 2005/0283450): teaches the use of convolutional neural networks to classify features as node classifiers assigned to layers of the neural network.
Aharonov et al. (US Pub. No. 2011/0312530): teaches binary classifiers as decision trees using neural networks. 
Codella et al. (US Patent Application Publication No. 2016/0092789): System for sampling in imbalanced machine learning classifier systems using generated synthetic 
Hua et al. (US Patent Application Publication No. 2016/0217349): Classification of negative and positive data using model vectors and similarity values, in [0115]-[0119].
Lin et al. (US Patent Application Publication No. 2015/0088791): Identifying training data associated with majority and minority data classes by evaluating the usefulness of candidate data samples, in [0043], and using distance thresholds to generate center points associated with the data sample, in [0026]-[0030].
Abe et al. (US Patent Application Publication No. 2005/0289089): Multi-class learning with neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516.  The examiner can normally be reached on Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available 






/O.O.A./Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126