DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-01-03 has been entered.  Applicant’s amendments to the Specification and Claims have overcome each and every objection and 112(b) rejection, except for those regarding Claim 8, previously set forth in the Non-Final Office Action mailed 2021-10-26.  The status of the claims is as follows:
Claims 1 and 3-14 remain pending in the application.
Claim 2 is cancelled.
Claims 1 and 3-14 are amended.
Response to Arguments
Applicant’s amendment to Claim 12 has successfully overcome the 35 USC 101 rejection that was based on non-statutory subject matter.
Applicant’s amendment to Claim 14 has not successfully overcome the 35 USC 101 rejection that was based on non-statutory subject matter.  It is unclear how the term “non-transitory” applies to a “learned model”.  If Applicant means “a learned model stored on a non-transitory computer readable medium”, this is still non-statutory subject matter because it is the “learned model” itself (potentially software) that the claim is directed to. The Specification does not define the “learned model” as a piece of hardware, and machine learning models are commonly understood to be computer programs, thus the claim amounts to “software per se”.  
Applicant's arguments in response to rejections under 35 USC 101 related to a judicial exception have been fully considered but they are not persuasive.
Applicant argues on Remarks Pages 20-22 regarding Step 2A Prong One that several limitations cannot practically be performed in the human mind because they “need…processor circuitry and thus cannot be performed by a pen and paper.”  This is not a persuasive argument.  Everything described in the claims could be done with a pen and paper with a simple enough model/tree.  
One example of something that cannot be practically performed in the human mind is a method of training a model (see PEG Example 39 and MPEP 2106.04(a)(1)(vii)), which is known to be impractical to perform in the human mind.  However, the present invention is explicitly not directed to training the model, as it assumes an already “learned model”.  The “learning” is alluded to as something that has already been done, as in Claim 1:   “learned model that is obtained by…learning” and “reliability index that is obtained through the learning”, and “output, based on the data for learning”.  Amended Claim 8 refers to “a number of times of learnings”, again suggesting the learning is done.  This corresponds with Specification [0007]:  “In this way, according to a conventional learning structure, even when learning is not yet sufficiently performed on a particular state space, prediction processing is performed using a node that is located at an utmost end position among upper nodes and that encompasses the relevant state space, and thus, at least rough prediction processing can be performed, that is, a generalization function can be achieved.”  Here, the invention is described as allowing one to the learning has proceeded to a degree”.  Thus, Examiner understands that the claimed method is performed on a tree that is learned to a degree.   Examiner asserts that as claimed, the claimed operations, performed on a tree of small enough size that has been “learned to a degree”, can be practically performed in the human mind with pen and paper, and thus the method steps recite a mental process.
Applicant argues on Remarks Page 23 regarding Step 2A Prong Two that the Examiner is “incorrect” that the claims do not integrate the judicial exception into a practical application because it provides a “technological improvement to the certain technological problems.”  Examiner respectfully points out that the test here is if the additional elements to the mental process integrate the judicial exception into a practical application.  Examiner notes that every limitation is directed to the mental process, and the only additional element is processor circuitry, which amounts to mere instructions to apply the exception on a computer (see MPEP 2106.05(f)).
Applicant argues on Remarks Page 25 regarding Step 2B that the Examiner has not followed the proper procedures to establish well understood, routine, conventional activity because “no court decisions are listed”.  Examiner respectfully points out that Examiner did not recite “well-understood, routine, conventional activity” in the rejection, but rather again reiterated “mere instructions to apply the exception on a computer”, which is also applicable to Step 2B as well as 2A Prong Two (see MPEP 2106.05(f):  “Another consideration when 
Applicant's arguments in response to rejections under 35 U.S.C. 103 have been fully considered but they are not persuasive.
Regarding the independent claims, Applicant argues on Remarks Page 27 that Gama’s error of naïve Bayes classifiers is different than the definition of error in the claimed invention.  Examiner respectfully disagrees, as the claim recites what one of ordinary skill in the art would understand to be an error, which is the difference between an expected value and an actual value (“difference between an output…and a prediction output”).  Gama also discloses an error of a classifier, which is also a difference between an output (the class) and the prediction output (the expected class).
Regarding the independent claims, Applicant argues on Remarks Page 27 that Gama does not specify specifying an output node from the input nodes because Gama prunes the subtrees and a node becomes a leaf.  Examiner respectfully points out that Examiner acknowledged this in the rejection, and was referencing Gama’s work up to but not including pruning.  As is stated below in the rejections:  “Now, let us consider the teachings of Gama up to, but not including, the pruning based on the highest error.  Note that Gama states that the Naïve-Bayes classifiers “monitor” the Naïve Bayes error.  “Monitoring” is useful in itself, and 
Regarding the independent claims, Applicant argues on Remarks Pages 27-28 that Woods is silent on details about local accuracy and is not related to selecting a node among input nodes.  In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  Examiner points out that Gama teaches error rates of individual nodes among input nodes, wherein each node comprises a classifier, and Woods was relied upon strictly for the concept of choosing a classifier with the lowest error.  Thus, the combination results in selecting the node of the input nodes with the lowest error.
Regarding Claim 5, Applicant argues on Remarks Page 30 that “data corresponding to the node is referenced and a prediction output of a node lower than the node is NOT referenced”.  Examiner respectfully points out that the claim language states “prediction output at the each input node or a node among the input nodes that is located on a layer among the layers that is lower than the each input node”.  Examiner mapped the art to “at the each input node” as the node on a lower layer was not required to be mapped by the “or” in the claim language.  Examiner was sure to specify this in the rejection language, and it is still in the rejection language below.
Regarding Claims 6-7, Applicant argues on Remarks Page 30-32 that “data corresponding to the node is referenced and a prediction output of a node lower than the node 
Regarding Claim 8, Applicant’s arguments on Remarks Page 32 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The new reference Pyeatt et al. has been included to teach the newly amended subject matter.
Applicant also argues here that “By this amendment, the meaning of "appropriate calculation," recited in Applicant's claim 8, is clarified. ‘Appropriate calculation’ is possible when…”  Examiner respectfully disagrees, because while Applicant has clarified when “appropriate calculation” is possible, Applicant has not clarified what the said “appropriate calculation” actually is.  Examiner recommends simply removing the word “appropriate”, or alternatively, explicitly defining “appropriate calculation”.  As currently claimed, the “appropriate calculation” could be any calculation at all.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “appropriate” in claim 8 is a relative term which renders the claim indefinite. The term “appropriate” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  
Claim 9 is rejected because it inherits the deficiencies of Claim 8.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because it is directed to “a learned model”.  While a programmed computer or non-transitory storage medium comprising a computer are both examples of eligible subject matter, merely claiming a “computer program” is not eligible as it is considered “software per se”.  The Specification does not define the “learned model” as a piece of hardware, and machine learning models are commonly understood to be computer programs, thus the claim amounts to “software per se”. 
Claims 1 and 3-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis:
In the instant case, Claims 1 and 3-10 are directed to an information processing device, Claim 11 is directed to a method, and Claim 13 is directed to an IC chip.  These claims fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).  Claims 12 and 14, as shown above, do not fall within one of the four statutory categories.
Step 2 Analysis:
Based on the claims being determined to be within one of the four categories (Step 1), it must be determined if the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, and abstract idea).  In this case the claims fall within the judicial exception of an 
Step 2A: Prong 1 analysis:
The claims recite:
Claims 1 and 10-14:
 “generate…a prediction output corresponding to input data, based on a learned model that is obtained by causing a learning model having a tree structure configured by a plurality of hierarchically arranged nodes each associated with a corresponding one of hierarchically divided state spaces to learn a predetermined set of pieces of data for learning”.  Generating a prediction output corresponding to input data, based on a learned model is something that can be performed in the human mind.  Using a tree-based model that has already been learned can be performed by a human with pen and paper.  The claim here merely states that the model is already learned, and does not provide details on the learning process itself.  Thus, the limitation recites a mental process.
“based on the input data…specify input nodes corresponding to the input data wherein each of the input nodes is located on a corresponding one of layers from beginning to end of the learning tree structure”.  The “specifying” here falls under “observation, evaluation, judgment, or opinion”, and is thus a mental process.
“acquire a reliability index that is obtained through learning a predetermined set of pieces of data for learning and indicates prediction accuracy”.  Acquiring a value indicating an accuracy can be performed by a human with pen and paper.  The 
“based on the reliability index acquired by the reliability-index acquisition processor circuitry…specify from the input nodes corresponding to the input data, an output node that is a basis of the generation of a prediction output”.  Specifying based on an index falls under “observation, evaluation, judgment, or opinion”, and is thus a mental process.
“generate…a prediction output, based on the data for learning that is included in the state spaces that corresponds to the output node specified”.  Generating an output based on data falls under “observation, evaluation, judgment, or opinion”, and is thus a mental process.
“wherein the reliability index comprises first errors each generated at a corresponding input node among the input nodes based on a difference between an output corresponding to the input data and a prediction output based on learned data included in the state spaces that corresponds to the corresponding input node”.  Calculating this reliability index falls under “observation, evaluation, judgment, or opinion”, and is thus a mental process.
“specify, as the output node, a node which is among the input nodes and for which a corresponding first error among the first errors is minimal”.  Specifying a node for which error is minimal falls under “observation, evaluation, judgment, or opinion”, and is thus a mental process.
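For illustration only, the limitations listed above can be sketched on a small tree.  The sketch below is not taken from the application or from the cited references; the Node fields, the one-dimensional state space, and all numeric values are hypothetical assumptions, chosen so that each step can be followed by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node: covers the half-open state space [lo, hi).
    lo: float
    hi: float
    prediction: float  # assumed: output derived from learning data in this state space
    error: float       # assumed: "first error" stored at the node (reliability index)
    children: List["Node"] = field(default_factory=list)


def specify_input_nodes(root: Node, x: float) -> List[Node]:
    """Collect the nodes whose state space contains x, from root to the deepest node."""
    path = [root]
    node = root
    while node.children:
        nxt = next((c for c in node.children if c.lo <= x < c.hi), None)
        if nxt is None:  # learning has not reached this region; stop at the upper node
            break
        path.append(nxt)
        node = nxt
    return path


def specify_output_node(path: List[Node]) -> Node:
    """Pick the input node whose stored first error is minimal."""
    return min(path, key=lambda n: n.error)


def predict(root: Node, x: float) -> float:
    """Generate the prediction output from the specified output node."""
    return specify_output_node(specify_input_nodes(root, x)).prediction


# Small hypothetical tree with three layers on the left branch.
leaf_a = Node(0.0, 0.25, prediction=0.1, error=0.40)
leaf_b = Node(0.25, 0.5, prediction=0.3, error=0.05)
left = Node(0.0, 0.5, prediction=0.2, error=0.10, children=[leaf_a, leaf_b])
right = Node(0.5, 1.0, prediction=0.8, error=0.25)
root = Node(0.0, 1.0, prediction=0.5, error=0.30, children=[left, right])

print(predict(root, 0.1))  # middle-layer node has the smallest error on this path
```

In this hypothetical tree, an input of 0.1 passes through three input nodes with errors 0.30, 0.10, and 0.40, so the middle node is specified as the output node, which is the kind of comparison that can be carried out with pen and paper.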
Step 2A:  Prong 2 analysis:
This judicial exception is not integrated into a practical application for Claims 1 and 10-14.  Additional elements “processor circuitry”, “input-node specification processor circuitry”, “reliability-index acquisition processor circuitry”, “output-node specification processor circuitry”, and “prediction-output generation processor circuitry” amount to mere instructions to apply the exception on a computer (see MPEP 2106.05(f)).
Step 2B analysis:
Claims 1 and 10-14 do not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, additional elements “input-node specification processor circuitry”, “reliability-index acquisition processor circuitry”, “output-node specification processor circuitry”, and “prediction-output generation processor circuitry” amount to mere instructions to apply the exception on a computer (see MPEP 2106.05(f)).
	Dependent claim(s) 2-9 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitation(s) fail(s) to establish that the claim(s) is/are not directed to an abstract idea, as they recite further embellishment of the judicial exception.  
	Claim 2 recites the same limitations as Claim 1, providing further details on the reliability index and specifying the output node, and is also directed to a mental process.
	Claim 3 recites the same limitations as Claim 2, providing details on error calculation with a forgetting factor, and is also directed to a mental process.

Claim 5 recites the same limitations as Claim 1, providing further details on the reliability index and output node specification, and is also directed to a mental process.
Claim 6 recites the same limitations as Claim 5, providing further details on the reliability index and output node specification, and is also directed to a mental process.
Claim 7 recites the same limitations as Claim 5, providing further details on the reliability index and output node specification, and is also directed to a mental process.
Claim 8 recites the same limitations as Claim 1, providing further details on the reliability index calculation and the output node specification, which describe further a mental process.  Additional elements “highly reliable node specification processor circuitry”, “calculation possibility determination processor circuitry”, “selective output-node specification processor circuitry” amount to mere instructions to apply the exception on a computer (see MPEP 2106.05(f)).
Claim 9 recites the same limitations as Claim 8, providing a predetermined number, and is thus still directed to a mental process.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the 

Claims 1, 4-7, and 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Gama et al. (“Learning Decision Trees from Dynamic Data Streams”; hereinafter Gama), as further graphically evidenced by corresponding tutorial Gama et al. (“Learning with Local Drift Detection”; hereinafter GamaTutorial) in view of Woods et al. (“Combination of multiple classifiers using local accuracy estimates”; hereinafter Woods).
As per Claim 1, Gama teaches An information processing device configured to generate, by processor circuitry, a prediction output corresponding to input data, based on a learned model that is obtained by causing a learning model having a tree structure configured by a plurality of hierarchically arranged nodes each associated with a corresponding one of hierarchically divided state spaces to learn a predetermined set of pieces of to-be-learned data (Gama, Section 4.1, end of first paragraph, discloses:  “All algorithms ran on a Centrino at 1.5GHz with 512 MB of RAM and using Linux Mandrake”, thus disclosing an information processing device with processor circuitry. Gama, Section 3, discloses:  “UFFT is an algorithm for supervised classification learning that generates a forest of binary trees”, thus disclosing a tree structure, which inherently has a plurality of hierarchically arranged nodes.  Gama, Section 3 Line 7, continues:  “During the training phase the algorithm maintains a short term memory”, and thus by a “training phase”, Gama discloses based on a learned model that is obtained by causing a learning model to learn a predetermined set of pieces of data for learning.  Gama, Bottom of page 1361, discloses:  “All decision nodes contain naive Bayes to detect changes in the class distribution of the examples that traverse the node, that correspond to detect shifts in different regions of the instance space.”  Here, Gama discloses the “nodes”, which are hierarchically arranged, correspond to “instance space” and thus discloses nodes each associated with a corresponding one of hierarchically divided state spaces.  Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus Gama discloses generates a prediction output corresponding to input data.  GamaTutorial illustrates this on Page 17:

    [Figure: illustration from GamaTutorial, Page 17 (media_image1.png)]
    )

input-node specification processor circuitry, based on the input data, configured to specify input nodes corresponding to the input data wherein each of the input nodes is located on a corresponding one of layers from beginning to end of the learning tree structure (Recall above Gama Section 4.1, end of first paragraph, discloses processor circuitry.  Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”.  In order to make a prediction output, that prediction output must be based on input data, otherwise a prediction is impossible.  The input data traverses the tree as per Gama, Bottom of page 1361:  “All decision nodes contain naive Bayes to detect changes in the class distribution of the examples that traverse the node”, and thus the input nodes are corresponding to the input data.  The tree has layers from root to leaf, as is shown in the illustration from GamaTutorial, and thus each of the input nodes is located on a corresponding one of layers from beginning to end of the learning tree structure.)
reliability-index acquisition processor circuitry configured to acquire a reliability index that is obtained through the learning a predetermined set of pieces of data for learning and indicates prediction accuracy (Recall above Gama Section 4.1, end of first paragraph, discloses processor circuitry. Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses acquires a reliability index (“error-rate”) that indicates prediction accuracy (“If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase”).  The tree is built by training, as indicated by Gama, Top of page 1361:  “When a new training example becomes available, it will cross the corresponding binary decision trees from the root node till a leaf. At each node, the naïve Bayes installed at that node classifies the example.”  Training comprises training data that is to be learned.  Thus the tree itself, and consequently the reliability index (“error-rate”) is obtained through the learning a predetermined set of pieces of data for learning.)
(Recall above Gama Section 4.1, end of first paragraph, discloses processor circuitry. Gama, Section 3.1.4, discloses:  “To classify an unlabeled example, the example traverses the tree from the root to a leaf. It follows the path established, at each decision node, by the splitting test at the appropriate attribute-value. The leaf reached classifies the example.”  Here, Gama establishes an output node (“the leaf reached”) that is a basis of the generation of a prediction output (“classifies the examples”).  Thus, Gama discloses that the output node is a leaf node.  Gama, Section 3.2, discloses:  “If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase [17]. When the system detect an increase of the naive-Bayes error in a given node, an indication of a change in the distribution of the examples, this suggest that the splitting-test that has been installed at this node is no longer appropriate. In such cases, all the subtree rooted at that node is pruned, and the node becomes a leaf.”  Here, Gama discloses that leaf nodes can be destroyed and created from internal nodes (input nodes) based on the reliability index (“error-rate”).  Since the output node must be a leaf node, then Gama discloses specify an output node from the input nodes based on the reliability index.)
a prediction-output generation processor circuitry configured to generate a prediction output, based on the data for learning that is included in the state spaces that corresponds to the output node specified by the output-node specification processor circuitry.  (Recall above Gama Section 4.1, end of first paragraph, discloses processor circuitry.  Recall that Gama, Bottom of page 1361, discloses:  “All decision nodes contain naive Bayes to detect changes in the class distribution of the examples that traverse the node, that correspond to detect shifts in different regions of the instance space.”  Here, Gama discloses the “instance space”, and thus discloses the data for learning that is included in the state spaces.  Recall also that Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus Gama discloses generate a prediction output.  Recall that Gama, Section 3.1.4, discloses:  “To classify an unlabeled example, the example traverses the tree from the root to a leaf. It follows the path established, at each decision node, by the splitting test at the appropriate attribute-value. The leaf reached classifies the example.”  Thus, Gama discloses that the prediction (“classifies”) corresponds to the output node (“the leaf reached”)).
wherein the reliability index comprises first errors each generated at a corresponding input node among the input nodes based on a difference between an output corresponding to the input data and a prediction output based on learned data included in the state spaces that corresponds to the corresponding input node (Recall that Gama, Bottom of page 1361, discloses:  “All decision nodes contain naive Bayes to detect changes in the class distribution of the examples that traverse the node, that correspond to detect shifts in different regions of the instance space.”  Here, Gama discloses the “instance space”, and thus discloses learned data that is included in the state spaces.  Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses first errors (“error-rate”) each generated at a corresponding input node among the input nodes (“at each node of all decision trees”).  This is based on a difference between an output corresponding to the input data and a prediction output, as it is an error rate based on the output of a classifier, which provides a prediction output.)
wherein the output-node specification processor circuitry specifies, as the output node, a node which is among the input nodes (Recall above Gama Section 4.1, end of first paragraph, discloses processor circuitry.  Gama, Section 3.1.4, discloses:  “To classify an unlabeled example, the example traverses the tree from the root to a leaf. It follows the path established, at each decision node, by the splitting test at the appropriate attribute-value. The leaf reached classifies the example.”  Here, Gama establishes an output node (“the leaf reached”) that is a basis of the generation of a prediction output (“classifies the examples”).  Thus, Gama discloses that the output node is a leaf node.  Gama, Section 3.2, discloses:  “If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase [17]. When the system detect an increase of the naive-Bayes error in a given node, an indication of a change in the distribution of the examples, this suggest that the splitting-test that has been installed at this node is no longer appropriate. In such cases, all the subtree rooted at that node is pruned, and the node becomes a leaf.”  Here, Gama discloses that leaf nodes can be destroyed and created from internal nodes (input nodes) based on the reliability index (“error-rate”).  Since the output node must be a leaf node, then Gama discloses that an input node may become an output node, and thus Gama discloses specifies, as the output node, a node which is among the input nodes.)
However, Gama does not teach and for which a corresponding first error among the first errors is minimal.
Woods teaches and for which a corresponding first error among the first errors is minimal.  (Note above that Gama discloses that each node comprises a Naïve Bayes classifier (“naive-Bayes error in a given node”).  Gama, Abstract discloses:  “Decision nodes and leaves contain naive-Bayes classifiers playing different roles during the induction process. Naive-Bayes in leaves are used to classify test examples. Naive-Bayes in inner nodes play two different roles. They can be used as multivariate splitting-tests if chosen by the splitting criteria, and used to detect changes in the class-distribution of the examples that traverse the node.”  Here, Gama discloses Naive-Bayes classifiers in inner nodes.  This is further illustrated in GamaTutorial Page 19:

    [Figure: illustration from GamaTutorial, Page 19 (media_image2.png)]

Now, let us consider the teachings of Gama up to, but not including, the pruning based on the highest error.  Note that Gama states that the Naïve-Bayes classifiers “monitor” the Naïve Bayes error.  “Monitoring” is useful in itself, and does not necessarily require pruning, and thus Gama does not explicitly “teach away” from potentially using these errors for other purposes.
Now let us consider Woods.  Based on Gama, a plurality of Naïve Bayes Classifiers with corresponding error-rates (first error) are disclosed.  Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, Woods discloses specifies a classifier for which a corresponding first error among the first errors is minimal.  And thus, in combination with Gama, discloses specifies, as the output node, a node which is among the input nodes and for which a corresponding first error among the first errors is minimal.)
Gama and Woods are analogous art because they are both in the field of endeavor of machine learning.
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to combine the decision tree with a classifier at each node of Gama, with the choosing of the classifier with the highest accuracy of Woods.  One would have been motivated to do so in order to improve the accuracy of the prediction (Woods, Conclusion:  “We have shown that even if all the individual classifiers have been optimized, dynamic classifier selection by local accuracy is still capable of improving overall performance significantly. By contrast, simple voting techniques, and even a recently proposed CMC algorithm, were not able to show any significant improvement when the individual classifiers were sufficiently optimized. At times, some of the other CMC algorithms actually hurt performance. The proposed DCS-LA algorithm was always capable of improving performance.”)
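For illustration only, the combination described above can be sketched as a set of per-node classifiers that each keep an online error rate, followed by selection of the classifier with the lowest error.  This is a simplified sketch under stated assumptions and does not reproduce Gama's UFFT algorithm or Woods' DCS-LA algorithm: the NodeClassifier class, the stub prediction functions (used in place of naive-Bayes classifiers), and the example stream are all hypothetical.

```python
class NodeClassifier:
    """Hypothetical stand-in for a classifier installed at one node of the tree."""

    def __init__(self, name, predict_fn):
        self.name = name
        self.predict_fn = predict_fn  # stub; not a naive-Bayes implementation
        self.seen = 0
        self.wrong = 0

    def observe(self, x, label):
        """Update the online error rate with one labeled example that traverses the node."""
        self.seen += 1
        if self.predict_fn(x) != label:
            self.wrong += 1

    @property
    def error_rate(self):
        # Assumption: a node that has seen no examples is treated as maximally unreliable.
        return self.wrong / self.seen if self.seen else 1.0


def select_lowest_error(classifiers):
    """Highest accuracy is equivalent to lowest error, so pick the minimum error rate."""
    return min(classifiers, key=lambda c: c.error_rate)


# Hypothetical classifiers on one root-to-leaf path.
path = [
    NodeClassifier("root", lambda x: "a"),
    NodeClassifier("inner", lambda x: "b" if x > 0.5 else "a"),
    NodeClassifier("leaf", lambda x: "b"),
]

# Hypothetical labeled examples traversing every node on the path.
stream = [(0.1, "a"), (0.9, "b"), (0.4, "a"), (0.7, "b")]
for x, label in stream:
    for clf in path:
        clf.observe(x, label)

best = select_lowest_error(path)
print(best.name, best.predict_fn(0.8))
```

With this hypothetical stream, the root and leaf stubs each misclassify two of four examples while the inner stub misclassifies none, so the inner node's classifier is the one selected.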

As per Claim 4, the combination of Gama and Woods teaches the information processing device according to claim 1.  Recall above that Gama Section 3 discloses learning a predetermined set of pieces of data for learning (“training”).  Recall also that Gama, Bottom of page 1361, discloses state spaces (“instance space”) which is also illustrated in GamaTutorial, and thus discloses learned data included in the state spaces. Recall that Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus prediction output and therefore prediction output based on learned data included in the state spaces.  Recall that Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses a tree, and thus that data corresponds to the corresponding input node.  Gama here also discloses that “error-rate” are each generated at a corresponding input node among the input nodes (“at each node of all decision trees”).  This is based on a difference between an output corresponding to the input data and a prediction output, as it is an error rate based on the output of a classifier, which provides a prediction output.
Gama also discloses an end prediction error at an end node among the input nodes.  An “end node” may be any node.  One could calculate these errors in any order they like, and the last node may be the “end node”, and thus the error is the “end prediction error”.  
Gama, Section 3.2, discloses:  “If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase [17].”  Here, Gama discloses that the error rate increases or decreases, and thus it is being gradually updated, and so this algorithm is performed iteratively.  Thus, the second iteration may produce a second error. Since the error calculation is the same calculation that produces the first error, this discloses second error based on first errors.
However, Gama does not teach corresponding error is minimal.
Woods teaches corresponding error is minimal.  (Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, Woods discloses specifies a classifier for which a corresponding first error among the first errors is minimal.)
This concept, when combined with Gama, results in calculating a second error for a node at an input node which is among the input nodes and for which a corresponding first error among the first errors is minimal.
The combination of Woods and Gama then teaches wherein the output-node specification processor circuitry makes a comparison in a magnitude relation for the end prediction error and the second error, and specifies, as the output node, the input node for which the corresponding first error is minimal when the second error is smaller than the end prediction error, otherwise, specifies, as the output node, the end node among the input nodes.  (As shown above, Gama discloses an end error, first error, and second error.  Woods discloses choosing the smallest error, which comprises makes a comparison in a magnitude relation (a complicated way of saying “comparing”).  Choosing the smallest error suggests specifies, as the output node, the input node for which the corresponding first error is minimal when the second error is smaller than the end prediction error, because if the second error is the second iteration of error corresponding to the given node, and that node’s first error is minimal, then the second error would also be minimal, which means less than all other nodes, including the end node.  When the end error is smaller than the second error, then it specifies, as the output node, the end node.  Again, this amounts to choosing the lowest error as suggested by Woods, since the end error is the one which is smaller.)
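For illustration only, the comparison recited in Claim 4 as discussed above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not taken from Gama, Woods, or the Specification: the `PathNode` structure, its field names, and the example values are hypothetical, and the path is assumed to be ordered from the root to the end node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathNode:
    name: str
    first_error: float   # error already recorded at this node
    second_error: float  # hypothetical: error from a later iteration at this node


def specify_output_node(path: List[PathNode], end_prediction_error: float) -> PathNode:
    """Pick the output node from the input nodes on the path.

    Assumption: path[-1] is the end node.
    """
    if not path:
        raise ValueError("path must contain at least one node")
    end_node = path[-1]
    # Input node whose first error is minimal among the input nodes.
    best = min(path, key=lambda n: n.first_error)
    # Magnitude comparison between the second error and the end prediction error.
    if best.second_error < end_prediction_error:
        return best
    return end_node


path = [
    PathNode("root", first_error=0.5, second_error=0.4),
    PathNode("mid", first_error=0.1, second_error=0.15),
    PathNode("leaf", first_error=0.3, second_error=0.3),
]
```

With these made-up values, an end prediction error of 0.3 selects the node named "mid", while an end prediction error of 0.1 falls back to the end node.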

As per Claim 5, the combination of Gama and Woods teaches the information processing device according to claim 1.  Gama teaches wherein the reliability index is generated for each of the input nodes under a predetermined condition by referring to a prediction output at the each input node or a node among the input nodes that is located on a layer among the layers that is lower than the each input node (Recall that Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses a reliability index (“error-rate”) under a predetermined condition by referring to a prediction output (“If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase”).  The “predetermined condition” could be anything, but may be interpreted as the condition that the prediction output be compared to the true output.  It is as the nodes are traversed:  “classify the examples that traverse the node”.  This is referring to a prediction output at the each input node or a node among the input nodes that is located on a layer among the layers that is lower than the each input node – in this case, at the each input node (“examples that traverse the node”))
and wherein the output-node specification processor circuitry is configured to specify the output node based on the reliability index having been generated for the each input node. (Gama, Section 3.1.4, discloses:  “To classify an unlabeled example, the example traverses the tree from the root to a leaf. It follows the path established, at each decision node, by the splitting test at the appropriate attribute-value. The leaf reached classifies the example.”  Here, Gama establishes an output node (“the leaf reached”) that is a basis of the generation of a prediction output (“classifies the examples”).  Thus, Gama discloses that the output node is a leaf node.  Gama, Section 3.2, discloses:  “If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase [17]. When the system detect an increase of the naive-Bayes error in a given node, an indication of a change in the distribution of the examples, this suggest that the splitting-test that has been installed at this node is no longer appropriate. In such cases, all the subtree rooted at that node is pruned, and the node becomes a leaf.”  Here, Gama discloses that leaf nodes can be destroyed and created from internal nodes (input nodes) based on the reliability index (“error-rate”).  Since the output node must be a leaf node, then Gama discloses specifies an output node from the input nodes based on the reliability index generated for each input node.)

As per Claim 6, the combination of Gama and Woods teaches the information processing device according to claim 5.  Recall above that Gama Section 3 discloses learning a predetermined set of pieces of data for learning (“training”).  Recall also that Gama, Bottom of page 1361, discloses state spaces (“instance space”) which is also illustrated in GamaTutorial, and thus discloses learned data included in the state spaces. Recall that Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus prediction output and therefore prediction output based on learned data included in the state spaces.  Recall that Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses a tree, and thus that data corresponds to the corresponding input node.  Gama here also discloses that “error-rate” are each generated at a corresponding input node among the input nodes (“at each node of all decision trees”).  This is based on a difference between an output corresponding to the input data and a prediction output, as it is an error rate based on the output of a classifier, which provides a prediction output.
Gama also discloses first errors and third errors.  Gama’s algorithm traverses the tree, and thus on any tree with over 3 layers there are at least first errors and third errors.  The first error has a corresponding input node and if the third error is performed on a node 2 levels down from the first error node, then that discloses third errors each generated at a corresponding input node on a layer among the layers that is lower than the corresponding input node.
However, Gama does not teach corresponding error is minimal.
Woods teaches corresponding error is minimal.  (Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, Woods discloses specifies a classifier for which a corresponding first error among the first errors is minimal.)
This concept, when combined with Gama, results in specifies, as the output node, a node which is among the input nodes and for which a condition that the corresponding first error is smaller than the corresponding third error is satisfied, since in this case one is choosing the node with the smallest error.

As per Claim 7, the combination of Gama and Woods teaches the information processing device according to claim 5.  Recall above that Gama Section 3 discloses learning a predetermined set of pieces of data for learning (“training”). Recall also that Gama, Bottom of page 1361, discloses state spaces (“instance space”) which is also illustrated in GamaTutorial, and thus discloses learned data included in the state spaces. Recall that Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus prediction output and therefore prediction output based on learned data included in the state spaces.  Recall that Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses a tree, and thus that data corresponds to the corresponding input node.  Gama here also discloses that “error-rate” are each generated at a corresponding input node among the input nodes (“at each node of all decision trees”).  This is based on a difference between an output corresponding to the input data and a prediction output, as it is an error rate based on the output of a classifier, which provides a prediction output.
Gama also discloses first errors and fourth errors and fifth errors.  Gama’s algorithm traverses the tree, and thus on any tree with over 5 layers there are at least first errors and fourth errors and fifth errors.  The first error has a corresponding input node and if the fourth and fifth errors are performed on nodes 3 and 4 levels down from the first error node, then that discloses fourth errors each generated at a corresponding input node on a layer among the layers that is lower than the corresponding input node and fifth errors each generated at a corresponding input node on a layer among the layers that is lower than the corresponding input node.
However, Gama does not teach corresponding error is minimal.
Woods teaches corresponding error is minimal.  (Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, Woods discloses specifies a classifier for which a corresponding first error among the first errors is minimal.)
This concept, when combined with Gama, results in specifies, as a node of interest, a node which is among the input nodes and for which a condition that the corresponding fourth error is smaller than the corresponding fifth error is satisfied, since in this case one is choosing the node with the smallest error.  In the case that the fourth error is smaller than the fifth error, one determines, as a node of interest, a node which is among the input nodes and for which a corresponding first error is smaller than any other first error among first errors at nodes that are among the input nodes and that are lower than or same as the node.  Again, choosing the smallest first error is suggested by Woods.  Otherwise, if the fourth error is not smaller than the fifth error, proceed to a node among the input nodes that is located on a lower layer among the layers until the condition in which the corresponding fourth error is smaller than the corresponding fifth error is satisfied.  Iterating until the fourth error is smaller is also suggested by Woods, as we are looking for the smallest error.  When the condition of the fourth error being smaller than the fifth error is satisfied, the output-node specification processor circuitry is configured to specify the node of interest as the output node.  Again, here, we are choosing the smallest error.  While in contrast, when the condition is not satisfied, the output-node specification processor circuitry is configured to cause the comparison for the corresponding fourth error and the corresponding fifth error to sequentially proceed to a node among the input nodes that is located on a lower layer among the layers until the condition in which the corresponding fourth error is smaller than the corresponding fifth error is satisfied.  Here, if the smallest error is not found, then it defaults to the end node.
Gama discloses this, as they state in the Abstract:  “Naive-Bayes in leaves are used to classify test examples”.  Here, Gama discloses the leaf, or end node, being used for the output.)

As per Claim 10, Claim 10 is an information processing device claim corresponding to information processing device claim 1.  The difference is a lack of an output node specification processor circuitry and a prediction output generation processor circuitry.  Another difference is the language that the reliability index is “gradually updated”.  Gama, Section 3.2, discloses:  “If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase [17].”  Here, Gama discloses that the error rate increases or decreases, and thus it is being gradually updated.  Claim 10 is rejected for the same reasons as Claim 1.

As per Claim 11, Claim 11 is a method claim corresponding to information processing device Claim 1.  Claim 11 is rejected for the same reasons as Claim 1.

As per Claim 12, Claim 12 is a computer program claim corresponding to information processing device Claim 1.  The information processing device is inherently running a computer program.  Claim 12 is rejected for the same reasons as Claim 1.

As per Claim 13, Claim 13 is an IC chip claim corresponding to information processing device Claim 1.  The information processing device inherently comprises an IC chip, and an input terminal.  Claim 13 is rejected for the same reasons as Claim 1.

As per Claim 14, Claim 14 is a learned model claim corresponding to information processing device Claim 1.  The learned model is understood to be either computer hardware, which is an information processing device, or a computer program, which an information processing device is inherently running.  Claim 14 is rejected for the same reasons as Claim 1.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gama as further graphically evidenced by GamaTutorial with Woods in view of Liu et al. (“FP-ELM: An online sequential learning algorithm for dealing with concept drift”; hereinafter Liu).
As per Claim 3, the combination of Gama and Woods teaches the information processing device according to claim 1.  Gama teaches first error having been already obtained through the learning a predetermined set of pieces of data for learning and prediction output based on the learned data included in the state spaces that corresponds to the corresponding input node.  Recall above that Gama Section 3.2 discloses a first error (“error-rate”) and Gama Section 3 discloses learning a predetermined set of pieces of data for learning (“training”).  Recall also that Gama, Bottom of page 1361, discloses state spaces (“instance space”) which is also illustrated in GamaTutorial.  Recall that Gama, Section 3.1.5.1, discloses:  “Each tree in the forest makes a prediction”, and thus prediction output.
However, Gama does not teach wherein the each first error is updated by performing a weighting addition using a forgetting coefficient a (0 < a < 1) on the each first error and an absolute value of a difference between the output corresponding to the input data and a prediction output.
Liu teaches wherein the each first error is updated by performing a weighting addition using a forgetting coefficient a (0 < a < 1) on the each first error and an absolute value of a difference between the output corresponding to the input data and a prediction output.  (Liu, Pg 323, Right Column above Eq 5, discloses:  “ELM is to minimize the training error 
[equation image: media_image3.png]
”.  Liu, Pg 324 Top right, discloses: “To pay more attention to the new data chunk, we give a forgetting parameter α1 (0<α1<1) to the old data chunk ℵ0. Formally, the new output weight matrix β(1) is the solution to minimize

[equation image: media_image4.png]
”
Here, L1 is a first error as it is to be “minimized”. It comprises a difference between the output corresponding to the input data and a prediction output
[equation image: media_image3.png]
.  Note that the fact that the 
[equation image: media_image5.png]
 is squared, makes the sign irrelevant, as x^2 = (-x)^2 = |x|^2, where |x| is the absolute value of x.  Thus ||H1B – T1||^2 = ||  |(H1B-T1)|  ||^2.  Therefore, here is disclosed an absolute value of a difference between the output corresponding to the input data and a prediction output.  Also note that alpha1 has been described as a forgetting coefficient (“forgetting parameter”) and (0<α1<1).  Finally, note that the loss function, first error L, is updated by performing a weighting addition, as 
[equation image: media_image6.png]
 is an addition operation, and the first term in this addition is weighted by the forgetting coefficient alpha1.)
Liu and the combination of Gama and Woods are analogous art because they are both in the field of endeavor of machine learning.
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to combine the decision tree with a classifier at each node of Gama and Woods, with the forgetting factor of Liu.  One would have been motivated to do so in order to maintain accuracy of the predictor by reducing the importance of old data that may adversely impact results (Liu, Section 3:  “However, in many real applications, especially in non-stationary or concept drifting environments, the data in stream may have obvious timeliness. That is, the newly arrived data tend to reflect the current patterns in the data, and this is pivotal in learner training. The earlier data may play a minor role, and some data that are too old or are even totally out-dated only bring errors to the learner. Under these conditions, data chunks arriving at different periods have a different importance in training, but OS-ELM fails to reflect this. To deal with this problem, we propose the forgetting parameters extreme learning machine (FP-ELM).”)
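For illustration only, the Claim 3 limitation of updating a first error by a weighting addition with a forgetting coefficient a (0 < a < 1) can be sketched as follows.  The exact weighting is an assumption made for this sketch: here a weights the previously stored error and (1 - a) weights the new absolute difference.  Neither the claim language quoted above nor Liu is asserted to use this particular form, and the function and parameter names are hypothetical.

```python
def update_first_error(first_error: float, target: float, prediction: float, a: float) -> float:
    """Blend the stored error with the newest absolute prediction error.

    Assumed form (not confirmed by the cited references):
        new_error = a * first_error + (1 - a) * |target - prediction|
    """
    if not 0.0 < a < 1.0:
        raise ValueError("forgetting coefficient a must satisfy 0 < a < 1")
    # The absolute value makes the sign of the difference irrelevant.
    abs_diff = abs(target - prediction)
    # Weighting addition of the old error and the new absolute difference.
    return a * first_error + (1.0 - a) * abs_diff


# Made-up values: stored error 0.5, target 1.0, prediction 0.0, a = 0.9.
updated = update_first_error(0.5, target=1.0, prediction=0.0, a=0.9)
```

Swapping the target and the prediction gives the same result, which mirrors the point made above that the sign of the difference does not matter.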

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gama as further graphically evidenced by GamaTutorial with Woods in view of Pyeatt et al. (“Decision Tree Function Approximation in Reinforcement Learning”; hereinafter Pyeatt).
As per Claim 8, the combination of Gama and Woods teaches the information processing device according to claim 1.  Gama teaches a reliable node specification processor circuitry and reliability index acquired by the reliability index acquisition processor circuitry and reliability from among the input nodes corresponding to the input node (Recall that Gama, Section 3.2, discloses:  “The UFFT algorithm maintains, at each node of all decision trees, a naive-Bayes classifier. Those classifiers were constructed using the sufficient statistics needed to evaluate the splitting criteria when that node was a leaf. When the leaf becomes a node the naive-Bayes classifier will classify the examples that traverse the node. The basic idea of the drift detection method is to control this online error-rate. If the distribution of the examples that traverse a node is stationary, the error rate of naive-Bayes decreases. If there is a change on the distribution of the examples the naive-Bayes error will increase.”  Here, Gama discloses an “error-rate”, and thus a reliable node specification processor circuitry and reliability index acquired by the reliability index acquisition processor circuitry.  Gama also here discloses reliability from among the input nodes corresponding to the input node (“classify the examples that traverse the node”)).
However, Gama does not teach selects a highly reliable node having highest reliability from among the input nodes corresponding to the input node.
Woods teaches selects a highly reliable node having highest reliability from among the input nodes corresponding to the input node (Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, Woods discloses selects a highly reliable node having highest reliability.)
Gama teaches a calculation possibility determination processor circuitry configured to determine whether or not a node among the input nodes that is located on a layer among the layers that is one layer lower than the highly reliable node is a node for which appropriate calculation is possible (Recall that Gama discloses a classifier in each node in Gama, Abstract: “Decision nodes and leaves contain naive-Bayes classifiers playing different roles during the induction process. Naive-Bayes in leaves are used to classify test examples. Naive-Bayes in inner nodes play two different roles. They can be used as multivariate splitting-tests if chosen by the splitting criteria, and used to detect changes in the class-distribution of the examples that traverse the node.”  This is on multiple layers (“decision nodes and leaves”), which includes a node among the input nodes that is located on a layer among the layers that is one layer lower than the highly reliable node is a node for which appropriate calculation is possible.  An “appropriate calculation” is not a specific term and may be very broadly interpreted.  Examiner is interpreting this “appropriate calculation is possible” as “a calculation that results in lower reliability (higher error)”.  If this is not the result (reliability is higher), then obviously it is not possible that the calculation results in lower reliability.)
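However, Gama does not teach and a selective output-node specification processor circuitry configured to specify the highly reliable node as the output node that is the basis of the generation of the prediction output when the node that is located on the layer one layer lower than the highly reliable node is the node for which the appropriate calculation is possible, and specifies the node that is located on the layer one layer lower than the highly reliable node as the output node that is the basis of the generation of the prediction output when, in contrast, the node that is located on the layer one layer lower than the highly reliable node is not the node for which the appropriate calculation is possible.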
Woods teaches and a selective output-node specification processor circuitry configured to specify the highly reliable node as the output node that is the basis of the generation of the prediction output when the node that is located on the layer one layer lower than the highly reliable node is the node for which the appropriate calculation is possible, and specifies the node that is located on the layer one layer lower than the highly reliable node as the output node that is the basis of the generation of the prediction output when, in contrast, the node that is located on the layer one layer lower than the highly reliable node is not the node for which the appropriate calculation is possible.  (Recall above, that the “appropriate calculation” was interpreted by Examiner as where the node that is located on the layer one layer lower than the highly reliable node had lower reliability.  Also recall that Gama teaches nodes and layers.  Recall that Woods, Section 3.2, Last Sentence of Paragraph 1, discloses:  “When the individual classifiers disagree, local accuracy is estimated for each classifier, and the decision of the classifier with the highest local accuracy estimate is selected.”  A “highest accuracy” also means “lowest error”.  Thus, if the node that is located on the layer one layer lower than the highly reliable node, is the node for which the appropriate calculation is possible, then that node one layer lower is not as reliable as the highly reliable node.  Therefore, the highest accuracy node, as suggested by Woods is the highly reliable node, and thus is disclosed that specifies the highly reliable node as the output node that is the basis of the generation of the prediction output.  
Otherwise, the node that is located on the layer one layer lower than the highly reliable node is more reliable, and thus in this case the node that is located on the layer one layer lower than the highly reliable node is not the node for which the appropriate calculation is possible and Woods would choose the lower node as the highest accuracy node, and thus is disclosed specifies the node that is located on the layer one layer lower than the highly reliable node as the output node that is the basis of the generation of the prediction output.)
However, the combination of Gama and Woods thus far fails to teach wherein, when a number of times of learnings of a respective node of the input nodes is larger than or equal to a predetermined number, the reliability index of the node is calculated using a past reliability index of the respective node, and when a number of times of learnings of the respective node is smaller than the predetermined number, the reliability index of the respective node is a defined value; wherein when the number of times of learnings corresponding to the node that is located on the layer one layer lower than the highly reliable node is larger than or equal to the predetermined number, it is determined that the appropriate calculation is possible, and when the number of times of learnings corresponding to the node that is located on the layer one layer lower than the highly reliable node is smaller than the predetermined number, it is determined that the appropriate calculation is not possible.
Pyeatt teaches wherein, when a number of times of learnings of a respective node of the input nodes is larger than or equal to a predetermined number, the reliability index of the node is calculated using a past reliability index of the respective node, and when a number of times of learnings of the respective node is smaller than the predetermined number, the reliability index of the respective node is a defined value (Recall that Gama teaches a reliability index.  Pyeatt, Page 4 Figure 3, discloses:
[image: media_image7.png (Pyeatt, Page 4 Figure 3)]

Here, Pyeatt discloses when a number of times of learnings of the respective node (“history_list_length”) is smaller than the predetermined number (“history_list_min_size”), the reliability index of the respective node is a defined value (the reliability of the node is such that “split := False”).  Pyeatt also teaches when a number of times of learnings of a respective node of the input nodes is larger than or equal to a predetermined number (“else”), the reliability index of the node is calculated using a past reliability index of the respective node (“calculate average…and standard deviation…of the history list” is used to determine if the node’s reliability is such that it should be split).  Thus, as the history of the node is being used, the reliability index (which determines whether or not the node should be split) is calculated using a past (“history”) reliability index.)
The combination of Gama with Pyeatt further teaches wherein when the number of times of learnings corresponding to the node that is located on the layer one layer lower than the highly reliable node is larger than or equal to the predetermined number, it is determined that the appropriate calculation is possible, and when the number of times of learnings corresponding to the node that is located on the layer one layer lower than the highly reliable node is smaller than the predetermined number, it is determined that the appropriate calculation is not possible.  (Recall above that the combination of Gama and Woods teaches a highly reliable node.  Also recall that Gama discloses a classifier in each node in Gama, Abstract: “Decision nodes and leaves contain naive-Bayes classifiers playing different roles during the induction process. Naive-Bayes in leaves are used to classify test examples. Naive-Bayes in inner nodes play two different roles. They can be used as multivariate splitting-tests if chosen by the splitting criteria, and used to detect changes in the class-distribution of the examples that traverse the node.”  This is on multiple layers (“decision nodes and leaves”), which includes node that is located on the layer one layer lower than the highly reliable node.  When combined with Pyeatt, Pyeatt discloses when a number of times of learnings (“history_list_length”) is smaller than the predetermined number (“history_list_min_size”), the appropriate calculation (“calculate average…and standard deviation…of the history list”) is not possible.  Pyeatt also teaches when a number of times of learnings is larger than or equal to a predetermined number (“else”), the appropriate calculation (“calculate average…and standard deviation…of the history list”) is possible.  When combined with Gama and Woods, this may be performed on a node one layer lower than the highly reliable node.)
Pyeatt and the combination of Gama and Woods are analogous art because they are both in the field of endeavor of machine learning and decision trees.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Pyeatt and the combination of Gama and Woods.  One of ordinary skill in the art would have been motivated to do so in order to optimize the results of the decision tree, and to allow it to learn in a continuous state space (Pyeatt, Page 4: “We use Q-learning, so each leaf node stores one value for each possible action that can be taken, along with a history of the inputs and rewards that have been received. The history list is used to decide whether the region represented by the node should be split.” And Pyeatt, Page 5, “T statistic”:  “Our approach is based on the t-statistic. The algorithm calculates the means and variances for each input variable. If the node has not received any positive [image: media_image8.png] in its history list, then the input variable with the highest variance is chosen as the decision variable for the new 

As per Claim 9, the combination of Gama, Woods, and Pyeatt teaches the information processing device according to claim 8.  Pyeatt teaches wherein the predetermined number is two.  (Pyeatt, Page 4 Figure 3, discloses “history_list_min_size”.  One of ordinary skill in the art will appreciate that this may be any whole number, including two.)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/L.A.S./Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126