Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2021-10-13 has been entered.  Applicant’s amendments to the specification and claims have overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Office Action mailed 2021-08-11.  The status of the claims is as follows:
Claim 2 is cancelled.
Claims 1 and 3-15 remain pending in the application.
Claims 1 and 15 have been amended.
Response to Arguments
Applicant's arguments in response to rejections under 35 USC 103 have been fully considered, and the amended matter of replacing “accessing” with “reading” overcomes the previously applied prior art.  Examiner has applied the new reference Saqib et al. as necessitated by the amendment.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 8, 11, 13, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Saqib et al. (“Pipelined Decision Tree Classification Accelerator Implementation in FPGA (DT-CAIF)”; hereinafter Saqib) in view of Nishiyama (US 2011/0178976 A1).
As per Claim 1, Saqib teaches a learning device configured to perform learning of a decision tree based on learning data (Saqib, Section 3, discloses:  “In this paper we propose an efficient pipeline based implementation of a decision tree classification algorithm. The hardware accelerator for decision tree classification performs parallel operations using concurrent engines, where each engine implements pipeline technique and thus fetches data records in every cycle, enhancing the performance of classification process.”  Here, Saqib discloses a learning device (“hardware accelerator”) to perform learning of a decision tree (“implementation of a decision tree classification algorithm”) based on learning data (“data records”)).
circuitry configured to implement (Saqib, Section 3, discloses:  “In this paper we propose an efficient pipeline based implementation of a decision tree classification algorithm. The hardware accelerator for decision tree classification performs parallel operations using concurrent engines, where each engine implements pipeline technique and thus fetches data records in every cycle, enhancing the performance of classification process.”  Here, Saqib discloses circuitry (“hardware accelerator”)).
a data memory configured to store the learning data and including two or more ports for reading the learning data (Saqib, Section 3 Para 6, discloses:  “Fig. 3 represents a decision tree with depth of n, having n stages from which the data passes through, and then the classification is stored in the output block memory. The unclassified data is provided by the double-buffered input block RAM to the first stage of the engine, from where it is processed and propagated down the pipeline. The classifications for each tuple, are stored in the double buffered output block RAM. The Xilinx Logicore IP Block Memory Generator has been used in order to implement the input and output memories. Where, block memory generator uses embedded block memory primitives in Xilinx FPGAs to implement memories of different depths and widths. Our proposed design implemented on Digilent Nexys2 Spartan 3 E board uses two fully independent ports each with its own read and write interfaces and access to a shared memory space. These ports can operate at different clock frequencies thus making it possible for the classification subsystem to operate at double the frequency of the on-board system clock.”  Here, Saqib discloses a memory which stores learning data (“unclassified data”) and includes two or more ports for reading the learning data, as they recite “two fully independent ports each with its own read and write interfaces”, and thus both ports can read the learning data.)
a learning unit configured to read out feature amounts of the learning data from the data memory and derive a branch condition for a node of the decision tree based on the feature amounts, to perform learning of the decision tree (Saqib, Intro Para 3, discloses:  “The initial step is induction which involves construction of the decision tree model, where internal nodes and leaves constitute a decision tree model. Each internal node has a characteristic splitting decision and splitting attribute, while the leaves have particular category classification. Construction of a decision tree model from a training dataset/tuple constitutes of two phases. A splitting attribute and a split index are chosen by the model during the first phase.”  Here, Saqib discloses deriving a branch condition (“splitting decision and splitting attribute”).  This is done based on feature amounts (“training dataset/tuple”), and these feature amounts must be read from somewhere, and Saqib Section 3 Para 6 also states:  “The unclassified data is provided by the double-buffered input block RAM to the first stage of the engine, from where it is processed and propagated down the pipeline.”  Thus, a portion of Saqib’s implementation that performs this reading out and deriving a branch condition is a “learning unit”).
	However, Saqib does not explicitly teach a discriminator configured to perform determining, in accordance with the branch condition, a node to which learning data read out from the data memory is to be branched from the node corresponding to the branch condition.
	Nishiyama teaches a discriminator configured to perform determining, in accordance with the branch condition, a node to which learning data read out from the data memory is to be branched from the node corresponding to the branch condition. (Nishiyama, Para [0048], discloses:  “The discrimination unit 101 reads the branch condition of the nonterminal node and compares the inputted data with the branch condition (ST 210), and the discrimination unit 101 selects the node to be moved to next from among the child nodes. In the decision tree according to this embodiment, coordinates a and b and the two indexes of the child nodes if luminance of a is larger than that of b, or not, are written as a branch condition. Therefore, the discrimination unit 101 extracts luminance of a and b in the image data, compares luminance of a and b, and makes a choice between two child nodes”.  Here, Nishiyama discloses a discriminator (“discrimination unit”) configured to, in accordance with the branch condition (“reads the branch condition of the nonterminal node and compares the inputted data with the branch condition”), determining a node (“and makes a choice between two child nodes”) to which learning data (“inputted data” from which the tree is learned) is to be branched from the node corresponding to the branch condition (“reads the branch condition of the nonterminal node and compares the inputted data with the branch condition”).  Recall that in the 112f analysis, the “discriminator” is being interpreted as hardware.  Nishiyama, Fig.1, discloses:

[Image: Nishiyama, Fig. 1]

And in Abstract, discloses: “An apparatus for discrimination includes a memory, an alignment unit configured to align nodes of a decision tree in the memory”.  So, Nishiyama discloses an “apparatus” that is communicating with a “memory”, and thus Nishiyama is describing hardware.)
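For illustration only, the comparison described in Nishiyama Para [0048] (compare the luminance at coordinates a and b, then choose one of two child nodes) can be sketched as follows.  The names BranchCondition, child_if_greater, child_otherwise, and discriminate are hypothetical and do not appear in the reference, and the image is assumed to be a simple 2-D list of luminance values.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) into the image; an assumed layout


@dataclass
class BranchCondition:
    """Hypothetical record for one nonterminal node's branch condition."""
    a: Coord                # first coordinate to sample
    b: Coord                # second coordinate to sample
    child_if_greater: int   # child index when luminance(a) > luminance(b)
    child_otherwise: int    # child index in every other case


def discriminate(image: List[List[int]], cond: BranchCondition) -> int:
    """Return the index of the child node the input is branched to."""
    lum_a = image[cond.a[0]][cond.a[1]]
    lum_b = image[cond.b[0]][cond.b[1]]
    return cond.child_if_greater if lum_a > lum_b else cond.child_otherwise


image = [[10, 200],
         [50, 60]]
cond = BranchCondition(a=(0, 1), b=(0, 0), child_if_greater=3, child_otherwise=4)
next_node = discriminate(image, cond)  # luminance 200 > 10, so child 3
```

The sketch shows only the determining step for a single node; it makes no assumption about how the branch condition itself was learned.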
	Saqib and Nishiyama are analogous art because they are both in the field of endeavor of decision tree hardware.

	Saqib further teaches wherein the learning unit is configured to, in parallel with [processing of the discriminator] reading out learning data at a specific node from the data memory via a first port of the two or more ports and performing the determining, read out, from the data memory via a second port other than the first port, learning data at a node [on which the discriminator is configured] to perform determining subsequent to the specific node (Recall that Nishiyama teaches a discriminator.  Saqib, Section 3 Para 6, as shown above, discloses:  “Our proposed design implemented on Digilent Nexys2 Spartan 3 E board uses two fully independent ports each with its own read and write interfaces and access to a shared memory space. These ports can operate at different clock frequencies thus making it possible for the classification subsystem to operate at double the frequency of the on-board system clock.”  Here, Saqib indicates that two ports can perform reading.  Saqib Section 3 Para 11 continues:  “In order to increase the efficiency of the engine it has been made parallel. Fig. 5 shows the overall pipelined and parallel architecture where the decision tree subsystem is instantiated eight times thus facilitating computation of eight classification result every clock cycle. After the initial latency, equivalent to the number of levels in the tree, 8 tuples of the dataset are categorized every clock cycle. Our tested design of the proposed architecture allows a depth of up to 13 levels, therefore the maximum latency for this design is 13. The address management for writing to the double-buffered input block RAM and reading from the double-buffered output block RAM has been done in such a way that eight consecutive tuples can be read and classified in every clock cycle. The double-buffered input and output RAMs are designed to allow for simultaneous buffering and processing. 
The operations of each RAM are switched after the given batch of data records are processed by the classification subsystem.”  Here, Saqib discloses parallelism and that “eight consecutive tuples can be read and classified in every clock cycle”.  Finally, Saqib lists their enhancements: “Following are the enhancements in our proposed architecture where we utilize the hardware pipelines and parallelism to overcome the above mentioned limitations:
i.  Engine is made of pipelined stages, each stage implements rules of one level of the tree.
ii.  Pipeline to make use of processing cycles when data is written in memory, thus to increase the performance.
iii.  Engine works on clock frequency double to that of the interface clock.
iv.  Multiple data records are read as well as written simultaneously in every cycle, exhibiting parallelism, thus reducing the overall clock cycles.”
Here, Saqib discloses “each stage implements rules of one level of the tree” and “Multiple data records are read as well as written simultaneously in every cycle”.  Thus, in combination with the discriminator of Nishiyama, the combination of Saqib and Nishiyama suggests reading out learning data at a specific node from the data memory via a first port of the two or more ports and performing the determining (a node at one level) in parallel with reading out, from the data memory via a second port other than the first port, learning data at a node on which the discriminator is configured to perform determining subsequent to the specific node and derive the branch condition (a node at another level)).
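For illustration only, the pipelined behavior quoted from Saqib (one stage per tree level, an initial latency equal to the number of levels, and one result per cycle thereafter) can be sketched with a small software simulation.  This is a simplified model of a single engine under assumed names (simulate_pipeline, stages, pending); it is not Saqib's hardware design and it omits the eight parallel engines and the dual-port memory.

```python
from collections import deque


def simulate_pipeline(num_records, depth):
    """Return a list of (cycle, record_id) completion events.

    Each cycle, every record advances one stage and a new record (if any)
    enters stage 0. A record that reaches the last stage is counted as
    classified at the end of that cycle.
    """
    stages = [None] * depth
    pending = deque(range(num_records))
    completed = []
    cycle = 0
    while pending or any(r is not None for r in stages):
        cycle += 1
        entering = pending.popleft() if pending else None
        stages = [entering] + stages[:-1]
        if stages[-1] is not None:
            completed.append((cycle, stages[-1]))
            stages[-1] = None  # the record leaves the pipeline
    return completed


events = simulate_pipeline(num_records=3, depth=2)
```

In this model the first record completes at cycle 2 (the depth), and the remaining records complete on consecutive cycles, which mirrors the latency statement in the quoted passage.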

As per Claim 7, the combination of Saqib and Nishiyama teaches the learning device according to claim 1.  Saqib teaches the learning unit is configured to read out at least a plurality of feature amounts included in the learning data from the data memory by one access, and derive the branch condition based on the feature amounts.  (Saqib, Section 3 Para 6, discloses “The unclassified data is provided by the double-buffered input block RAM to the first stage of the engine, from where it is processed and propagated down the pipeline.”  Here, a plurality of feature amounts (“unclassified data”) is read out from the memory (“RAM”) and the branch condition is derived (“processed and propagated down the pipeline”, which includes as in Intro Para 3 “A splitting attribute and a split index are chosen by the model during the first phase” wherein a splitting attribute is a branch condition)).

As per Claim 8, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 and a discriminator (see Rejection to Claim 1).  
Saqib teaches wherein [the discriminator is configured to] read out label information of the learning data together with the feature amounts of the learning data from the data memory. (Recall that Nishiyama teaches a discriminator.  Saqib, end of Intro Para 4, discloses:  “The characteristic class label to the leaf is then assigned to the incoming tuple”, and Section 3 end of Para 4 discloses:  “The output of the coefficient memory contains all the information needed to perform the operation associated with the node in the tree being addressed.”  Since the characteristic class label is part of the information needed, Saqib discloses reading out label information of the learning data from the memory.)

As per Claim 11, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 as well as a discriminator and a learning unit (see Rejection to Claim 1). Nishiyama teaches wherein the discriminator is configured to share a configuration of performing determining operation on the learning data at a time of learning [by the learning unit] to perform a discriminating operation on discrimination data.  (Recall that Saqib teaches a learning unit.  Nishiyama, Para [0048], discloses:  “The discrimination unit 101 reads the branch condition of the nonterminal node and compares the inputted data with the branch condition (ST 210), and the discrimination unit 101 selects the node to be moved to next from among the child nodes. In the decision tree according to this embodiment, coordinates a and b and the two indexes of the child nodes if luminance of a is larger than that of b, or not, are written as a branch condition. Therefore, the discrimination unit 101 extracts luminance of a and b in the image data, compares luminance of a and b, and makes a choice between two child nodes”.  Here, Nishiyama discloses a discriminator (“discrimination unit”) configured to, in accordance with the branch condition (“reads the branch condition of the nonterminal node and compares the inputted data with the branch condition”), determining a node (“and makes a choice between two child nodes”) to which learning data (“inputted data” from which the tree is learned) is to be branched from the node corresponding to the branch condition (“reads the branch condition of the nonterminal node and compares the inputted data with the branch condition”).  
Here, determining a node corresponding to the branch condition is a determining operation.  This is performed on the learning data (“inputted data”).  A “configuration” may be broadly interpreted simply as the result of the operation.  Thus, by producing the determining result, one is sharing a configuration of performing determining operation on the learning data.  This is done during a learning operation of a decision tree, and is thus at a time of learning by the learning unit, as in the combination of Saqib and Nishiyama, the learning and discrimination units are working together over a period of time.  This determining operation is done in order to perform a discriminating operation on discrimination data, since the “inputted data” is also discrimination data, as it is the data being branched, and the “determining operation” is, in fact, a “discriminating operation”.  In other words, saying “performing a determining operation….to perform a discriminating operation” is analogous to saying “performing a talking operation…to perform a speaking operation”.)

As per Claim 13, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 and a learning unit (see Rejection to Claim 1).  Saqib teaches wherein the learning unit is configured to perform learning using part of all the feature amounts of the learning data.  (Saqib, Intro Para 3, discloses:  “The initial step is induction which involves construction of the decision tree model, where internal nodes and leaves constitute a decision tree model. Each internal node has a characteristic splitting decision and splitting attribute, while the leaves have particular category classification. Construction of a decision tree model from a training dataset/tuple constitutes of two phases. A splitting attribute and a split index are chosen by the model during the first phase.”  Here, Saqib discloses deriving a branch condition based on feature amounts (“training dataset/tuple”).  Saqib is using some feature amounts, and is therefore using at least a part of the feature amounts, and therefore part of all the feature amounts.)

As per Claim 14, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 and a learning unit (see Rejection to Claim 1).  Saqib teaches wherein the learning unit is configured to perform learning using part of all the feature amounts of the learning data.  (Saqib, Intro Para 3, discloses:  “The initial step is induction which involves construction of the decision tree model, where internal nodes and leaves constitute a decision tree model. Each internal node has a characteristic splitting decision and splitting attribute, while the leaves have particular category classification. Construction of a decision tree model from a training dataset/tuple constitutes of two phases. A splitting attribute and a split index are chosen by the model during the first phase.”  Here, Saqib discloses deriving a branch condition based on learning data (“training dataset/tuple”).  Saqib is using some feature amounts, and is therefore using at least a part of the feature amounts, and therefore part of all the feature amounts. The language “part of all the pieces of the learning data” adds no new limits to the claim, as “part of all” is simply still a “part”.  Saqib is using learning data, and thus pieces of the learning data, and is thus using at least a part of the pieces of the learning data, and therefore part of all pieces of the learning data.)

	Claim 15 is a method claim corresponding to device Claim 1.  Claim 15 is rejected for the same reasons as Claim 1.

Claims 3, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saqib and Nishiyama in view of Chen et al. (“XGBoost: A Scalable Tree Boosting System”; hereinafter Chen).
As per Claim 3, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 as well as derivation of branch conditions performed by the learning unit and determining performed by the discrimination unit (see Rejection to Claim 1). 
However, the combination of Saqib and Nishiyama thus far fails to teach wherein order of nodes as processing targets unit is order in which numbers of pieces of the learning data of adjacent nodes are close to each other.
	Chen teaches wherein order of nodes as processing targets unit discriminator is order corresponding to descending order or ascending order of numbers of pieces of the learning data of the nodes. (Chen, Section 4.1, discloses:  “The most time consuming part of tree learning is to get the data into sorted order. In order to reduce the cost of sorting, we propose to store the data in in-memory units, which we called block. Data in each block is stored in the compressed column (CSC) format, with each column sorted by the corresponding feature value. This input data layout only needs to be computed once before training, and can be reused in later iterations.”  Here, Chen discloses order of the learning data (“data into sorted order”).  Chen, Algorithm 3, discloses: 

[Image: Chen, Algorithm 3]

Here, Chen discloses order corresponding to descending order or ascending order (“Ascending”) of numbers of pieces of the learning data (“xjk” – recall above “sorted by the corresponding feature value”) of the nodes (Ik).  Here, “feature values” are numbers of pieces of the learning data, wherein “number” in this sense is not a numerical value, but rather used in its demonstrative form (i.e., “do a number of things”), wherein “numbers of pieces of the learning data” is interpreted as “some pieces of the learning data”.  The nodes are processing targets, as Mitchell’s sorting step is the step of an algorithm (“If the sorting version of the algorithm is used”).  Mitchell is also in the field of decision trees, and thus the “nodes” are decision tree nodes and this suggests the processing performed by the learning unit of Saqib (Mitchell Pg 29 Phase 2: “Once the best splits for each node have been calculated”) and discriminator of Nishiyama (Mitchell Pg 29 Phase 2: “First we update the node ID map in the missing direction. All instances residing in node 1 are updated in the right direction to node 4. Instances residing in node 2 are updated in the left direction to node 5. The node ID map now looks like Table 16”.  Finally, note that the order of the nodes is interpreted in the possessive form, wherein rather than “the order in which nodes are processed”, order of the nodes is interpreted as “an order, which is a property of each node”, as the instant claim states that the “learning data” is “of the nodes”.  Thus, the order of the learning data (“each feature value…is sorted”), is an order of the node.  The “order of nodes as processing targets”, is interpreted as a tautology (i.e., the nodes are entities that are indeed processing targets), and this does not necessarily mean “the order in which nodes are processed by the algorithm”)).
	Saqib, Nishiyama, and Chen are analogous art because they are all in the field of endeavor of decision trees.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree hardware implementation of Saqib and Nishiyama with the sorting of feature values of Chen. The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the efficiency of the algorithm (Chen, Section 3.1: “It is computationally demanding to enumerate all the possible splits for continuous features. In order to do so efficiently, the algorithm must first sort the data according to feature values and visit the data in sorted order to accumulate the gradient statistics for the structure score in Eq (7).”)
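For illustration only, the passage quoted from Chen Section 3.1 (sort the data by feature value, then visit it in sorted order while accumulating gradient statistics) can be sketched as a scan over one feature.  The function name best_split, the midpoint threshold, and the default lam value are assumptions made for this sketch; the complexity penalty term on the number of leaves is omitted for brevity.

```python
def best_split(values, grads, hess, lam=1.0):
    """Scan one feature in ascending order and return (gain, threshold).

    values: feature value per instance
    grads, hess: first and second order gradient statistics per instance
    lam: L2 regularization on leaf weights (assumed default)
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # sort once
    G, H = sum(grads), sum(hess)
    parent_score = G * G / (H + lam)
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None
    for pos in range(len(order) - 1):
        i = order[pos]
        GL += grads[i]
        HL += hess[i]
        nxt = values[order[pos + 1]]
        if values[i] == nxt:
            continue  # no split point between equal feature values
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent_score)
        if gain > best_gain:
            best_gain, best_thr = gain, (values[i] + nxt) / 2
    return best_gain, best_thr


gain, threshold = best_split(
    values=[3.0, 1.0, 2.0, 4.0],
    grads=[1.0, -1.0, -1.0, 1.0],
    hess=[1.0, 1.0, 1.0, 1.0],
)
```

With these inputs the scan separates the two instances with negative gradients from the two with positive gradients, at a threshold of 2.5.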

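As per Claim 9, the combination of Saqib and Nishiyama teaches the learning device according to claim 1.  However, the combination of Saqib and Nishiyama fails to teach wherein the learning device is configured to perform the learning of the decision tree by gradient boosting.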
Chen teaches wherein the learning device is configured to perform the learning of the decision tree by gradient boosting (Chen, Abstract, discloses: “Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges”, and in Intro Para 2:  “Among the machine learning methods used in practice, gradient tree boosting [10] is one technique that shines in many applications.”)
Saqib, Nishiyama, and Chen are analogous art because they are all in the field of endeavor of decision trees.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree hardware implementation of Saqib and Nishiyama with the XGBoost algorithm of Chen. The modification would have been obvious because one of ordinary skill in the art would be motivated to save on resources (Chen, Abstract: “By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems”)

As per Claim 10, the combination of Saqib, Nishiyama, and Chen teaches the learning device according to claim 1 as well as learning unit, learning data, data memory, and discriminator (see Rejection to Claim 1).  Chen teaches calculate a first-order gradient and a second-order gradient corresponding to an error function of each piece of the learning data, (Chen, Section 2.2, discloses: 

[Image: excerpt from Chen, Section 2.2]

Here, Chen discloses calculate a first-order gradient and a second-order gradient (“first and second order gradient statistics”) corresponding to an error function (“on the loss function”).  Chen, Section 2.1, discloses:  “

[Image: excerpt from Chen, Section 2.1]

[Image: excerpt from Chen, Section 2.1]

Here, Chen discloses that these calculations are done for each piece of the learning data (“for a given data set”), as above in the equation with gradients, the equation is summed over data with summation symbol Sigma over data set xi:

[Image: excerpt from Chen, equation with gradient statistics]

Chen also discloses calculate leaf weight (“Each fk corresponds to an independent tree structure q and leaf weights w.”)  As these values are used and accessed during the execution of an iterative algorithm, the values must be stored in memory, and thus Chen discloses calculate a first-order gradient and a second-order gradient corresponding to an error function.  Chen, Abstract, discloses:  “Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges.”  Here, Chen discloses so that learning of a next decision tree is performed by the gradient boosting.)
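For illustration only, the per-instance first and second order gradient statistics and the resulting leaf weight discussed above can be sketched as follows.  The squared-error loss is an assumption chosen for simplicity (Chen's formulation applies to any twice-differentiable loss), and the function names and the default lam value are hypothetical.

```python
def squared_error_gradients(y_true, y_pred):
    """Per-instance gradients for the assumed loss l = 0.5 * (y_pred - y)^2.

    First order:  g_i = dl/dy_pred = y_pred_i - y_i
    Second order: h_i = d2l/dy_pred2 = 1
    """
    g = [p - y for y, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(g, h, lam=1.0):
    """Leaf weight -sum(g) / (sum(h) + lam) for the instances in one leaf."""
    return -sum(g) / (sum(h) + lam)


# Three instances that all fall into the same leaf, starting from a prediction of 0.
g, h = squared_error_gradients(y_true=[1.0, 2.0, 3.0], y_pred=[0.0, 0.0, 0.0])
w = leaf_weight(g, h)  # -(-6) / (3 + 1) = 1.5
```

The gradients computed here are the values that a subsequent boosting round would consume when learning the next tree.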

Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saqib and Nishiyama in view of Zhang et al. (“A Splitting Criteria Based on Similarity in Decision Tree Learning”; hereinafter Zhang).
As per Claim 4, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 as well as a discriminator to perform branching (see Rejection to Claim 1). 
However, the combination of Saqib and Nishiyama thus far fails to teach in a case in which a number of pieces of the learning data at a node is equal to or smaller than a predetermined number, regard the node as a leaf and do not perform branching after the node.
Zhang teaches in a case in which a number of pieces of the learning data at a node is equal to or smaller than a predetermined number, regard the node as a leaf and do not perform branching after the node. (Zhang, pg 1778 Section E, discloses:  “Usually, there are noise datas in training set, which may lead to overfitting, then noise branch will be generated. We take the splitting threshold method to eliminate overfitting. Two thresholds are adopted in the pruning methodology, one is the similarity of subset named r1, and the other threshold is the size of subset named r2, if the similarity of subset is greater than r1 or the size of the subset is less than r2, then stop splitting, and label the node as leaf node”.  Here, Zhang discloses in a case in which a number of pieces (“size of the subset”) of the learning data (“training set”) at a node (“node”) is equal to or smaller than a predetermined number (“is less than r2”, i.e. equal to or smaller than r2 – 1), regard the node as a leaf and do not perform branching after the node (“then stop splitting, and label the node as leaf node”)).
Saqib, Nishiyama, and Zhang are analogous art because they are all in the field of endeavor of decision trees.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree FPGA implementation of Saqib and Nishiyama with the leaf threshold of Zhang. The modification would have been obvious because one of ordinary skill in the art would be motivated to avoid overfitting (Zhang, pg 1778 Section E:  “Usually, there are noise datas in training set, which may lead to overfitting, then noise branch will be generated. We take the splitting threshold method to eliminate overfitting”).
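For illustration only, the stopping rule quoted from Zhang (stop splitting when the subset similarity exceeds r1 or the subset size is below r2) and its correspondence to the claimed "equal to or smaller than a predetermined number" can be sketched as follows.  The function names are hypothetical, and subset sizes are assumed to be integers.

```python
def should_stop_splitting(subset_similarity, subset_size, r1, r2):
    """Zhang's two-threshold rule: label the node as a leaf when either holds."""
    return subset_similarity > r1 or subset_size < r2


def is_leaf_by_count(subset_size, predetermined_number):
    """Claim-style test: size equal to or smaller than a predetermined number."""
    return subset_size <= predetermined_number


# For integer sizes, "size < r2" is the same test as "size <= r2 - 1".
r2 = 5
equivalent = all(
    (size < r2) == is_leaf_by_count(size, r2 - 1) for size in range(0, 20)
)
```

The equivalence check covers only the size threshold; the similarity threshold r1 is an additional, independent stopping condition in Zhang.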

As per Claim 12, the combination of Saqib and Nishiyama teaches the learning device according to claim 1.  Saqib teaches a model memory configured to store the branch condition for the node of the decision tree (Saqib, Intro Para 3, discloses:  “A splitting attribute and a split index are chosen by the model during the first phase.”  Here, Saqib discloses a branch condition (“splitting attribute”).  Saqib, Section 3 Para 5, further discloses:  “The decision tree classification engine has three major parts: a) the double-buffered input block RAM b) the decision tree classification subsystem, and c) the double-buffered output block RAM. The decision logic reads the incoming data and takes the rules from its associated coefficient memory, processes them and forwards the data to the next stage with the processed results.”  Here, Saqib also discloses that values are stored in memory, specifically data that is forwarded to the next stage, including “rules” as explained in Intro Para 4 (“The prediction in the classification process commences at the root, and a path to a leaf is followed by using the decision rules governed at each internal node.”)  Thus the branch condition is stored in memory.)
	However, Saqib does not explicitly teach in a case in which the node as a learning target is to be further branched, write flag information indicating that the node is to be further branched, into the model memory together with the branch condition, and the learning unit is configured to, in a case in which the node as a learning target is not to be further branched, write flag information indicating that the node is not to be further branched, into the model memory.
	Zhang teaches in a case in which the node as a learning target is to be further branched, write flag information indicating that the node is to be further branched, into the model memory together with the branch condition, and the learning unit is configured to, in a case in which the node as a learning target is not to be further branched, write flag information indicating that the node is not to be further branched, into the model memory.  (Zhang, pg 1778 Section E, discloses:  “Usually, there are noise datas in training set, which may lead to overfitting, then noise branch will be generated. We take the splitting threshold method to eliminate overfitting. Two thresholds are adopted in the pruning methodology, one is the similarity of subset named r1, and the other threshold is the size of subset named r2, if the similarity of subset is greater than r1 or the size of the subset is less than r2, then stop splitting, and label the node as leaf node”.  Here, Zhang discloses, in a case in which the node as a learning target is not to be further branched (“then stop splitting”), write flag information indicating that the node is not to be further branched (“label the node as leaf node”).  Note that a “label” may be considered a flag.  The memory used to store this label is effectively a leaf flag.  The process of initializing this label to “false” or “0” then amounts to in a case in which the node as a learning target is to be further branched, writing flag information indicating that the node is to be further branched.  
This label must be stored in memory, and thus Zhang discloses in a case in which the node as a learning target is to be further branched, write flag information indicating that the node is to be further branched, into the model memory together with the branch condition, and the learning unit is configured to, in a case in which the node as a learning target is not to be further branched, write flag information indicating that the node is not to be further branched, into the model memory when combined with learning unit and model memory of Saqib.)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saqib and Nishiyama in view of Minzoni et. al. (US 2009/0097348 A1; hereinafter Minzoni) and Parlante (“Pointers and Memory”).
As per Claim 5, the combination of Saqib and Nishiyama teaches the learning device according to claim 1 as well as a learning unit, learning data, a discrimination unit, and branching at a node (see Rejection to Claim 1).  Saqib teaches a hierarchical level of nodes as learning targets is switched (Saqib, in the enhancements section under Fig. 5, discloses:  “Engine is made of pipelined stages, each stage implements rules of one level of the tree.”  Also in Intro para 3:  “This repetitive process is continued till the depth of the tree reaches a desired level.”  Thus, Saqib discloses a hierarchical level of nodes as learning targets is switched.)
However, the combination of Saqib and Nishiyama thus far fails to explicitly teach wherein the data memory includes at least two bank regions for storing addresses of the learning data, the at least two bank regions are switched between a bank region for reading-out and a bank region for writing every time a hierarchical level of nodes as learning targets is switched, the learning unit is configured to read out an address of learning data branched at a node from the bank region for reading-out, and reads out the learning data from a region of the data memory indicated by the address, and the discriminator is configured to write the address of the learning data branched at the node into the bank region for writing.
Minzoni teaches wherein the data memory includes at least two bank regions for storing [addresses of] the [learning] data (Minzoni, Para [0057], discloses:  “According to another embodiment a memory module includes an even number of at least four memory banks, each memory bank having a plurality of memory cells, each two of the memory banks forming a memory bank region and being alternately connected to an 8-bit data bus. The memory banks are classified into two groups, each including a memory bank of each memory bank region. The memory module further includes a selection device connected to the memory banks and being operated in one of a 16-bit mode, a 8-bit mode and a 4-bit mode to access the memory bank regions, i.e. to write or to read data from a central data register to the memory bank regions.”  Here, Minzoni discloses a memory that includes at least two bank regions (“each two of the memory banks forming a memory bank region”).  Recall that above Saqib teaches “learning data”.  Storing “addresses of” data is taught by Parlante below.)
the at least two bank regions are switched between a bank region for reading-out and a bank region for writing [every time a hierarchical level of nodes as learning targets is switched]  (Minzoni, Para [0057], discloses:  “According to another embodiment a memory module includes an even number of at least four memory banks, each memory bank having a plurality of memory cells, each two of the memory banks forming a memory bank region and being alternately connected to an 8-bit data bus. The memory banks are classified into two groups, each including a memory bank of each memory bank region. The memory module further includes a selection device connected to the memory banks and being operated in one of a 16-bit mode, a 8-bit mode and a 4-bit mode to access the memory bank regions, i.e. to write or to read data from a central data register to the memory bank regions.”  Here, Minzoni discloses the at least two bank regions (“each two of the memory banks forming a memory bank region”) are switched between a bank region for reading-out and a bank region for writing (“a selection device connected to the memory banks…mode to access the memory bank regions, i.e. to write or to read data”).  Thus, each bank can be switched between reading and writing.  Recall above that Owaida discloses a hierarchical level of nodes as learning targets is switched.  The decision tree algorithm described by Owaida certainly involves reading and writing data, and thus the combination of Owaida with Minzoni results in the at least two bank regions are switched between a bank region for reading-out and a bank region for writing every time a hierarchical level of nodes as learning targets is switched. Owaida, in fact, also discloses reading and writing to the data memory in Owaida Section IV A Right Column (“The data memory has one write and one read port”)).
the [learning unit] device is configured to read out [an address of] [learning] data [branched at a node] from the bank region for reading-out, [and reads out the learning data from a region of the data memory indicated by the address]  (Minzoni, Para [0057], discloses:  “According to another embodiment a memory module includes an even number of at least four memory banks, each memory bank having a plurality of memory cells, each two of the memory banks forming a memory bank region and being alternately connected to an 8-bit data bus. The memory banks are classified into two groups, each including a memory bank of each memory bank region. The memory module further includes a selection device connected to the memory banks and being operated in one of a 16-bit mode, a 8-bit mode and a 4-bit mode to access the memory bank regions, i.e. to write or to read data from a central data register to the memory bank regions.”  Here, Minzoni discloses a bank region for reading out (“a selection device connected to the memory banks…mode to access the memory bank regions, i.e. to write or to read data”).  Recall that above Saqib teaches “learning data”.  Storing “addresses of” data will be taught by Parlante below.)
and the [discriminator] device is configured to write [the address of] the [learning] data [branched at the node] into the bank region for writing. (Minzoni, Para [0057], discloses:  “According to another embodiment a memory module includes an even number of at least four memory banks, each memory bank having a plurality of memory cells, each two of the memory banks forming a memory bank region and being alternately connected to an 8-bit data bus. The memory banks are classified into two groups, each including a memory bank of each memory bank region. The memory module further includes a selection device connected to the memory banks and being operated in one of a 16-bit mode, a 8-bit mode and a 4-bit mode to access the memory bank regions, i.e. to write or to read data from a central data register to the memory bank regions.”  Here, Minzoni discloses a bank region for writing (“a selection device connected to the memory banks…mode to access the memory bank regions, i.e. to write or to read data”).  Recall that above Saqib teaches “learning data”.  Storing “addresses of” data will be taught by Parlante below.)  
Saqib, Nishiyama, and Minzoni are analogous art because they are all in the field of endeavor of computer hardware.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree hardware implementation of Saqib and Nishiyama with the multiple memory banks of Minzoni. The modification would have been obvious because one of ordinary skill in the art would be motivated to increase storage size and speed (Minzoni Para [0029]:  “To produce a data storage with a large storage capacity and/or with a high data throughput a plurality k of memory banks are combined, memory banks being of the same bit width m”).
	However, the combination of Saqib, Nishiyama, and Minzoni thus far fails to teach writing and storing addresses of the data; reads out the learning data from a region of the data memory indicated by the address.
	Parlante teaches writing and storing addresses of the data; reads out the learning data from a region of the data memory indicated by the address. (Parlante, Page 10 Para 2, discloses:  “How are pointers implemented? The short explanation is that every area of memory in the machine has a numeric address like 1000 or 20452. A pointer to an area of memory is really just an integer which is storing the address of that area of memory. The dereference operation looks at the address, and goes to that area of memory to retrieve the pointee stored there”.  Here, Parlante discloses writing and storing addresses of the data (“A pointer to an area of memory is really just an integer which is storing the address of that area of memory”) and reads out the learning data from a region of the data memory indicated by the address (“The dereference operation looks at the address, and goes to that area of memory to retrieve the pointee stored there”)).
Saqib, Nishiyama, Minzoni, and Parlante are analogous art because they are all in the field of endeavor of computer science.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree hardware implementation of Saqib, Nishiyama, and Minzoni with the pointers of Parlante. The modification would have been obvious because it would result in being able to share values without costly copy operations, and one of ordinary skill in the art would be motivated to increase efficiency (Parlante Page 3 Para 2:  “Pointers solve two common software problems. First, pointers allow different sections of code to share information easily. You can get the same effect by copying information back and forth, but pointers solve the problem better”).
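For illustration only, the arrangement recited in Claim 5 (two bank regions holding addresses, with the read and write roles swapped at each tree level, and learning data fetched by dereferencing an address) can be sketched in Python. This is a minimal sketch under stated assumptions and does not reproduce any cited reference: the function process_level, the list-based banks, and the sample values are all hypothetical, and a list index stands in for a memory address.

```python
from typing import Callable, List


def process_level(data_memory: List[int], banks: List[List[int]], read_idx: int,
                  goes_left: Callable[[int], bool]) -> int:
    """Process one tree level and return the index of the bank to read next."""
    write_idx = 1 - read_idx
    left: List[int] = []
    right: List[int] = []
    for addr in banks[read_idx]:      # read an address from the read bank
        sample = data_memory[addr]    # dereference: fetch the data at that address
        (left if goes_left(sample) else right).append(addr)
    banks[write_idx] = left + right   # write addresses, not the data itself
    return write_idx                  # the bank just written is read at the next level


data_memory = [5, 1, 8, 3]            # assumed sample values
banks = [[0, 1, 2, 3], []]            # bank 0 initially holds every address
read_idx = process_level(data_memory, banks, 0, lambda x: x < 4)
read_idx = process_level(data_memory, banks, read_idx, lambda x: x > 4)
```

Only the small address lists move between banks in this sketch; the sample values stay in place in data_memory, which is the efficiency rationale attributed to Parlante above.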

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saqib, Nishiyama, Minzoni, Parlante, and Koltsidas et. al. (US 2012/0131265 A1; hereinafter Koltsidas).
As per Claim 6, the combination of Saqib, Nishiyama, Minzoni, and Parlante discloses the learning device according to claim 5 as well as a discrimination unit, branching at a node (see Rejection to Claim 1), lower node (hierarchy of nodes), bank region for writing, and writing addresses of the data (see Rejection to Claim 5).
However, the combination of Saqib, Nishiyama, Minzoni, and Parlante fails to explicitly teach wherein the discrimination unit is configured to write addresses of the learning data branched from a node to one lower node into the bank region for writing in ascending order of addresses in the bank region for writing, and write addresses of the learning data branched from the node to the other lower node into the bank region for writing in descending order of addresses in the bank region for writing.
Koltsidas teaches wherein the [discrimination unit] device is configured to write addresses of the [learning] data [branched from a node to one lower node] into the [bank region for writing] memory in ascending order of addresses in the [bank region for writing] memory, and write addresses of the [learning] data [branched from the node to the other lower node] into the [bank region for writing] memory in descending order of addresses in the [bank region for writing] memory.  (Koltsidas, Para [0049], discloses: “In the second level cache, the received group of data units is stored. In case of a flash memory data units are stored in one or more erase units (i.e., flash blocks) of the second level cache where an erase unit can hold data from a single or a small set of groups. There can already be other groups of data units residing in the second level cache. For finally writing data units residing in the second level cache to the storage device, the data units of multiple groups are sorted by logical address. This plurality of data units then is transferred to the storage device such that the storage device receives the data units for writing in a sorted way, e.g. sorted in ascending order of the logical address, or sorted in descending order of the logical address, or a first portion of data units being sorted in ascending order of logical address and a second portion of data units being sorted in descending order of logical address. As such, the overall sequence of data units written to the storage medium results in the least possible movement between the write head and the storage medium, and as such in a fast write time. At the same time, an erase unit has no longer valid data if all groups that have stored valid data units in that erase unit have been destaged. It can then just be erased without needing to relocate any data.”  Here, Koltsidas discloses a device configured to write addresses of the data into the memory (“storage device receives the data units for writing in a sorted way”) in ascending order of addresses or descending order of addresses (“sorted in ascending order of the logical address, or sorted in descending order of the logical address”).  Recall that Nishiyama teaches a discrimination unit, Saqib teaches learning data, Minzoni teaches a bank region for writing, and Saqib teaches data branched from a node to one lower node.  The decision tree algorithm described by Saqib certainly involves multiple tree levels of nodes and reading and writing data, and thus the combination of Saqib with Koltsidas suggests wherein the discrimination unit is configured to write addresses of the learning data branched from a node to one lower node into the bank region for writing in ascending order of addresses in the bank region for writing, and write addresses of the learning data branched from the node to the other lower node into the bank region for writing in descending order of addresses in the bank region for writing.)

	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the decision tree hardware implementation of Saqib, Nishiyama, Minzoni, and Parlante, with the ordered address storage in memory of Koltsidas. The modification would have been obvious because one of ordinary skill in the art would be motivated to increase efficiency (Koltsidas [0049]:  “As such, the overall sequence of data units written to the storage medium results in the least possible movement between the write head and the storage medium, and as such in a fast write time. At the same time, an erase unit has no longer valid data if all groups that have stored valid data units in that erase unit have been destaged. It can then just be erased without needing to relocate any data”).
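For illustration only, the two-ended write order recited in Claim 6 can be sketched in Python. This is a minimal sketch under stated assumptions and is not drawn from Koltsidas or any other cited reference: the function write_partitioned, the list used as the bank region for writing, and the sample values are all hypothetical. Addresses of data branched to one lower node are written from the lowest slot upward, and addresses of data branched to the other lower node are written from the highest slot downward.

```python
from typing import Callable, List, Optional, Tuple


def write_partitioned(data_memory: List[int], read_addrs: List[int],
                      goes_left: Callable[[int], bool]
                      ) -> Tuple[List[Optional[int]], int]:
    """Fill a write bank from both ends; return the bank and the split position."""
    n = len(read_addrs)
    write_bank: List[Optional[int]] = [None] * n
    lo, hi = 0, n - 1
    for addr in read_addrs:
        if goes_left(data_memory[addr]):
            write_bank[lo] = addr   # one lower node: ascending slot order
            lo += 1
        else:
            write_bank[hi] = addr   # other lower node: descending slot order
            hi -= 1
    return write_bank, lo           # slots [0, lo) hold the first child's addresses


data_memory = [5, 1, 8, 3]          # assumed sample values
bank, split = write_partitioned(data_memory, [0, 1, 2, 3], lambda x: x < 4)
```

Filling from both ends means the two groups never overlap and no count of either group is needed in advance, since the write bank has exactly as many slots as there are addresses to partition.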

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Qasaimeh et. al. (“FPGA-Based Parallel Hardware Architecture for Real-Time Image Classification”) discloses implementing decision trees (Section III, Para 3) in a way that takes advantage of multiple memory ports (Page 63, right column third paragraph:  “In our architecture, we want to implement a histogram memory so that it should provide multiports for input and output”).
Malik et. al. (“FPGA based Combinatorial Architecture for Parallelizing RRT”), Intro, discloses “a novel multi-port hardware architecture that allows for zero latency read/write access to the N RRTs that grow in parallel”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/L.A.S./Examiner, Art Unit 2126   
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126