Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	The examiner welcomes applicant to request an interview to discuss any potential distinguishable subject matter in an effort to enhance compact prosecution, as well as to enhance record clarity.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  

With respect to claims 1-9 and 11-13:
Claim 1 includes
(1)	“a calculator configured to” (directed to branch scoring), and
(2)	“a scaling unit configured to” (directed to scaling values related to branch scoring so that they fall within a range capable of being calculated).

	
	In the evaluation of the three prongs,
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function
Regarding claim 1, the device (a term with no specific structural meaning) and its elements, each “configured to” perform a function, recite no hardware, no circuit and no statutory (non-transitory) medium under the interpretation of the claim. Therefore, as claimed, the scope includes only non-structural terms, i.e., terms having no specific structural meaning (unlike, for example, hardware or a circuit).

(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 

Claim 1 comprises generic placeholders linked to functional language by “configured to.”

(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

See claim 1, which, as recited, is not deemed to be modified by sufficient structure, material, or acts for performing the claimed function.
Claims 2-9 and 11-13 likewise are not deemed to include sufficient structure, material, or acts for performing the claimed function.

Regarding claims 1-9 and 11-13, and the consideration of supporting structures under 35 U.S.C. 112(a) and (b), applicant does disclose at least generic computer structures (memory and CPU; Fig. 2 and 0007 et seq., US 2020/0050963), as well as hardware, circuit and software implementations, i.e., hardware-type structures supporting the steps associated with learning of a decision tree (model).

It appears the invention can be implemented by hardware or by a combination of hardware and software (0548).

Applicant appears to have various structures corresponding to scaling (Figs. 37, 42, 43) and details of the scaling operation (0385 et seq.) in the specification, as seen in the details of the bit-level processing described therein.

It is noted that the branch score, which calculates Gain, is recited as a detailed algorithm in claim 10, which therefore appears to include enough detailed steps, being the acts for performing the claimed function (Gain, as the branch score).

Therefore, claim 10 is not deemed to meet the third prong (C).

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

The examiner suggests narrowing claim 1 to include hardware structures.

If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.
	

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter for the following reasons.
	Claim 10 (which narrows the claims to the detailed algorithm) depends on claim 1 (which comprises no hardware structures), directed to a device with a calculator and a scaler.
 Claim 10 is considered to include non-statutory scope, since it does not include any limiting statutory element (such as hardware, a circuit or non-transitory storage) to limit the scope of the device to within statutory limits, as understood.
	The examiner suggests limiting the claim to statutory scope by adding a statutory element (such as a circuit) to narrow the device in claim 1.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-9 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kumaran et al. (US 2018/0349382) in view of Ng et al. (US 2019/0114538).
	Regarding claim 1, Kumaran teaches a learning device configured to perform learning (see ML) of a decision tree (see abstract), the learning device comprising:

a branch score calculator (see Fig. 5A, score 503) configured to calculate a branch score used for determining a branch condition (see Fig. 4A, Yes/No) for a node of the decision tree based on a cumulative sum of gradient information (see 0029, GBDT; S1-S3 in Fig. 5B, see addition) corresponding to each value of a feature amount (see features F1 to F3 and weights W1-W2) of learning data (information).

See the tree with branches (Figs. 4A-4B, root with split nodes or leaves) and the values and weights associated with the split nodes, branching based on Y and N, and weights W1 to W4.
	These relate to the features (502) of a feature matrix and to the scoring calculations that generate scores S1-S3. Fig. 5B shows the scores S1-S3 being generated as summations of features (508, 509, 510) based on weights, i.e., by an algorithm. In matrix 502 of Fig. 5A, the columns of scores (503), features (F1 to F9, nine elements) and rank (501) are based on the algorithm of Fig. 5B, which generates the scores and is directed to generating a linear expression 507, or model, to be applied to search engines (servers and engines 109-N).

	Also see the features in a matrix (Fig. 7B), with score and rank, based on P, M and T (feature attributes) and/or document attributes.
See 0038 et seq., “…to determine the health of a given sample…”
[0038] FIGS. 4A-4B are examples of decision trees that can be represented by linear expressions. A decision tree is sample representation for classifying examples, which includes a root node and one more split-nodes or leave nodes. Each split-node can have branches to other split-nodes until a final layer of split-nodes. At each node, a feature can be used to label a classification, e.g., body mass index (BMI)>20, in which the decision tree is to determine the health of a given sample. The result of the final split-nodes or leaf nodes can provide a prediction result or prediction value referred to as weights.

Note that an ML model is trained with a sample (100 people) to predict the number of healthy people.

[0039] For example, referring to FIG. 4A, a ML model can be trained out of a sample, e.g., 100 people, to generate an exemplary decision tree 400 to predict the number of healthy people. Decision tree 400 can have a root node 0 (410) including the feature “BMI>20” and can have branches yes (Y) or no (N). In this example, the feature BMI>20 is a Boolean term having a true (Y) or false (N) outcome. For the N branch from root node 0 (410), people can be classified as healthy given a weight value e.g., 0.2 indicating 20 percent of the people are classified as (BMI<=20) and are healthy. For the Y branch from root node 0, there can be a split-node 1 (422) or leaf node with another feature “Weight>80 kg” (or Boolean term) having branches Y or N from split node 1 (422). For the Y branch from split node 1, a weight value of 0.7 can be provided indicating 70 percent of the people are classified as (BMI>20)(Weight>80 kg) which can indicate overweight and not healthy. For the N branch from split node 1, a weight value of 0.1 can be provided indicating 10 percent of the people are classified as (BMI>20)(Weight<=80 kg) and healthy.


Note that the linear expressions can be directed to webpages or documents, supporting search engine operations, as “input into the search engine ranking algorithms.”


Note the algorithm with summation (sum) in the process of generating the linear expression (or model).

[0040] This decision tree can be represented as a linear expression where weight values are W.sub.1=0.7, W.sub.2=0.1 and W.sub.3=0.2 and the linear expression is: (0.2)× (BMI<=20)+(0.7)× (BMI>20)(Weight>80 kg)+(0.1)× (BMI>20)(Weight<=80 kg). For the linear expression, if the Boolean term is True (Y), it is given a value of 1, and if the Boolean term is False (N), it is given a value of 0. In this way, traversing a decision tree represented by the linear expression would lead to a single weight value depending on the value of the Boolean terms. In one example, the linear expression can be represented as a string including number values, variables and Boolean terms. In other examples, as described in FIGS. 7A-7J, the linear expressions can relate to scoring webpages or documents in order to rank search results. Such a linear expression can be transmitted to a search engine of a search server and directly input to ranking algorithms used to rank search results without modification to the search engines or ranking algorithms which can simply input the string of number values, variables and Boolean terms. In other examples, the training ML model can change at the training server and features can change, e.g., the feature BMI>20 can change to BMI<20, and the split nodes, branches and weight values can change accordingly. In such a case, the linear expressions can be modified to adjust for the changes sent to a search engine and can be input into the search engine ranking algorithms.


See the decision tree with root and nodes (branches) and weights. This tree has information related to height and body mass, and supports search engine operations as applied to the search engine ranking algorithms.


[0041] Referring to FIG. 4B, in one example, instead of one split-node, decision tree 450 includes a root node 0 (460) and two split-nodes—split node 1 (462) and split node 2 (463). The branches of split nodes 1 and 2 can provide weight values W.sub.1 through W.sub.4 (470 through 473). Example features for the root nodes and split nodes are shown in FIG. 6A. Referring to FIG. 6A, a decision tree 600 has two split nodes 602 and 603. At root node 601, the Boolean feature term “BMI>20” is used with two branches Y or N. If true (Y), root node 601 branches to split node 602, which has the Boolean feature term “Height<135 cm.” If true (Y), split node 602 provides a weight (W.sub.1) value of 0.7, which can provide a confidence level or a prediction result classification, e.g., 70% of the sample of 100 people are (BMI>20)(Height<135 cm). If false (N), split node 602 provides a weight (W.sub.2) value of 0.1, which can indicate that 10% of the sample are (BMI>20)(height>=135 cm). At split node 603, if Y and height>170 cm, split node 603 can provide a weight (W.sub.3) value of 0.4 indicating 40% of the sample are (BMI<=20)(height>170 cm). And if N and height is <170 cm split node 603 provides a weight (W.sub.4) value of 0.2 indicating the 20% of the sample is (BMI<=20)(Height<=170 cm). Decision tree 600 can be represented by the linear expression 607 as shown in FIG. 6B as [(0.7)× (BMI>20)× (Height<135)+(0.1)× (BMI>20)× (Height>=135)+(0.4)× (BMI<=20)×(Height>170)+(0.2)× (BMI<=20)× (Height<=170). In this example, if a Boolean term is True it will have a value of 1 and if False it will have a value of 0, and decision tree 600 is a single tree in which a decision will traverse the tree to a single weight, e.g., W.sub.1 to W.sub.4. 
A score can be determined by one of the weights W.sub.1, W.sub.2, W.sub.3, or W.sub.4 in which one of the weight values will be obtained based on traversing the decision tree 600 and the other values will equal zero as the branches of the decision tree will not be traversed giving a value of 0 for a “N” branch. For example, if BMI>20 and Height<135 cm are “Y”, the decision tree 600 would traverse to weight value W.sub.1=0.7, and all other decision branches will be “N” making W.sub.2, W.sub.3 and W.sub.4=0. That is, for the linear expression, W.sub.2×0=0, W.sub.3×0=0, and W.sub.4×0=0. Thus, for this branch, using the linear expression of FIG. 6B the score would equal 0.7. In other examples, as shown in FIGS. 7F-7J, multiple trees can be traversed to different weight values across multiple trees.
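The linear expression quoted above for decision tree 600 can be sketched as follows. This is only an illustration of the quoted text: the function name `tree_600_score` and its parameters are hypothetical, and the weights and Boolean terms are taken from the quoted paragraph of Kumaran. Each Boolean term counts as 1 when true and 0 when false, so exactly one weight survives the sum.

```python
def tree_600_score(bmi: float, height_cm: float) -> float:
    """Evaluate the linear expression quoted for decision tree 600 (FIG. 6B).

    Hypothetical helper for illustration; weights W1..W4 come from the quote.
    """
    terms = [
        (0.7, bmi > 20 and height_cm < 135),    # W1
        (0.1, bmi > 20 and height_cm >= 135),   # W2
        (0.4, bmi <= 20 and height_cm > 170),   # W3
        (0.2, bmi <= 20 and height_cm <= 170),  # W4
    ]
    # A true Boolean term contributes weight * 1, a false term weight * 0.
    return sum(weight * int(condition) for weight, condition in terms)


if __name__ == "__main__":
    # BMI > 20 and Height < 135 cm traverses to W1.
    print(tree_600_score(bmi=25, height_cm=120))
```

Because the four Boolean products are mutually exclusive and cover every input, the score always equals a single leaf weight, which matches the quoted statement that the other terms equal zero.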


And,
O	a scaling unit configured to perform scaling (see weights, w) on a value related to the cumulative sum used for calculating the branch score by the branch score calculator (generating ranges), to fall within a numerical range with which the branch score is capable of being calculated (see weights such as 0.7 and 0.1, and ranges 601, 602, 603 in Fig. 6A; 0038-0039):

“…Decision tree 400 can have a root node 0 (410) including the feature “BMI>20” and can have branches yes (Y) or no (N)…” 
And, with weight values vs. weight (with ranges):
“…For the Y branch from root node 0, there can be a split-node 1 (422) or leaf node with another feature “Weight>80 kg” (or Boolean term) having branches Y or N from split node 1 (422). For the Y branch from split node 1, a weight value of 0.7 can be provided indicating 70 percent of the people are classified as (BMI>20)(Weight>80 kg) which can indicate overweight and not healthy. For the N branch from split node 1, a weight value of 0.1 can be provided indicating 10 percent of the people are classified as (BMI>20)(Weight<=80 kg) and healthy….”
	The weighting appears to read on scaling with defined ranges (appearing to include 0.1 to 1); the above, including the conversion between 20% and 0.2, is deemed to be a form of scaling under the broadest reasonable interpretation.
	
In the alternative, scaling in light of applicant’s specification can be related to learning, where values need to be scaled (applicant’s specification, 0385). This scaling is directed to scaling bit values (such as binary) that represent the data, so that they fall within a range (based on the number of bits) and/or so that the data falls within ranges.

The examiner cites Ng et al., directed to neural networks that process data in a matrix of a NN (in layers) on an accelerator. The disclosure applies a scaling unit (as well as unscaling, or undoing of scaling), with the scaling based on scaling factors and relating to bit-level scaling, including to “scale the branch score by the branch score calculator to fall within a numerical range with which the branch score is capable of being calculated” (as claimed).
(See at least 0052-0065.)
See “neural network accelerator 238 further includes scaler circuit 528 and unscaler circuit 530”.
Note that the scaling (based on values in a range) appears to be directed to “avoid arithmetic overflow in neural network applications” … at the binary data representation level (or bits).

[0051] The neural network accelerator 238 further includes scaler circuit 528 and unscaler circuit 530. An implementation of the matrix multiplier array performs fixed point multiplication, and in order to avoid arithmetic overflow in neural network applications in which the output of one layer is the input to the next layer, the input values to the matrix multiplier are scaled to a range that will not cause overflow in the computations of the next layer.

[0052] The scaling factor(s) can be determined through prior analysis of the range of possible values of the input data and matrix data in each layer of the neural network.

[0053] As an example using 16-bit fixed point, the matrix multiplier 362 performs a matrix multiplication of matrices A1 and B1: C1=A1*B1


See the 16-bit range, with one bit for the sign (+/-), and the scale factors for A1 and B1.

[0054] If A1 has a maximum input range of −15.8 to 17.5, 17.5 is used to create a scaling value of 2^15/17.5=1872.4, which when multiplied by all the values in A1 will adjust them to the 16-bit range and preserve one sign bit. For this example the fixed precision number is treated as an integer without fractional bits. If having fractional bits were desired for some reason, smaller scaling values could be used. We call the first scaling factor “scaleA1.” The same operations can be performed on the B1 inputs with the range for B1 to generate “scaleB1.”
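The arithmetic in the quoted paragraph [0054] of Ng can be reproduced with the short sketch below. The helper names `scaling_value` and `to_fixed` are hypothetical and are not taken from Ng. The clamp to the signed range is an added assumption: multiplying 17.5 by 2^15/17.5 gives 2^15, which is one past the largest signed 16-bit value, so the sketch saturates rather than wraps.

```python
def scaling_value(max_value: float, bits: int = 16) -> float:
    """Scaling value 2^(bits-1) / max_value, as in the 2^15/17.5 example."""
    return 2 ** (bits - 1) / max_value


def to_fixed(values, scale: float, bits: int = 16):
    """Scale floats to integers without fractional bits.

    Clamping to the signed range is an assumption made for this sketch.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


if __name__ == "__main__":
    scale_a1 = scaling_value(17.5)          # about 1872.457
    print(scale_a1)
    print(to_fixed([-15.8, 0.0, 17.5], scale_a1))
```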


Note the scaling to reduce the range of the input data (A1 and B1).

[0055] The A1 and B1 inputs are scaled by scaleA1 and scaleB1, and the floating point NN layer is rerun to assess the maximum range of the dot product result. If, for example, 32-bit values are to be accumulated, and if any dot product output exceeds a 32-bit representation, a scaling factor of 2^31/val is used to determine a reduction scaling factor that would reduce the range of the A1 and B1 inputs. The square root of that value is multiplied by scaleA1 and multiplied by scaleB1 to produce scaleA1′ and scaleB1′ which when applied to the original floating point number ensures the fixed-precision dot product does not exceed 32 bits.
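A minimal sketch of the reduction step quoted in [0055] follows. It assumes that “val” is the largest dot-product magnitude observed after the first scaling; the quote does not define it further. The function name `reduce_scales` is hypothetical. Because the dot product grows with the product scaleA1 × scaleB1, multiplying each factor by the square root of 2^31/val shrinks the product by exactly 2^31/val.

```python
import math


def reduce_scales(scale_a: float, scale_b: float, max_dot: float,
                  acc_bits: int = 32):
    """Return (scaleA1', scaleB1') so the largest dot product fits acc_bits.

    max_dot is assumed to be the largest dot-product magnitude seen with the
    current scale factors (the "val" of the quoted paragraph).
    """
    limit = 2 ** (acc_bits - 1)
    if abs(max_dot) <= limit:
        return scale_a, scale_b            # already fits, nothing to reduce
    reduction = math.sqrt(limit / abs(max_dot))
    return scale_a * reduction, scale_b * reduction


if __name__ == "__main__":
    # A dot product four times too large halves each scale factor.
    print(reduce_scales(1000.0, 1000.0, 2.0 ** 33))
```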

[0056] Every layer on the deep learning network is analyzed in this manner producing scaling factors for each layer. The hardware for scaling between layer N and N+1 is general enough to scale between any sequence of layers.

Also note “…undo the scaling…” and “in preparation for calculating layer 2.”

[0057] B2 is the input for the next layer, and B2=C1. After applying scaling values and instead of computing C1=A1*B1, the computation is (scaleA1′*A1)*(scaleB1′*B1). The scaling result is divided by (scaleA1′*scaleB1′) to get C1 and undo the scaling. Since B2=C1 the next step would be to scale by scaleB2′ in preparation for calculating layer 2. If combining the scaling on the output of layer 1 and the input of layer 2, the layer 1 dot product outputs are scaled by scaleB2′/(scaleA1′*scaleB1′).

[0058] Combining the output scaling factors from layer N with the input scaling factors for layer N+1, the scaling factor is approximated into the form x/2^y. This term can be implemented very efficiently as hardware in the FPGA yet can describe scaling factors with high precision.

See the multiply, divide and shift operations.

[0059] In describing the scaling factors in the form x/2^y, arbitrary precision scaling can be implemented using a single multiply operation. Normally division is a complicated operation so describing the denominator as 2^y means the y value is actually describing a right shift of the input value. If y is a constant there are no muxes or any logic actually created so it is very efficient. Right shift by a constant can be implemented by rewiring of the input to output bus. The architecture also features one right shift by y, a multiplication by x, and a second right shift after the multiplication.
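The x/2^y form quoted in [0058]-[0059] can be illustrated with integer arithmetic as below. This is a sketch only: the helper names are hypothetical, the choice y = 16 is an assumption, and only a single multiply followed by one right shift is shown, not the two-shift arrangement the quote also mentions.

```python
def approximate_scale(factor: float, y: int = 16) -> int:
    """Return integer x such that x / 2**y approximates factor."""
    return round(factor * (1 << y))


def apply_scale(value: int, x: int, y: int = 16) -> int:
    """Scale an integer by x / 2**y using one multiply and one right shift."""
    return (value * x) >> y


if __name__ == "__main__":
    x = approximate_scale(0.3)       # 0.3 is approximated as x / 65536
    print(x, apply_scale(1000, x))   # scaled result of 1000 * 0.3
```

The division by 2^y never appears as a divide; it is the shift, which is the point the quoted paragraph makes about hardware cost.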

Note the scaling (referred to as multiply) and the unscaler (divide operation): unscaling reverses the scaling (an inverse operation).

[0060] The scaler circuit 528 multiplies the values in the input weights matrix by the scaleA1 factor, and multiplies the input data values by the scaleBx factor (where “x” is the layer). The unscaler circuit 530 unscales the output values from the matrix multiplier array by dividing each output value by (scaleA1*scaleBx).
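The scale, multiply and unscale sequence quoted in [0057] and [0060] can be sketched in software as follows. This is a plain-Python illustration and not the circuit of Ng: the function names, the example matrices and the scale factor of 256 are all assumptions chosen so that the round trip is exact.

```python
def scale_matrix(m, s):
    """Scaler: multiply every entry by s and round to an integer."""
    return [[round(v * s) for v in row] for row in m]


def matmul(a, b):
    """Integer matrix product (stand-in for the matrix multiplier array)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def unscale_matrix(m, s_a, s_b):
    """Unscaler: divide every output by (s_a * s_b) to undo the scaling."""
    return [[v / (s_a * s_b) for v in row] for row in m]


if __name__ == "__main__":
    A1 = [[1.5, -2.0], [0.25, 4.0]]
    B1 = [[2.0], [0.5]]
    s_a = s_b = 256                      # assumed scale factors
    C_fixed = matmul(scale_matrix(A1, s_a), scale_matrix(B1, s_b))
    print(unscale_matrix(C_fixed, s_a, s_b))   # recovers C1 = A1 * B1
```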

Therefore, the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art, who would have modified Kumaran in view of the teachings of Ng to perform a scaling operation (as claimed), “to scale the branch score by the branch score calculator (of Kumaran), to fall within a numerical range with which the branch score is capable of being calculated”, at the bit level, in order to reduce the range of input values and thereby “avoid arithmetic overflow in neural network applications”. This processing methodology appears to have advantages and is applied to neural network accelerators (shown in Fig. 4).
	Regarding claim 2, the combination as applied is further deemed to render obvious the learning device wherein the branch score calculator (of Kumaran), as applied in combination with Ng, includes a divider and is configured to calculate the branch score, and wherein the scaling unit (based on Ng) is configured to perform the scaling on the value related to the cumulative sum that is input to the divider (i.e., to a layer in the ML model).

See Ng (as applied above, 0057-0060), which includes division (which appears to be directed to scaling, or bit shifting right), the opposite of multiplication (or bit shifting left). This division operation is directed to undoing the scaling.
See the unscaler (which unscales) vs. the scaler:
[0060] The scaler circuit 528 multiplies the values in the input weights matrix by the scaleA1 factor, and multiplies the input data values by the scaleBx factor (where “x” is the layer). The unscaler circuit 530 unscales the output values from the matrix multiplier array by dividing each output value by (scaleA1*scaleBx).


	Regarding claim 3, the combination as applied is deemed to further render obvious the learning device wherein the branch score calculator of Kumaran, combined with Ng, includes an approximation arithmetic unit configured to perform an approximation operation on the branch score (Kumaran, see Fig. 5B), and the scaling unit performs scaling (Ng) on the value related to the cumulative sum that is input to the approximation arithmetic unit (see Kumaran, e.g., 0029, “predict”, or a neural network, as input to another layer). Applying the scaling to avoid overflow (based on Ng) is considered obvious based on the combination.
Also see Kumaran, Fig. 5B (w1, w2, w3), Fig. 6B, and 0038-0052, and Ng as applied above.

	Regarding claim 5, the applied combination is deemed to further render obvious the learning device according to claim 1, further comprising:

O	a scaling amount calculator (Ng) configured to calculate a scaling amount for the value related to the cumulative sum (of Kumaran) based on a maximum value of …. (maximum number of bits, 32 bits) of the cumulative sum (see score, by dot product),
 
wherein the scaling unit (Ng) is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range (defined by bit width, or number of bits) using the scaling amount calculated by the scaling amount calculator, for processing at the next layers of the neural network model.


Ng, as applied, renders obvious considering a maximum value (0054) in determining (or calculating) the scaling amount.
See “a maximum input range of −15.8 to 17.5”; the higher of the two is selected (see 17.5).

[0054] If A1 has a maximum input range of −15.8 to 17.5, 17.5 is used to create a scaling value of 2^15/17.5=1872.4, which when multiplied by all the values in A1 will adjust them to the 16-bit range and preserve one sign bit.

	Ng fails to specifically mention a maximum value of an absolute value, but suggests above a range in which one value is negative. One skilled in the art would therefore have found it obvious to consider the range in view of a maximum value of an absolute value (amount), in order to compare the two bounds and determine which is higher (17.5 vs. -15.8).

	Therefore, it is deemed obvious to utilize an absolute value (of a negative value) in the range determination (numerical range) and scaling. Since one value is shown as negative, it would have been obvious to generate the absolute value of (-15.8) and compare it with (17.5), so that the range determination (see Ng, higher of the two) is based on an absolute value (amount) comparison. Additionally, it would have been obvious to consider the summation (15.8 + 17.5), which is also a data value range determination; it appears one would add the two values to arrive at the range between the numbers.
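The absolute-value comparison reasoned above can be illustrated with a short sketch. It is hypothetical and is not taken from Ng or from applicant’s specification: the function name `scaling_value_from_range` is an assumption, and the second input range (-20.0 to 17.5) is made up to show a case where the negative bound governs.

```python
def scaling_value_from_range(lo: float, hi: float, bits: int = 16) -> float:
    """Pick the larger magnitude of the two bounds, then form 2^(bits-1)/max."""
    max_abs = max(abs(lo), abs(hi))
    return 2 ** (bits - 1) / max_abs


if __name__ == "__main__":
    # Range quoted from Ng: |17.5| exceeds |-15.8|, so 17.5 is selected.
    print(scaling_value_from_range(-15.8, 17.5))
    # Hypothetical range where the negative bound has the larger magnitude.
    print(scaling_value_from_range(-20.0, 17.5))
```

For the range quoted from Ng, the comparison selects 17.5 and gives the same scaling value as 2^15/17.5; the comparison only changes the result when the negative bound has the larger magnitude.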



Regarding claim 6, the applied combination is deemed to further render obvious the learning device further comprising:

O	a scaling amount calculator (Ng, as applied) configured to calculate a scaling amount for the value related to the cumulative sum based on a sum total of the gradient information (Kumaran, as applied), wherein the scaling unit is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range using the scaling amount calculated by the scaling amount calculator.

See Kumaran (0029, GBDT); also see 0048 et seq. (added).

“Gradient Boosting Decision Tree (GBDT) models”

[0029] 

“…. For example, ML models 106 can be based on any type of ML model including Gradient Boosting Decision Tree (GBDT) models, linear regression models or logistic regression models in generating decision trees that can predict a target result or provide a target value based on any number of input features at nodes of the tree providing classification labels. ML models can be generated on any type of computing device or computer ………”



	Regarding claim 7, the combination as applied is deemed to render obvious the learning device according to claim 1, further comprising:

O	a scaling amount calculator configured to calculate a scaling amount for the value related to the cumulative sum based on a number of samples of the learning data at each node of the decision tree, wherein

o	the scaling unit is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range using the scaling amount calculated by the scaling amount calculator.

See Ng (scaler 364; 0045, 0051), which, as applied, teaches calculating the scaling value. It is deemed obvious for that value to be at least related to the value of the cumulative sum (of Kumaran) and based on a number of samples (being training data), as claimed.


Note that in Kumaran machine learning is based on data samples (0039, 0041; a sample of 100 people, or samples) as a basis for decision tree generation.

SEE Kumaran

[0039] For example, referring to FIG. 4A, a ML model can be trained out of a sample, e.g., 100 people, to generate an exemplary decision tree 400 to predict the number of healthy people. Decision tree 400 can have a root node 0 (410) including the feature “BMI>20” and can have branches yes (Y) or no (N). In this example, the feature BMI>20 is a Boolean term having a true (Y) or false (N) outcome. For the N branch from root node 0 (410), people can be classified as healthy given a weight value e.g., 0.2 indicating 20 percent of the people are classified as (BMI<=20) and are healthy. For the Y branch from root node 0, there can be a split-node 1 (422) or leaf node with another feature “Weight>80 kg” (or Boolean term) having branches Y or N from split node 1 (422). For the Y branch from split node 1, a weight value of 0.7 can be provided indicating 70 percent of the people are classified as (BMI>20)(Weight>80 kg) which can indicate overweight and not healthy. For the N branch from split node 1, a weight value of 0.1 can be provided indicating 10 percent of the people are classified as (BMI>20)(Weight<=80 kg) and healthy.



Regarding claim 8, the combination as applied is deemed to render obvious:

o	a scaling amount calculator (as applied with Ng, scaler 528, in Fig. 5) configured to calculate a scaling amount for the value related to the cumulative sum (of Kumaran), based on the value related to the cumulative sum (range or ranges); and

o	an inverse scaling unit (Ng, such as 530, in Fig. 5) configured to perform inverse scaling that restores, based on the scaling amount, a value operated by the branch score calculator using the value related to the cumulative sum and subjected to the scaling performed by the scaling unit, to an original scale


SEE Ng, which as applied also includes, as claimed, inverse scaling after scaling (an undo, 0057) performed by an unscaler 530 (Fig. 5), wherein the unscaler follows the multiplier (362) and the scaler (528), as shown in Fig. 5.

	SEE Fig. 5: the scaling is performed (at a layer) to avoid overflow, while the unscaling (undoing the scaling) is done prior to the next layer, as understood.

SEE Ng (page 9, see claim 6) & 0057 (undo or unscaler)

6. The method of claim 1, further comprising: scaling values from an input data matrix to a layer by a layer-specific scaling factor before performing neural network operations of a current layer; and unscaling result values from the current layer before initiating scaling of values from an input data matrix to a next layer.

SEE Fig. 5: the scaler (528, or scaling step) and the input to the unscaler (530, or unscaling step) are deemed obvious to be as claimed, i.e., to perform unscaling of result values from the current layer before initiating scaling of values from an input data matrix to a next layer, in view of claim 6 (page 9, with the claims) of the disclosure of Ng.
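
The scale, operate, then unscale ordering discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the operation is a simple weighted sum (a linear operation, so multiplying the result by the same factor restores the original scale), and the names scale, weighted_sum, unscale, and scaled_weighted_sum are hypothetical. It is not the circuit of Ng Fig. 5, nor the claimed inverse scaling unit.

```python
from typing import List


def scale(values: List[float], factor: float) -> List[float]:
    """Divide each input by the scaling factor before the operation."""
    return [v / factor for v in values]


def weighted_sum(values: List[float], weights: List[float]) -> float:
    """Stand-in for the operation performed on the scaled values."""
    return sum(v * w for v, w in zip(values, weights))


def unscale(result: float, factor: float) -> float:
    """Undo the scaling on the result of a linear operation."""
    return result * factor


def scaled_weighted_sum(values: List[float], weights: List[float],
                        factor: float) -> float:
    """Scale the inputs, operate on them, then restore the original scale."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    scaled = scale(values, factor)
    partial = weighted_sum(scaled, weights)
    return unscale(partial, factor)


if __name__ == "__main__":
    print(scaled_weighted_sum([2048.0, 4096.0], [0.5, 0.25], 1024.0))
```

A power-of-two factor is used in the example call so that the scaling and unscaling are exact in floating point.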



Regarding claim 9, the combination as applied is deemed to render obvious the device wherein:

o	the learning of the decision tree is performed by gradient boosting

SEE Kumaran 0029, wherein the ML models can be Gradient Boosting Decision Tree (GBDT) models:

“…For example, ML models 106 can be based on any type of ML model including Gradient Boosting Decision Tree (GBDT) models…”


Regarding claim 11, the combination as applied is deemed to further render obvious wherein, in a case in which the feature amount has two or more dimensions, the branch score calculator is provided for each feature amount.
SEE Kumaran (features, 0004) and the matrix (Fig. 5A, 500, 502, 503 & 501; also see Fig. 5B, such as S1 = summation of features F1, F2 & F3 x weights W1, W2, W3), or dimensions, associated with the generation of the linear expression (or model, Figs. 6A-B), which is associated with decision tree generation (with nodes 601-603) and is applied to answer queries (the results being forecasts or predictions) directed to a search engine, as understood.

Regarding claim 12, the combination as applied is deemed to further render obvious the learning device according to claim 1, further comprising:

o	a leaf weight calculator configured to calculate a leaf weight as an output with respect to an input to the decision tree using a division circuit in a case in which the node of the decision tree is a terminal node; and

o	a leaf weight scaling unit configured to perform scaling on a value related to the cumulative sum and used for calculating the leaf weight by the leaf weight calculator to cause the value to fall within a numerical range with which the leaf weight is capable of being calculated (SEE Ng, scaling on the bit level)

SEE Ng (weight, 0003, 0006, 0022, 0041, 0044, 0045, 0046, 0050-) and the scaler and scaling (0039-0041, 0045, 0051-),
and
Kumaran, weight values (0019, 0024, 0038, 0039-0043), being leaf weights (end nodes versus split and root nodes of a tree; see Fig. 6A).

Claim 13 (method) is deemed analyzed and discussed with respect to claim 1 (device), above. Claim 13 is directed to a learning method for a learning device configured to perform learning of a decision tree, the method comprising: calculating a branch score used for determining a branch condition for a node of the decision tree based on a cumulative sum of gradient information corresponding to each value of a feature amount of learning data; and performing scaling on a value related to the cumulative sum used for calculating the branch score to cause the value to fall within a numerical range with which the branch score is capable of being calculated.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Kumaran et al. and Ng et al., as applied above, and further in view of Resch et al. (US 2015/0381730).
Regarding claim 4, the combination as applied, with Kumaran, appears to teach branch scores and a scoring algorithm, and to render obvious wherein the approximation arithmetic unit is configured to calculate an approximate value, but fails to particularly mention:

o	a logarithm (a log function) of the branch score (algorithm), associated with performing linear interpolation on the logarithm

	Note that a logarithm-based score is a score produced by a log-type scoring algorithm…

Resch teaches generating branch scores (associated with a tree, Figs. 41A & 41B) associated with a request and response system, and is deemed to teach and render obvious the difference, wherein branch scores can be generated by applying a log function of the scoring algorithm.
 

	As those skilled in the art would realize, log functions are well known for generating values or scores and are applied widely in areas such as engineering.

SEE 0266, with a score based on a negative-log (logarithm-based) scoring algorithm applied to a normalized result:
 [0266] With a set of normalized interim results 1-N, each scoring function performs a scoring function on a corresponding normalized interim result to produce a corresponding score. The performing of the scoring function includes dividing an associated location weight by a negative log of the normalized interim result. For example, scoring function 2 divides location weight 2 of the storage pool 2 (e.g., associated with location ID 2) by a negative log of the normalized interim result 2 to produce a score 2.
Note: “What is a negative log? A negative log is defined as the number of times required that 1 must be divided by the base in order to achieve the log number.”
So, -log2(0.5) = 1.
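
The scoring function quoted from Resch [0266] (a location weight divided by a negative log of a normalized interim result) can be sketched as follows. This is an illustration under assumptions: Resch [0266] as quoted does not state the base of the logarithm, so base 2 is assumed here to match the -log2(0.5) = 1 example above, and the normalized interim result is assumed to lie strictly between 0 and 1. The function and parameter names are hypothetical.

```python
import math


def negative_log_score(location_weight: float,
                       normalized_interim_result: float,
                       base: float = 2.0) -> float:
    """Divide a location weight by the negative log of a normalized
    interim result. The base is an assumption, not taken from Resch."""
    if not 0.0 < normalized_interim_result < 1.0:
        # Outside (0, 1) the negative log is zero, negative, or undefined.
        raise ValueError("normalized_interim_result must be in (0, 1)")
    negative_log = -math.log(normalized_interim_result, base)
    return location_weight / negative_log


if __name__ == "__main__":
    # -log2(0.5) = 1, so the score equals the location weight.
    print(negative_log_score(3.0, 0.5))
```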

Therefore, the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art, who would have modified Kumaran and Ng, in view of the teachings of Resch, to utilize log-based functions as part of the algorithms to generate the branch scores (based on a log-based function, or a logarithm-based algorithm), as taught by Resch.
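
Claim 4 recites calculating an approximate value of a logarithm by performing linear interpolation on the logarithm. As a general illustration of that technique only, the sketch below approximates log2 with a small lookup table and linear interpolation between adjacent table entries. The table-based approach, the table size, the choice of base 2, and the names TABLE_SIZE, LOG2_TABLE, and approx_log2 are assumptions made for this example. They are not taken from the application, Kumaran, Ng, or Resch.

```python
import math

TABLE_SIZE = 8  # hypothetical number of interpolation segments on [1, 2]

# log2 sampled at TABLE_SIZE + 1 evenly spaced points on [1, 2]
LOG2_TABLE = [math.log2(1.0 + i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]


def approx_log2(x: float) -> float:
    """Approximate log2(x) for x > 0 by linear interpolation."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # math.frexp gives x = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(x)
    m = mantissa * 2.0          # normalized to [1, 2)
    e = exponent - 1            # so that x = m * 2**e
    position = (m - 1.0) * TABLE_SIZE
    index = int(position)       # table segment containing m
    fraction = position - index
    low = LOG2_TABLE[index]
    high = LOG2_TABLE[index + 1]
    # Linear interpolation between the two neighboring table entries.
    return e + low + fraction * (high - low)


if __name__ == "__main__":
    for value in (0.3, 3.0, 10.0):
        print(value, approx_log2(value), math.log2(value))
```

With eight segments the interpolation error in this sketch stays below about 0.005; a larger table would reduce it further at the cost of more storage.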


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 (learning device) and claim 13 (learning method) recite limitations directed to learning a decision tree with a branch score calculator and a scaling unit, as claimed.

In consideration of the claims in view of the eligibility flow, based on the premise (BRI), the claims are directed to a device for forming, by ML, a decision tree (or an ML model), and therefore are directed to a machine (device) and a process (creating a tree model).
Step 1: Answer: Yes
	Next, a streamlined analysis is considered, based on whether the eligibility of the claim or claims as a whole is self-evident.
Streamlined: Answer: No

	Next, Step 2A considers whether the claims are directed to an abstract idea. The claims are based on natural data (data of real-world features) input as the sample (applied to machine learning of a tree), but are directed to an abstract idea because the claims do not recite any practical application.
Answer: Yes (Therefore, deemed abstract) 	
 
	The claimed elements of claims 1 & 13 are deemed to be conventional (see the art rejections) and, based on their scope, are also deemed to be routine: branch scoring and tree model creation, as well as scaling of related data with respect to the model creation and calculations.

	Regarding claim 2, the claim further recites that the branch score calculator includes a divider configured to calculate the branch score.

Regarding claim 3, the claim further recites that the branch score calculator includes an approximation arithmetic unit configured to perform an approximation operation on the branch score, and that the scaling unit performs scaling on the value related to the cumulative sum and input to the approximation arithmetic unit.

Regarding claim 4, the learning device according to claim 3, wherein the approximation arithmetic unit is configured to calculate an approximate value of a logarithm of the branch score by performing linear interpolation on the logarithm.

Regarding claim 5, the learning device according to claim 1, further comprising: a scaling amount calculator configured to calculate a scaling amount for the value related to the cumulative sum based on a maximum value of an absolute value of the cumulative sum, wherein the scaling unit is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range using the scaling amount calculated by the scaling amount calculator.

Regarding claim 6, the learning device according to claim 1, further comprising: a scaling amount calculator configured to calculate a scaling amount for the value related to the cumulative sum based on a sum total of the gradient information, wherein the scaling unit is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range using the scaling amount calculated by the scaling amount calculator.

Regarding claim 7, the learning device according to claim 1, further comprising: a scaling amount calculator configured to calculate a scaling amount for the value related to the cumulative sum based on a number of samples of the learning data at each node of the decision tree, wherein the scaling unit is configured to perform the scaling on the value related to the cumulative sum to fall within the numerical range using the scaling amount calculated by the scaling amount calculator.

Regarding claim 8, the learning device according to claim 1, further comprising: a scaling amount calculator configured to calculate a scaling amount for the value related to the cumulative sum based on the value related to the cumulative sum; and an inverse scaling unit configured to perform inverse scaling that restores, based on the scaling amount, a value operated by the branch score calculator using the value related to the cumulative sum and subjected to the scaling performed by the scaling unit, to an original scale.

Regarding claim 9, the learning device according to claim 1, wherein the learning of the decision tree is performed by gradient boosting.

Regarding claim 10, the claim recites the Gain formula (a defined algorithm).

The elements of claims 1-9 and 11-13 are deemed conventional in the art, even routine to those skilled in the art, in view of the prior art applied to the claims. The claims do not include any limitations directed to any practical application.

The claims merely comprise additional mathematical steps in the process of generating a decision tree, but the claims do not recite any details directed to a practical application.

Claim 10 recites a detailed algorithm, not mapped to the prior art, but also does not recite a practical application.

The claims are merely directed to instructions to apply an exception using a generic computer, which does not amount to significantly more, since the claims are not directed to a practical application.
	
	The claims are seen as pre-emptive, because they are not limited to a practical application, nor directed to improvements in machine processing that can be understood.
	
For clarity, the 101 rejection is placed after the art rejection to show that the model (ML) created in the prior art is applied to the real world, namely to search engines supporting queries and ranking of search results (i.e., a practical application).
SEE Kumaran (US 2018/0349382, as applied above).

While all the claims have been analyzed in an attempt to identify something more, the claims do not include details associated with a practical application of the learned decision tree or model (ML). For all the claims, this judicial exception is not integrated into a practical application because the claims do not provide any meaningful limits on the use of the model once created, nor any machine processing improvements in view of the operations in the generic sense.

Additionally, the prior art as applied, at [0039] (“For example, referring to FIG. 4A, a ML model can be trained out of a sample, e.g., 100 people, to generate an exemplary decision tree 400 to predict the number of healthy people”), appears to provide more details directed to a practical application (using the model to predict health associated with people or a sample).
 
It is noted that, as applied above, the scaler of Ng is associated with an accelerator (see 0051 & 238) and is also seen to perform its function as part of the accelerator (directed to improving performance, 0023-) by scaling and parallel processing (0060-0062), realizing a computer system improvement (or a practical application) in view of scaling.

Accordingly, the additional elements of the dependent claims do not integrate the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea.
 
Therefore, for at least this reason, the claims are not deemed integrated into a practical application, as described above.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The insignificant extra-solution activities identified above, which include the data-gathering, and presenting steps, are recognized by the courts as well-understood, routine, and conventional activities when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (See MPEP 2106.05(d)(II)(i) Receiving or transmitting data over a network, e.g., using the Internet to gather data, buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); (v) Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93).  The claim is not patent eligible.


Allowable Subject Matter
Claim 10 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

	The prior art of record fails to teach claim 10, which is directed to an algorithm deemed distinguishable, wherein the branch score calculator is configured to calculate by the algorithm as shown and recited in claim 10 (generating Gain), and wherein the learning of the decision tree is directed to gradient boosting, as understood.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
(A) RAJNAYAK et al. (US 2019/0325354, 4/20/2018 (FP), Accenture) also teaches boosted trees and AI and ML modeling directed to entity-behavior-based learning models adapted to predict entity behavior.

(B) Silverman (US 2016/0232321, 12/22/2015) teaches an output score for patient (surgical) risk, associated with the evaluation of patient health data and with a tree (or knowledge, Fig. 3A) and learning systems.

Contact Information
Any inquiry concerning this communication or earlier communications should be directed to the examiner of record, Vincent F. Boccio, whose telephone number is (571) 272-7373.
The examiner can normally be reached Monday-Friday, 8:00 AM to 4:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached at (571) 272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR.

Status information for unpublished applications is available through Private PAIR only.

For more information about the PAIR system:
"http://portal.uspto.gov/external/portal/pair"

Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/VINCENT F BOCCIO/
Primary Examiner, Art Unit 2162
8/11/2022