DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 7/23/2018.  Acknowledgement is made with respect to a claim of foreign priority to British Application GB1810736.7 filed on 6/29/2018.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “the module computes a binary outcome” in claims 1, 11, and 17 and their dependents1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.  The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).

When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself.
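For illustration only, the sequence of determinations described above can be summarized as the following sketch. The names `ClaimFindings` and `is_eligible` are hypothetical and do not appear in the 2019 PEG or the MPEP; the sketch captures only the order of the steps, not the substantive analysis made at each step.

```python
from dataclasses import dataclass


@dataclass
class ClaimFindings:
    """Hypothetical record of the findings made for one claim."""
    statutory_category: bool     # Step 1: process, machine, manufacture, or composition
    recites_exception: bool      # Step 2A, Prong 1: recites a judicial exception
    practical_application: bool  # Step 2A, Prong 2: exception integrated into a practical application
    significantly_more: bool     # Step 2B: additional elements amount to significantly more


def is_eligible(f: ClaimFindings) -> bool:
    """Walk the steps in the order described in the paragraph above."""
    if not f.statutory_category:
        return False             # fails Step 1
    if not f.recites_exception:
        return True              # no judicial exception recited (Step 2A, Prong 1)
    if f.practical_application:
        return True              # integrated into a practical application (Step 2A, Prong 2)
    return f.significantly_more  # otherwise the outcome turns on Step 2B


# Findings of the kind made in this action: statutory category, exception
# recited, no practical application, nothing significantly more.
print(is_eligible(ClaimFindings(True, True, False, False)))
```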

Claim 1
Step 1:  The claim recites a predictor; therefore, it is directed to the statutory category of a manufacture.
Step 2A Prong 1:  The claim recites, inter alia:
computes a binary outcome for selecting a child node of the internal node: Under its broadest reasonable interpretation (BRI) in light of the specification, this limitation encompasses the mental process of computing a binary outcome for selecting a child node of a decision tree, which is an evaluation or observation capable of practically being performed in the human mind with the assistance of pen and paper.  For example, under a BRI, one can decide to select a child node of a current node based on a predetermined criterion.
compute the prediction by processing the example x using a plurality of the differentiable operations selected according to a path through the tree from the root node to a leaf node:  Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of computing a prediction from an input example using operations selected along a path in a decision tree, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  Specifically, the additional elements consist of “a memory which stores at least one example x for which an outcome y is not known; the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; wherein, individual ones of the nodes and individual ones of the edges each have an assigned module, comprising parameterized, differentiable operations” and “a processor”.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”).  Thus the additional elements do not provide any meaningful limits on the execution of the abstract idea. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application, and the claim is thus directed to the abstract idea.
Step 2B:  The claim does not contain significantly more than the judicial exception.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”). Nothing in the claim provides significantly more than the abstract idea.  As such, the claim is ineligible.
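For illustration only, the two limitations quoted under Step 2A Prong 1 (a module at each internal node computing a binary outcome that selects a child node, and a prediction computed by applying the operations found along the resulting root-to-leaf path) can be sketched as follows. Every name, function, and numeric value below is a hypothetical assumption made for this sketch; none is taken from the specification or the claims, and the sketch is not offered as the applicant's implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu_scale(s: float) -> Callable[[Vector], Vector]:
    """Hypothetical edge module: scale each entry, then clamp negatives to zero."""
    return lambda x: [max(0.0, s * v) for v in x]


@dataclass
class Node:
    # An internal node carries a router; a leaf node carries a solver.
    router: Optional[Callable[[Vector], bool]] = None
    solver: Optional[Callable[[Vector], float]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Modules assigned to the edges leading to each child.
    left_edge: Optional[Callable[[Vector], Vector]] = None
    right_edge: Optional[Callable[[Vector], Vector]] = None


def predict(root: Node, x: Vector) -> float:
    """Follow one root-to-leaf path, applying the module on each edge taken."""
    node = root
    while node.solver is None:
        if node.router(x):            # binary outcome selects the child node
            x, node = node.right_edge(x), node.right
        else:
            x, node = node.left_edge(x), node.left
    return node.solver(x)             # the leaf module outputs the prediction


leaf = Node(solver=lambda x: sigmoid(sum(x)))
root = Node(
    router=lambda x: sigmoid(x[0]) > 0.5,   # true exactly when x[0] > 0
    left=leaf, right=leaf,
    left_edge=relu_scale(-1.0), right_edge=relu_scale(2.0),
)

print(predict(root, [1.0, -3.0]))   # right branch: [2.0, 0.0] -> sigmoid(2.0)
print(predict(root, [-1.0, 0.5]))   # left branch:  [1.0, 0.0] -> sigmoid(1.0)
```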


Claim 2
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein the assigned modules along the path form a neural network, and wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data”. These limitations merely place restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and do not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional elements of “wherein the assigned modules along the path form a neural network, and wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 3
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein the assigned modules which are assigned to internal nodes of the decision tree are routers configured to compute a binary decision in a stochastic manner according to characteristics of the processed example”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of computing a binary decision, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which is a field of use limitation under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 4
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “computing the binary decision according to samples from a probability distribution with a mean corresponding to a current input to the decision tree”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of computing a binary decision, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “a processor”, which is a generic computer component recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 5
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “operate on transformed input data received at the solver and to output an estimate of a conditional distribution expressing the probability of the outcome y given the example x”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of operating on an input to output an estimate, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “solvers”, which are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)). This claim also recites the additional elements of “wherein the assigned modules along the path form a neural network, and wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 6
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “compute a non-linear function of an example or a processed example reaching the edge from a parent node”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of computing a non-linear function.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “transformers”, which are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)). This claim also recites the additional elements of “wherein the assigned modules along the path form a neural network, and wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 7
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein at least one of the transformers is a single convolutional layer of a neural network followed by a rectified linear unit”. This limitation merely places restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and does not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein at least one of the transformers is a single convolutional layer of a neural network followed by a rectified linear unit” (which is a field of use limitation under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 8
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “having been formed using a growing process which is dependent on a set of training data used to train the predictor, and wherein the training data comprises any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data”. These limitations merely place restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and do not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional elements of “having been formed using a growing process which is dependent on a set of training data used to train the predictor, and wherein the training data comprises any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 9
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein the outcome is a class label and the example is a voxel of a medical image, and wherein the predictor is used for medical image analysis”. These limitations merely place restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and do not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional elements of “wherein the outcome is a class label and the example is a voxel of a medical image, and wherein the predictor is used for medical image analysis” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 10
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “compute a non-linear function which acts to filter the medical image”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of computing a non-linear function.
Step 2A Prong 2, Step 2B:  This claim recites the additional elements of “wherein the assigned modules which are assigned to edges of the decision tree are transformers” and “where a plurality of different transformers are used” (which are field of use limitations under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Further, the “transformers” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 11
Step 1:  The claim recites a method; therefore, it is directed to the statutory category of a process.
Step 2A Prong 1:  The claim recites, inter alia:
computes a binary outcome for selecting a child node of the internal node: Under its broadest reasonable interpretation (BRI) in light of the specification, this limitation encompasses the mental process of computing a binary outcome for selecting a child node of a decision tree, which is an evaluation or observation capable of practically being performed in the human mind with the assistance of pen and paper.  For example, under a BRI, one can decide to select a child node of a current node based on a predetermined criterion.
grow the decision tree by, for a current node in a layer of the tree furthest from the root node, deciding whether to: add another module to the incoming edge of the current node, add another node to the current node, or terminate growing for the current node; wherein the decision is made by using a validation set of the training examples:  Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of growing or creating a decision tree, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  Specifically, the additional elements consist of “storing in a memory a plurality of training examples comprising examples x for which outcomes y are known; accessing from the memory at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; wherein individual ones of the nodes and individual ones of the edges each have an assigned module comprising parameterized, differentiable operations” and “a processor”.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “storing in a memory a plurality of training examples comprising examples x for which outcomes y are known; accessing from the memory at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”).  Thus the additional elements do not provide any meaningful limits on the execution of the abstract idea. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application, and the claim is thus directed to the abstract idea.
Step 2B:  The claim does not contain significantly more than the judicial exception.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “storing in a memory a plurality of training examples comprising examples x for which outcomes y are known; accessing from the memory at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”). Nothing in the claim provides significantly more than the abstract idea.  As such, the claim is ineligible.

Claim 12
Step 1:  A process, as above.
Step 2A Prong 1:  The claim recites “constructing a first model by simulating splitting of the current node by adding a router module, and constructing a second model by simulating increasing the depth of an incoming edge of the current node by adding a transformer module”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental processes of constructing two models through simulations, which are observations or evaluations capable of being practically performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein the training examples comprise any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which is a field of use limitation under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 13
Step 1:  A process, as above.
Step 2A Prong 1:  The claim recites “fixing the parameters of the decision tree in the first and second models, except for the parameters of modules added in the simulation, and computing a local optimization using the training data to adjust the non-fixed parameters”. Under its broadest reasonable interpretation in light of the specification, these limitations encompass the mental processes of fixing parameters in models and computing a local optimization, which are observations or evaluations capable of being practically performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim does not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 14
Step 1:  A process, as above.
Step 2A Prong 1:  The claim recites “making the decision by assessing the performance of: the first model, the second model, and the decision tree before any changes, using the validation training examples and selecting according to a most accurate one of these options”. Under its broadest reasonable interpretation in light of the specification, these limitations encompass the mental processes of assessing performances of the models and selecting an option, which are observations or evaluations capable of being practically performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim does not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 15
Step 1:  A process, as above.
Step 2A Prong 1:  The claim recites “refining the decision tree by computing a global optimization over parameters of the modules using the training examples”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of refining a decision tree, which is an observation or evaluation capable of being practically performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein the training examples comprise any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data” (which is a field of use limitation under MPEP § 2106.05(h); MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55.  See 2019 Guidance, 84 FR 50, footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 16
Step 1:  A process, as above.
Step 2A Prong 1:  The claim recites “wherein the global optimization jointly optimizes a hierarchical grouping of data to paths on the decision tree and neural networks associated with those paths”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of jointly optimizing a grouping of data to paths and the networks associated with those paths, which is an observation or evaluation capable of being practically performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2, Step 2B:  This claim does not recite any additional elements that integrate the abstract idea into a practical application or provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 17
Step 1:  The claim recites a predictor; therefore, it is directed to the statutory category of a manufacture.
Step 2A Prong 1:  The claim recites, inter alia:
computes a binary outcome for selecting a child node of the internal node: Under its broadest reasonable interpretation (BRI) in light of the specification, this limitation encompasses the mental process of computing a binary outcome for selecting a child node of a decision tree, which is an evaluation or observation capable of practically being performed in the human mind with the assistance of pen and paper.  For example, under a BRI, one can decide to select a child node of a current node based on a predetermined criterion.
compute the prediction by processing the example x using a plurality of the differentiable operations selected according to a path through the tree from the root node to a leaf node:  Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of computing a prediction from an input example using operations selected along a path in a decision tree, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper.
Step 2A Prong 2:  This judicial exception is not integrated into a practical application.  Specifically, the additional elements consist of “a memory which stores at least one example x for which an outcome y is not known; the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; wherein, individual ones of the nodes and individual ones of the edges each have an assigned module, comprising parameterized, differentiable operations”, “wherein the decision tree has been formed using a growing process which is dependent on a set of training data comprising examples x for which outcomes y are known”, and “a processor”.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”).  The element “wherein the decision tree has been formed using a growing process which is dependent on a set of training data comprising examples x for which outcomes y are known” is a field of use limitation under MPEP § 2106.05(h). Thus the additional elements do not provide any meaningful limits on the execution of the abstract idea. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application, and the claim is thus directed to the abstract idea.
Step 2B:  The claim does not contain significantly more than the judicial exception.  The additional elements of “a memory”, “a processor”, and “the module” are generic computer components recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer (see MPEP § 2106.05(f)).  The additional element of “the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes” is insignificant extra-solution activity that does not amount to an inventive concept (see MPEP § 2106.05(g); “mere data gathering”). The element “wherein the decision tree has been formed using a growing process which is dependent on a set of training data comprising examples x for which outcomes y are known” is a field of use limitation under MPEP § 2106.05(h).  Nothing in the claim provides significantly more than the abstract idea.  As such, the claim is ineligible.


Claim 18
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein the examples comprise medical image data and the outcomes are class labels”. These limitations merely place restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and do not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional elements of “wherein the examples comprise medical image data and the outcomes are class labels” (which are field of use limitations under MPEP § 2106.05(h); see also MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55 and footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 19
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein the assigned modules on the individual ones of the edges are non-linear filters which act to filter the medical image”. This limitation merely places restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and does not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein the assigned modules on the individual ones of the edges are non-linear filters which act to filter the medical image” (which is a field of use limitation under MPEP § 2106.05(h); see also MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55 and footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim 20
Step 1:  A manufacture, as above.
Step 2A Prong 1:  The claim recites “wherein there are a plurality of different non-linear filters on individual ones of the edges”. This limitation merely places restrictions on the type of data used in the analysis and the technological environment in which the judicial exception is performed, and does not negate the mental nature of the underlying process.
Step 2A Prong 2, Step 2B:  This claim recites the additional element of “wherein there are a plurality of different non-linear filters on individual ones of the edges” (which is a field of use limitation under MPEP § 2106.05(h); see also MPEP § 2106.04(d); 2019 Guidance, 84 FR 50 at 55 and footnote 32). Nothing in the claim integrates the abstract idea into a practical application, nor does it provide significantly more than the abstract idea, and thus the claim is subject-matter ineligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7 are rejected under 35 U.S.C. § 103 as being obvious over Xiao (Xiao, “NDT: Neural Decision Tree Towards Fully Functioned Neural Graph”, Dec. 16, 2017, arXiv:1712.05934v1, pp. 1-8, hereinafter “Xiao”) in view of Yamagami et al. (US 20180005126 A1, hereinafter “Yamagami”).

Regarding claim 1, Xiao discloses [a] predictor for predicting an outcome y given an example x, comprising: (Abstract; “we propose the neural decision tree (NDT), which takes simplified neural networks as decision function in each branch and employs complex neural networks to generate the output in each leaf”, which discloses a predictor in the form of a neural decision tree that inherently processes received inputs or examples x to predict an outcome y; and Page 3, Figure 2;  the figure discloses the neural tree structure that takes an input example x to predict an outcome y in the form of a target output)
a memory which stores at least one example x for which an outcome y is not known; (Page 5, Experiment; the experiment section inherently uses a memory that stores inputs that are used in the experiment; and Page 3, Figure 2; the figure discloses the input example (input) x for which an outcome y (target outcome) is not known)
the memory (Page 5, Experiment; the experiment section inherently uses a memory that stores inputs that are used in the experiment) storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; (Abstract; “neural decision tree”; and Page 3, Figure 2;  the figure discloses the plurality of nodes connected by edges indicated by arrows, the nodes comprising a root node (upper-most condition network in the figure), internal nodes (lower condition networks in the figure), and leaf nodes (target network))
wherein, individual ones of the nodes  . . .  each have an assigned module, comprising parameterized, differentiable operations, such that for each of the internal nodes the module computes a binary outcome for selecting a child node of the internal node; (Page 3, Figure 2;  the figure discloses, under a broadest reasonable interpretation of the claim language, that individual ones of the nodes each have an assigned module in the form of a respective condition network, and that for each of the internal nodes (condition networks) the module computes a binary outcome, in that the condition network splits according to >0 or <=0 to select a child node (one of the condition networks below a parent condition network) of the internal node; and Page 2, Column 2; “we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions with the non-linear function of tanh”, which discloses that each node comprises parameterized, differentiable operations in the form of a tanh operation)
a processor configured to compute the prediction y by processing the example x using a plurality of the differentiable operations selected according to a path through the tree from the root node to a leaf node (Page 3, Figure 2; the figure discloses, under a broadest reasonable interpretation of the claim language, a processor (that is inherently used in the experiments section of Xiao) that is configured to compute a prediction y (target output in the figure) by processing the example x (input in the figure) using a plurality of differentiable operations (tanh, as discussed above) selected according to a path through the tree from the root node (upper-most condition network in the figure) to a leaf node (target network)).
Xiao fails to explicitly disclose but Yamagami discloses wherein, individual ones of the . . .  edges each have an assigned module (Figure 7, Elements E1 and E2 and [0032]; “Each node is assigned an attribute to be checked (the node N1 is assigned x1, for example). Edges E1 and E2 represented by line segments are referred to edges of the decision tree (the same is true of an edge having a line segment with no label). The edges are arranged in view of the number of types of attribute values that are obtained by checking the attributes of the nodes to which the upper ends of the edges are connected. For example, the edge E1 corresponds to an attribute value of 1 of the attribute x1 and the edge E2 corresponds to an attribute value of 0 of the attribute x1” (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, wherein the individual ones of the edges (such as E1 or E2) have an assigned module or node to which the upper end of the edges are connected; and [0033]; “Which edge to be routed is selected, depending on the attribute value obtained as a result of checking”; and [0077]; “The decision tree generator 11 thus successively determines the attribute having a maximum information gain to be a node of the decision tree, successively assigns to an edge of the node the attribute value of the attribute having the maximum information gain, and thus generates a single decision tree from the multiple pieces of classification target data” (emphasis added)).
Xiao and Yamagami are analogous art because both are concerned with decision tree structures.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in decision tree structures to combine the assigning of modules to edges of decision trees as taught by Yamagami with the predictor of Xiao to yield the predictable result of wherein, individual ones of the nodes and individual ones of the edges each have an assigned module, comprising parameterized, differentiable operations, such that for each of the internal nodes the module computes a binary outcome for selecting a child node of the internal node. The motivation for doing so would be to generate a decision tree that is to be used to determine an order of inquiries asking about attributes in order to classify pieces of classification target data by successively assigning the attribute value of the attribute having the maximum information gain to an edge of the node (Yamagami; [0005]).
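For illustration only, the arrangement recited in claim 1 (a module at each internal node that computes a binary outcome, a module on each edge, and a prediction computed along a single root-to-leaf path) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the implementation of the application, of Xiao, or of Yamagami: the names `Node`, `Edge`, and `predict`, the elementwise ReLU used as the edge module, and the softmax used at the leaf are all hypothetical choices made for the example. Only the tanh-based split at zero is taken from the passage of Xiao cited above.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


@dataclass
class Edge:
    # Hypothetical edge module: elementwise ReLU(scale * x + shift).
    scale: float
    shift: float
    child: "Node"

    def apply(self, x):
        return [max(0.0, self.scale * xi + self.shift) for xi in x]


@dataclass
class Node:
    # Internal node: router weights plus two outgoing edges.
    # Leaf node: solver weights, one row per class (assumed layout).
    router_w: Optional[List[float]] = None
    left: Optional[Edge] = None
    right: Optional[Edge] = None
    solver_w: Optional[List[List[float]]] = None

    def is_leaf(self):
        return self.solver_w is not None


def predict(root, x):
    """Follow one root-to-leaf path, applying only the modules on that path."""
    node, path = root, []
    while not node.is_leaf():
        # Binary outcome: tanh of a linear score, split at zero.
        go_right = math.tanh(dot(node.router_w, x)) > 0
        edge = node.right if go_right else node.left
        path.append("R" if go_right else "L")
        x = edge.apply(x)      # module assigned to the chosen edge
        node = edge.child
    return softmax([dot(w, x) for w in node.solver_w]), path


# Hypothetical two-leaf tree used only to exercise the sketch.
leaf_a = Node(solver_w=[[1.0, 0.0], [0.0, 1.0]])
leaf_b = Node(solver_w=[[0.0, 1.0], [1.0, 0.0]])
root = Node(router_w=[1.0, -1.0],
            left=Edge(1.0, 0.0, leaf_a),
            right=Edge(2.0, 0.0, leaf_b))

probs, path = predict(root, [3.0, 1.0])
```

In this sketch the input [3.0, 1.0] is routed to the right child, so only the right edge module and the right leaf are evaluated; the modules on the untaken branch play no part in the prediction.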

Regarding claim 2, the rejection of claim 1 is incorporated and Xiao further discloses wherein the assigned modules along the path form a neural network, and (Page 2, Column 2; “we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions”; and Page 3, Figure 2)
wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the example is an image; and Page 5, §5.2;  the section discloses using input images).

Regarding claim 3, the rejection of claim 1 is incorporated and Xiao further discloses wherein the assigned modules which are assigned to internal nodes of the decision tree are routers configured to compute a binary decision in a stochastic manner according to characteristics of the processed example, and (Page 3, Figure 2; the figure discloses wherein the assigned modules are assigned to internal nodes or condition networks that compute a binary decision (>=0 or <0) in a stochastic manner)
wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the example is an image; and Page 5, §5.2;  the section discloses using input images).

Regarding claim 4, the rejections of claims 1 and 3 are incorporated and Xiao further discloses wherein at least one of the routers comprises a processor for computing the binary decision according to samples from a probability distribution with a mean corresponding to a current input to the decision tree (Page 5, §5; the experiments section discloses the inherent processor used in the experiment for computing the binary decision according to samples from a probability distribution with a mean corresponding to a current input into the decision tree, as a multitude of test samples are used in the study corresponding to the current input).

Regarding claim 5, the rejection of claim 1 is incorporated and Xiao further discloses wherein the assigned modules which are assigned to leaf nodes of the decision tree are solvers configured to operate on transformed input data received at the solver and to output an estimate of a conditional distribution expressing the probability of the outcome y given the example x, (Page 3, Column 2; “To finally predict the category of each sample, we apply a complex network as the target network, which often is a stacked convolution one for image or an LSTM for sentence”; and Figure 2; “Target network”)
wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the example is an image; and Page 5, §5.2;  the section discloses using input images).

Regarding claim 6, the rejection of claim 1 is incorporated and Xiao further discloses wherein the assigned modules which are assigned to edges of the decision tree are transformers, each transformer configured to compute a non-linear function of an example or a processed example reaching the edge from a parent node, and (Page 2, Column 2; “To exactly pre-classify each sample, we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions with the non-linear function of tanh”)
wherein the example x is any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the example is an image; and Page 5, §5.2;  the section discloses using input images).

Regarding claim 7, the rejections of claims 1 and 6 are incorporated and Xiao further discloses wherein at least one of the transformers is a single convolutional layer of a neural network followed by a rectified linear unit (Page 5, Column 2; “CNN-based architecture LeNet-5 with dropout and ReLUs, classic linear classifier SVM with RBF kernel”; and Page 2, Column 2; “a one- or two-layer multi-perceptions with the non-linear function of tanh. This layer is only applied in the inner nodes of decision tree”).

Claims 8 and 11-17 are rejected under 35 U.S.C. § 103 as being obvious over Xiao in view of Yamagami and further in view of Bulo et al. (Bulo et al., “Neural Decision Forests for Semantic Image Labelling”, 2014,  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8, hereinafter “Bulo”).

Regarding claim 11, Xiao discloses [a] computer-implemented method of training a predictor to predict an outcome y given an example x, the method comprising: (Abstract; “we propose the neural decision tree (NDT), which takes simplified neural networks as decision function in each branch and employs complex neural networks to generate the output in each leaf”, which discloses a predictor in the form of a neural decision tree that inherently processes received inputs or examples x to predict an outcome y; and Page 3, Figure 2;  the figure discloses the training of a neural tree structure that takes an input example x to predict an outcome y in the form of a target output, and this is inherently done on a computer)
storing in a memory a plurality of training examples comprising examples x for which outcomes y are known; (Page 3, Figure 2; the figure discloses the input example (input) x for which an outcome y (target outcome) is known during the training phase of building the decision tree; and Page 5, §5.2;  the section discloses the use of training examples in the experiment, which inherently uses and stores information in memory, where the training examples have known outcomes for a given input in creating the decision tree)
accessing from the memory at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; (Abstract; “neural decision tree”; and Page 3, Figure 2;  the figure discloses the plurality of nodes connected by edges indicated by arrows, the nodes comprising a root node (upper-most condition network in the figure), internal nodes (lower condition networks in the figure), and leaf nodes (target network). Note that the experiments section §5 on page 5 of Xiao inherently uses accessible memory in which the tree and its corresponding structure are stored)
wherein individual ones of the nodes  . . .  each have an assigned module comprising parameterized, differentiable operations, such that for each of the internal nodes the module computes a binary outcome for selecting a child node of the internal node; (Page 3, Figure 2;  the figure discloses, under a broadest reasonable interpretation of the claim language, that individual ones of the nodes each have an assigned module in the form of a respective condition network, and that for each of the internal nodes (condition networks) the module computes a binary outcome, in that the condition network splits according to >0 or <=0 to select a child node (one of the condition networks below a parent condition network) of the internal node; and Page 2, Column 2; “we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions with the non-linear function of tanh”, which discloses that each node comprises parameterized, differentiable operations in the form of a tanh operation)
wherein the decision is made by using a validation set of the training examples (Page 5, §5.2; the section discloses making a decision using a validation set of the training examples (10000 test samples)).
Xiao fails to explicitly disclose but Yamagami discloses wherein, individual ones of the . . .  edges each have an assigned module (Figure 7, Elements E1 and E2 and [0032]; “Each node is assigned an attribute to be checked (the node N1 is assigned x1, for example). Edges E1 and E2 represented by line segments are referred to edges of the decision tree (the same is true of an edge having a line segment with no label). The edges are arranged in view of the number of types of attribute values that are obtained by checking the attributes of the nodes to which the upper ends of the edges are connected. For example, the edge E1 corresponds to an attribute value of 1 of the attribute x1 and the edge E2 corresponds to an attribute value of 0 of the attribute x1” (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, wherein the individual ones of the edges (such as E1 or E2) have an assigned module or node to which the upper end of the edges are connected; and [0033]; “Which edge to be routed is selected, depending on the attribute value obtained as a result of checking”; and [0077]; “The decision tree generator 11 thus successively determines the attribute having a maximum information gain to be a node of the decision tree, successively assigns to an edge of the node the attribute value of the attribute having the maximum information gain, and thus generates a single decision tree from the multiple pieces of classification target data” (emphasis added)).
The motivation to combine Xiao and Yamagami is the same as discussed above with respect to claim 1.
Xiao fails to explicitly disclose but Bulo discloses a processor configured to grow the decision tree by, for a current node in a layer of the tree furthest from the root node, deciding whether to: add another module to the incoming edge of the current node, add another node to the current node, or terminate growing for the current node; (Page 3, Column 2; “The standard approach to training a random decision tree of a RF consists in a recursive procedure that starts from the root and iteratively builds the tree by splitting the actual terminal node”, the building of the tree is a decision to add another module to the incoming edge of the current node, adding another node to the current node, or terminate growing for the current node; and Page 3, §3; the section further discusses the building of the neural tree and termination of growing conditions as claimed.  Note that the experiments section of Bulo inherently uses a processor).
Xiao, Yamagami, and Bulo are analogous art because all are concerned with decision tree structures.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in decision tree structures to combine the growing of decision trees as taught by Bulo with the method of Xiao and Yamagami to yield the predictable result of a processor configured to grow the decision tree by, for a current node in a layer of the tree furthest from the root node, deciding whether to: add another module to the incoming edge of the current node, add another node to the current node, or terminate growing for the current node. The motivation for doing so would be to select the best split function and the best predictions for the children nodes of a decision tree structure (Bulo; Page 4, Column 1).
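For illustration only, the three-way growth decision recited in claim 11 (add a module to the incoming edge, add a node, or terminate, with the decision made using a validation set) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not reproduce the procedure of Bulo, Xiao, or the application: the function names `accuracy` and `choose_growth_step`, the candidate labels, and the use of plain classification accuracy as the selection score are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]      # (example x, known outcome y)
Model = Callable[[Sequence[float]], int]   # maps an example x to a predicted y


def accuracy(model: Model, validation: List[Example]) -> float:
    """Fraction of validation examples the candidate model labels correctly."""
    correct = sum(1 for x, y in validation if model(x) == y)
    return correct / len(validation)


def choose_growth_step(candidates: Dict[str, Model],
                       validation: List[Example]) -> str:
    """Return the label of the candidate with the highest validation accuracy.

    Ties go to the earliest-listed candidate, so listing "terminate" first
    prefers the smaller tree when growth does not help (an assumed policy).
    """
    scores = {name: accuracy(m, validation) for name, m in candidates.items()}
    return max(scores, key=scores.get)


# Hypothetical validation set and stand-in candidate models.
validation = [([0.2], 0), ([0.8], 1), ([0.6], 1), ([0.4], 0)]
candidates = {
    "terminate": lambda x: 0,                 # tree left unchanged
    "split": lambda x: int(x[0] > 0.5),       # node added below the current node
    "deepen": lambda x: int(x[0] > 0.7),      # module added to the incoming edge
}

decision = choose_growth_step(candidates, validation)
```

With these stand-in models the three candidates score 0.5, 1.0, and 0.75 on the validation set, so the sketch selects "split".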

Regarding claim 17, Xiao discloses [a] predictor, comprising: a memory which stores at least one example x for which an outcome y is not known; (Abstract; “we propose the neural decision tree (NDT), which takes simplified neural networks as decision function in each branch and employs complex neural networks to generate the output in each leaf”, which discloses a predictor in the form of a neural decision tree that inherently processes received inputs or examples x to predict an outcome y; and Page 5, Experiment; the experiment section inherently uses a memory that stores inputs that are used in the experiment; and Page 3, Figure 2; the figure discloses the input example (input) x for which an outcome y (target outcome) is not known)
the memory storing at least one decision tree comprising a plurality of nodes connected by edges, the nodes comprising a root node, internal nodes and leaf nodes; (Page 5, Experiment; the experiment section inherently uses a memory that stores inputs that are used in the experiment; and Abstract; “neural decision tree”; and Page 3, Figure 2;  the figure discloses the plurality of nodes connected by edges indicated by arrows, the nodes comprising a root node (upper-most condition network in the figure), internal nodes (lower condition networks in the figure), and leaf nodes (target network))
wherein, individual ones of the nodes . . .  each have an assigned module, comprising parameterized, differentiable operations, such that for each of the internal nodes the module computes a binary outcome for selecting a child node of the internal node; (Page 3, Figure 2;  the figure discloses, under a broadest reasonable interpretation of the claim language, that individual ones of the nodes each have an assigned module in the form of a respective condition network, and that for each of the internal nodes (condition networks) the module computes a binary outcome, in that the condition network splits according to >0 or <=0 to select a child node (one of the condition networks below a parent condition network) of the internal node; and Page 2, Column 2; “we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions with the non-linear function of tanh”, which discloses that each node comprises parameterized, differentiable operations in the form of a tanh operation)
a processor configured to compute the prediction y by processing the example x using a plurality of the differentiable operations selected according to a path through the tree from the root node to a leaf node (Page 3, Figure 2; the figure discloses, under a broadest reasonable interpretation of the claim language, a processor (that is inherently used in the experiments section of Xiao) that is configured to compute a prediction y (target output in the figure) by processing the example x (input in the figure) using a plurality of differentiable operations (tanh, as discussed above) selected according to a path through the tree from the root node (upper-most condition network in the figure) to a leaf node (target network)).
Xiao fails to explicitly disclose but Yamagami discloses wherein, individual ones of the . . .  edges each have an assigned module (Figure 7, Elements E1 and E2 and [0032]; “Each node is assigned an attribute to be checked (the node N1 is assigned x1, for example). Edges E1 and E2 represented by line segments are referred to edges of the decision tree (the same is true of an edge having a line segment with no label). The edges are arranged in view of the number of types of attribute values that are obtained by checking the attributes of the nodes to which the upper ends of the edges are connected. For example, the edge E1 corresponds to an attribute value of 1 of the attribute x1 and the edge E2 corresponds to an attribute value of 0 of the attribute x1” (emphasis added), which discloses, under a broadest reasonable interpretation of the claim language, wherein the individual ones of the edges (such as E1 or E2) have an assigned module or node to which the upper end of the edges are connected; and [0033]; “Which edge to be routed is selected, depending on the attribute value obtained as a result of checking”; and [0077]; “The decision tree generator 11 thus successively determines the attribute having a maximum information gain to be a node of the decision tree, successively assigns to an edge of the node the attribute value of the attribute having the maximum information gain, and thus generates a single decision tree from the multiple pieces of classification target data” (emphasis added)).
The motivation to combine Xiao and Yamagami is the same as discussed above with respect to claim 1.
Xiao fails to explicitly disclose but Bulo discloses wherein the decision tree has been formed using a growing process which is dependent on a set of training data comprising examples x for which outcomes y are known (Page 3, Column 2; “The standard approach to training a random decision tree of a RF consists in a recursive procedure that starts from the root and iteratively builds the tree by splitting the actual terminal node”, the building of the tree is a decision to add another module to the incoming edge of the current node, add another node to the current node, or terminate growing for the current node).
The motivation to combine Xiao, Yamagami, and Bulo is the same as discussed above with respect to claim 11.

Regarding claim 8, the rejection of claim 1 is incorporated and Xiao further discloses wherein the training data comprises any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the example is an image; and Page 5, §5.2;  the section discloses using input images).
Xiao fails to explicitly disclose but Bulo further discloses having been formed using a growing process which is dependent on a set of training data used to train the predictor (Page 3, Column 2; “The standard approach to training a random decision tree of a RF consists in a recursive procedure that starts from the root and iteratively builds the tree by splitting the actual terminal node”, the building of the tree is a decision to add another module to the incoming edge of the current node, add another node to the current node, or terminate growing for the current node).
The motivation to combine Xiao, Yamagami, and Bulo is the same as discussed above with respect to claim 11.

Regarding claim 12, the rejection of claim 11 is incorporated and Xiao further discloses wherein making the decision comprises constructing a first model by simulating splitting of the current node by adding a router module, and (Page 3, Figure 2; the figure discloses wherein the assigned modules are assigned to internal nodes or condition networks (router module) that compute a binary decision (>=0 or <0) in a stochastic manner)
constructing a second model by simulating increasing the depth of an incoming edge of the current node by adding a transformer module, and (Page 2, Column 2; “To exactly pre-classify each sample, we employ a simplified neural network as condition network, which is usually a one- or two-layer multi-perceptions with the non-linear function of tanh”; and Page 5, Table 1;  the table shows the multiple models at a certain depth; and Page 6, Table 2;  the table shows the multiple models at an increased depth that adds further transformer modules)
wherein the training examples comprise any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the training example is an image; and Page 5, §5.2;  the section discloses using input images).

Regarding claim 13, the rejections of claims 11 and 12 are incorporated and Xiao further discloses making the decision by, fixing the parameters of the decision tree in the first and second models, except for the parameters of modules added in the simulation, and computing a local optimization using the training data to adjust the non-fixed parameters (Page 5, §5.1; “Regarding the condition network, we apply a two-layer fully connected perceptions, with the hyper-parameter input-300-1 for MNIST and input-3000-1 for CIFAR. Regarding the target network, we also employ a three-layer fully connected perceptions, with the hyper-parameter input-300-100-10 for MNIST, input-3000-1000-10 for CIFAR-10 and input-3000-1000-100 for CIFAR-100. To train the model, we leverage AdaDelta (Zeiler, 2012) as our optimizer, with hyper-parameter as moment factor η = 0.6 and ε = 1 × 10^-6. We train the model until convergence, but at most 1,000 rounds. Regarding the batch size, we always choose the largest one to fully utilize the computing devices. Notably, the hyper-parameters of approximated continuous function is α = 1000”).

Regarding claim 14, the rejections of claims 11, 12, and 13 are incorporated and Xiao further discloses making the decision by assessing the performance of: the first model, the second model, and the decision tree before any changes, using the validation training examples and selecting according to a most accurate one of these options (Page 5, §5.1; “Regarding the condition network, we apply a two-layer fully connected perceptions, with the hyper-parameter input-300-1 for MNIST and input-3000-1 for CIFAR. Regarding the target network, we also employ a three-layer fully connected perceptions, with the hyper-parameter input-300-100-10 for MNIST, input-3000-1000-10 for CIFAR-10 and input-3000-1000-100 for CIFAR-100. To train the model, we leverage AdaDelta (Zeiler, 2012) as our optimizer, with hyper-parameter as moment factor η = 0.6 and ε = 1 × 10⁻⁶. We train the model until convergence, but at most 1,000 rounds. Regarding the batch size, we always choose the largest one to fully utilize the computing devices. Notably, the hyper-parameters of approximated continuous function is α = 1000”; and Page 5, §5.2).
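
For illustration only, the selection step recited in claim 14 (scoring the unchanged tree, the first model, and the second model on validation examples and keeping the most accurate) can be sketched as follows. The sketch is not drawn from Xiao or from the specification; the candidate classifiers, the validation set, and the helper names `accuracy` and `select_best` are hypothetical.

```python
def accuracy(model, examples):
    """Fraction of (input, label) pairs that the model labels correctly."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)


def select_best(candidates, validation):
    """Score every candidate on the validation set and return the best name."""
    scores = {name: accuracy(m, validation) for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical validation examples: (feature value, class label).
validation = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]

# Hypothetical stand-ins for the three options compared in the claim.
candidates = {
    "unchanged": lambda x: 0,                # tree before any changes
    "first_model": lambda x: int(x > 0.5),   # first simulated change
    "second_model": lambda x: int(x > 0.7),  # second simulated change
}

best, scores = select_best(candidates, validation)
```

Here the unchanged option scores 0.5, the second option scores 0.75, and the first option scores 1.0, so `best` is `"first_model"`.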

Regarding claim 15, the rejection of claim 11 is incorporated and Xiao further discloses refining the decision tree by computing a global optimization over parameters of the modules using the training examples, (Page 5, §5.1; “Regarding the condition network, we apply a two-layer fully connected perceptions, with the hyper-parameter input-300-1 for MNIST and input-3000-1 for CIFAR. Regarding the target network, we also employ a three-layer fully connected perceptions, with the hyper-parameter input-300-100-10 for MNIST, input-3000-1000-10 for CIFAR-10 and input-3000-1000-100 for CIFAR-100. To train the model, we leverage AdaDelta (Zeiler, 2012) as our optimizer, with hyper-parameter as moment factor η = 0.6 and ε = 1 × 10⁻⁶. We train the model until convergence, but at most 1,000 rounds”, convergence being the global optimization)
wherein the training examples comprise any of: an image, image feature map derived from an image, video, audio signal, text segment, phonemes from a speech recognition pre-processing system, skeletal data produced by a system which estimates skeletal positions of humans or animals from images, sensor data, data derived from sensor data (Page 1, Column 2; “With this proposed principle from the seminal work, we attempt to tackle image classification”, which discloses wherein the training example is an image; and Page 5, §5.2; the section discloses using input images).

Regarding claim 16, the rejections of claims 11 and 15 are incorporated and Xiao further discloses wherein the global optimization jointly optimizes a hierarchical grouping of data to paths on the decision tree and neural networks associated with those paths (Page 5, §5.1; “Regarding the condition network, we apply a two-layer fully connected perceptions, with the hyper-parameter input-300-1 for MNIST and input-3000-1 for CIFAR. Regarding the target network, we also employ a three-layer fully connected perceptions, with the hyper-parameter input-300-100-10 for MNIST, input-3000-1000-10 for CIFAR-10 and input-3000-1000-100 for CIFAR-100. To train the model, we leverage AdaDelta (Zeiler, 2012) as our optimizer, with hyper-parameter as moment factor η = 0.6 and ε = 1 × 10⁻⁶. We train the model until convergence, but at most 1,000 rounds”, convergence being the global optimization).


Claims 9, 10, and 18-20 are rejected under 35 U.S.C. § 103 as being obvious over Xiao in view of Yamagami and further in view of Georgescu et al. (US 20160174902 A1, hereinafter “Georgescu”).

Regarding claim 9, the rejection of claim 1 is incorporated and Xiao further discloses wherein the outcome is a class label (Page 3, Figure 3; the figure discloses wherein the outcome is a class label or target output for the decision tree; and Page 3, Column 2; “Li,j is the adhoc label vector of i-th sample, where the true label position is 1 and otherwise 0”).
Xiao fails to explicitly disclose but Georgescu discloses the example is a voxel of a medical image, and wherein the predictor is used for medical image analysis ([0061]; “The first deep neural network operates directly on the voxels of the medical image, and not on handcrafted features extracted from the medical image. The first deep neural network inputs image patches centered at voxels of the medical image and calculates a number of position candidates in the medical image based on the input image patches” (emphasis added), which discloses that the example or input is a voxel or voxels of a medical image used for medical analysis; and [0107]).
Xiao, Yamagami, and Georgescu are analogous art because all are concerned with machine learning. Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the medical image analysis of Georgescu with the predictor of Xiao and Yamagami to yield the predictable result that the outcome is a class label, the example is a voxel of a medical image, and the predictor is used for medical image analysis. The motivation for doing so would be to provide for anatomical object detection in medical image data using deep neural networks (Georgescu; [0002]).

Regarding claim 10, the rejections of claims 1 and 9 are incorporated and Xiao fails to explicitly disclose but Yamagami discloses wherein the assigned modules which are assigned to edges of the decision tree are transformers . . . where a plurality of different transformers are used (Figure 7, Elements E1 and E2 and [0032]; “Each node is assigned an attribute to be checked (the node N1 is assigned x1, for example). Edges E1 and E2 represented by line segments are referred to edges of the decision tree (the same is true of an edge having a line segment with no label). The edges are arranged in view of the number of types of attribute values that are obtained by checking the attributes of the nodes to which the upper ends of the edges are connected. For example, the edge E1 corresponds to an attribute value of 1 of the attribute x1 and the edge E2 corresponds to an attribute value of 0 of the attribute x1”, the edges being, under a BRI, the transformers).
The motivation to combine Xiao and Yamagami is the same as discussed above with respect to claim 1.
Xiao fails to explicitly disclose but Georgescu discloses compute a non-linear function which acts to filter the medical image ([0090]; “The bias of this neuron is then added to this linear combination, and the resulting value is transformed by a non-linear mapping to obtain the activation value”, which discloses the use of a non-linear (activation) function used in a medical image analysis; and [0124]; “To achieve significant speed-up and save memory footprint, S needs to be reduced as much as possible. However, the present inventors have determined that, with a small S (e.g., 32), it is more difficult to approximate 3D filters than 2D filters. Non-linear functions g() are exploited in neural networks to bound the response to a certain range (e.g., [0, 1] using the sigmoid function)”; and [0072]; “The learned weights shown in FIG. 8 can be treated as filters for extracting high-level image features”, the images being medical images).
The motivation to combine Xiao, Yamagami, and Georgescu is the same as discussed above with respect to claim 9.

Regarding claim 18, the rejection of claim 17 is incorporated and Xiao further discloses the outcomes are the class labels (Page 3, Figure 3; the figure discloses wherein the outcome is a class label or target output for the decision tree; and Page 3, Column 2; “Li,j is the adhoc label vector of i-th sample, where the true label position is 1 and otherwise 0”).
Xiao fails to explicitly disclose but Georgescu discloses the examples comprise medical image data ([0061]; “The first deep neural network operates directly on the voxels of the medical image, and not on handcrafted features extracted from the medical image. The first deep neural network inputs image patches centered at voxels of the medical image and calculates a number of position candidates in the medical image based on the input image patches” (emphasis added), which discloses that the example or input is a voxel or voxels of a medical image used for medical analysis; and [0107]).
The motivation to combine Xiao, Yamagami, and Georgescu is the same as discussed above with respect to claim 9.

Regarding claim 19, the rejections of claims 17 and 18 are incorporated and Xiao further discloses the edges (Page 3, Figure 2).
Xiao fails to explicitly disclose but Georgescu discloses wherein the assigned modules on the individual ones of the edges are non-linear filters which act to filter the medical image ([0090]; “The bias of this neuron is then added to this linear combination, and the resulting value is transformed by a non-linear mapping to obtain the activation value”, which discloses the use of a non-linear (activation) function used in a medical image analysis; and [0124]; “To achieve significant speed-up and save memory footprint, S needs to be reduced as much as possible. However, the present inventors have determined that, with a small S (e.g., 32), it is more difficult to approximate 3D filters than 2D filters. Non-linear functions g() are exploited in neural networks to bound the response to a certain range (e.g., [0, 1] using the sigmoid function)”; and [0072]; “The learned weights shown in FIG. 8 can be treated as filters for extracting high-level image features”, the images being medical images).
The motivation to combine Xiao, Yamagami, and Georgescu is the same as discussed above with respect to claim 9.

Regarding claim 20, the rejections of claims 17, 18, and 19 are incorporated and Xiao further discloses the edges (Page 3, Figure 2).
Xiao fails to explicitly disclose but Georgescu discloses wherein there are a plurality of different non-linear filters ([0090]; “The bias of this neuron is then added to this linear combination, and the resulting value is transformed by a non-linear mapping to obtain the activation value”, which discloses the use of a non-linear (activation) function used in a medical image analysis; and [0124]; “To achieve significant speed-up and save memory footprint, S needs to be reduced as much as possible. However, the present inventors have determined that, with a small S (e.g., 32), it is more difficult to approximate 3D filters than 2D filters. Non-linear functions g() are exploited in neural networks to bound the response to a certain range (e.g., [0, 1] using the sigmoid function)”; and [0072]; “The learned weights shown in FIG. 8 can be treated as filters for extracting high-level image features”, the images being medical images).
The motivation to combine Xiao, Yamagami, and Georgescu is the same as discussed above with respect to claim 9.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127

1 Note that the Specification appears to provide sufficient structural support for “the module” in at least paragraphs [0018] and [0075] and Figures 2 and 9 of the originally filed specification, and all of the components appear to be generic processing elements.