Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Application 16/478,138, filed 7/16/2019, has been examined.
Preliminary amendments have amended claims 3-9, 13, and 17-19.
Claims 1-20 are currently pending.



Claim Objections
Claim 16 recites: “16. The computer implemented method of testing a processing node of a taxonomy-based architecture data classifier, further comprising”.
It appears that claim 16 was intended to depend from independent claim 15 (hence the included “further comprising” language); however, the claim language does not explicitly state this.
Appropriate correction is requested.










Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Such claim limitation(s) is/are: 
classifying module(s) and selector module(s) in claim 1;
descriptor module in claim 2;
classifying module(s) and selector module(s) in claim 3;
classifying module(s) in claim 4;
selector module(s) in claim 5;
selector module(s) in claim 8;
classifying module(s) and selector module(s) in claim 11;
descriptor module(s) and classifying module(s) and selector module(s) in claim 12;
classifying module(s) and selector module(s) in claim 15; and
descriptor module(s) and classifying module(s) and selector module(s) in claim 16.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.






Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Claim 1 recites:
Generating classification predictions.
The limitation of generating classification predictions, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processing nodes”, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the processing nodes language, generating in the context of this claim encompasses the user manually determining generic “predictions” regarding generic “classification”. Similarly, the limitation(s) of receiving; receiving; sending; receiving; receiving; generating; sending; receiving; distributing; receiving; distributing, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the processing nodes language, receiving; receiving; sending; receiving; receiving; generating; sending; receiving; distributing; receiving; distributing in the context of this claim encompasses the user manually generating a listing of generic predictions based on generic classifications. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)).
Further, these concepts also recite “Certain Methods of Organizing Human Activity” (such as
commercial or legal interactions, including agreements in the form of contracts; legal
obligations; advertising, marketing or sales activities or behaviors; and business relations), where
generating classification predictions is a method of human activity in advertising/marketing
activities.
Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only
recites one additional element – using processing nodes to perform both the receiving; receiving; sending; receiving; receiving; generating; sending; receiving; distributing; receiving; distributing and generating steps. The processing nodes in both steps are recited at a
high level of generality (i.e., as a generic processor performing a generic computer function of
generating classification predictions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any
meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more
than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using processing nodes to perform
both the receiving; receiving; sending; receiving; receiving; generating; sending; receiving; distributing; receiving; distributing and generating steps amounts to no more
than mere instructions to apply the exception using a generic computer component. Mere
instructions to apply an exception using a generic computer component cannot provide an
inventive concept. The claim(s) is/are not patent eligible.

Dependent claims 2-10 merely add further details of the abstract steps/elements recited in
claim 1 without integrating the idea into a practical application or including an improvement to
another technology or technical field, an improvement to the functioning of the computer itself,
or meaningful limitations beyond generally linking the use of an abstract idea to a particular
technological environment. Therefore, dependent claims 2-10 are also directed towards
non-statutory subject matter.

Independent claims 11 and 15 are also rejected as ineligible subject matter under 35
U.S.C. 101 for substantially the same reasons as claim 1. The components (i.e.,
system/medium) described in independent claims 11 and 15 do not provide for integrating the
abstract idea into a practical application. At best, the claims merely provide alternate
environments in which to implement the abstract idea.

Dependent claims 12-14 and 16-20 merely add further details of the abstract steps/elements
recited in claim 1 without integrating the idea into a practical application or including an
improvement to another technology or technical field, an improvement to the functioning of the
computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to
a particular technological environment. Therefore, dependent claims 12-14 and 16-20 are also
directed towards non-statutory subject matter.








Furthermore, claim 20 is not limited to statutory embodiments, instead being explicitly defined as including non-statutory embodiments (e.g., claim 20 recites: “carried on a carrier signal”). 
In particular, the disclosure refers to “carried on a carrier signal” which improperly includes unlimited transitory media, such as signals propagating through space, radio waves,
infrared signals, etc. See, e.g., In re Nuijten, Docket no. 2006-1371 (Fed. Cir. Sept. 20,
2007) (slip op. at 18) (“A transitory, propagating signal like Nuijten's is not a ‘process,
machine, manufacture, or composition of matter.’ … Thus, such a signal cannot be
patentable subject matter.”); see also Ex parte Barness, Patent Trial and Appeal Board
Appeal 2010-011009, 6/25/2013, pages 5-6.
Importantly, please note the current precedential opinion Ex parte Mewherter
(https://www.uspto.gov/sites/default/files/ip/boards/bpai/decisions/prec/fd2012_007692_precedential.pdf) issued by the Patent Trial and Appeal Board. In Ex parte Mewherter, the Board held that unless defined by Applicant to specifically exclude transitory forms of media, the broadest reasonable interpretation of a storage medium includes signals.











Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over 
Venkatesh et al., "Deep Decision Network for Multi-class Image Classification",
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 27 June 2016 (2016-06-27), pages 2240-2248, XP033021357, DOI: 10.1109/CVPR.2016.246.

As to claim 1, Venkatesh discloses a taxonomy-based architecture data classifier
(Venkatesh "clusters are identified at each node of the Deep Decision Network using the spectral co-clustering algorithm," section 3.2, page 2242, right-hand column, lines 19-21; "we observed three clusters - Cluster-1: {0-airplane, 8-ship}, Cluster-2: {1-automobile, 9-truck}, Cluster-3: {2-bird, 3-cat, 4-deer, 5-dog, 6-frog, 7-horse}. This clustering can be interpreted as a data hierarchy automatically generated from the data," section 4.2, page 2244, left-hand column, line 49 - page 2245, right-hand column, line 1; "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent expert node for the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38) operable in a training mode of operation (Piecewise training for DDN, section 3.4, page 2243) and in a testing mode of operation (Classification using DDN, section 3.6, page 2244)

operable in a training mode of operation 
(Venkatesh section 1 lines 24-25 “Our contributions are as follows: (a) proposed stagewise
training strategy for the DDN helps alleviate problems”; see also 3.4. Piecewise
training for DDN)

and in a testing mode of operation, 


comprising:
a plurality of processing nodes arranged in a tree-based architecture having parent and child nodes, 
(Venkatesh "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent expert node for the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38;
"we observe N levels in the DDN tree structured network and at each level there could be K clusters of confusion classes," Figure 1 caption, page 2241),

wherein a root processing node of the plurality of processing nodes receives descriptions from a neural network,
(Venkatesh "Given an image, we feedforward it through the root node at level-1 of the DDN and obtain the confidence score from the softmax layer. If the score is higher than the threshold value (determined during the training process) then we declare it as the final output. If not, the sample gets routed to the appropriate branch of the network based on its prediction label and the process is repeated either until the prediction score is higher than the confidence value or until it reaches the leaf node to get the final response," section 3.6, page 2244, left-hand column, lines 10-18; Figure 1, page 2241)

each child processing node receives from a parent node, during training mode descriptions and annotations associated to the sample pieces of data, 
(Venkatesh "For each cluster, a node is added to the decision network. A node itself is a shallow network (or expert network) trained to distinguish between a subset of classes belonging to that cluster. Note that when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 5-11), and during testing mode descriptions of sample piece of data (" If not, the sample gets routed to the appropriate branch of the network based on its prediction label and the process is repeated either until the prediction score is higher than the confidence value or until it reaches the leaf node to get the final response," section 3.6, page 2244, left-hand column, lines 14-18; Figure 1, page 2241)

and during testing mode descriptions of sample piece of data, each processing node comprising a classifying module and a selector module:
(Venkatesh "During testing, a sample is routed through DDN until its class is determined (via early classification or at the leaf node)," section 3.1, page 2242, right-hand column, lines 6-8; "Given an image, we feedforward it through the root node at level-1 of the DDN and obtain the
confidence score from the softmax layer. If the score is higher than the threshold value (determined during the training process) then we declare it as the final output. If not, ... ," section 3.6, page 2244, left-hand column, lines 10-18; Figure 1, page 2241)

wherein the classifying module, during training mode is configured, to receive the descriptions,
generate classification predictions, send the classification predictions to an error calculator to calculate a gradient using an objective function, and receive the gradient from the error calculator,
(Venkatesh "Given a dataset, root (level 1) network is trained using the back propagation algorithm," section 3.1, page 2242, left-hand column, lines 41-42; "For each cluster, a node is added to the decision network. A node itself is a shallow network (or expert network) trained to distinguish between a subset of classes belonging to that cluster. Note that when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 5-11;
see also section 3.2),


and
during a testing mode of operation is configured to receive the descriptions,
generate classification predictions, and send the classification predictions to the selector module;
(Venkatesh "During testing, a sample is routed through DDN until its class is determined (via early classification or at the leaf node)," section 3.1, page 2242, right-hand column, lines 6-8; "Given an image, we feedforward it through the root node at level-1 of the DDN and obtain the
confidence score from the softmax layer. If the score is higher than the threshold value (determined during the training process) then we declare it as the final output. If not, ... ," section 3.6, page 2244, left-hand column, lines 10-18; Figure 1, page 2241)

wherein the selector module, during training mode is configured to
receive the description and the annotations associated to the sample piece of data, and
distribute the descriptions and annotations to child nodes corresponding to the annotations, 
(Venkatesh "Firstly, all the layers in the previous levels are frozen while
training the newly introduced network layers which forms a new node at the
next level. Secondly, each node is built on the parent node's feature space to
specifically handle a subset of classes." section 3.1, page 2242, right-hand
column, lines 10-15; "when we train the new layers, we freeze the previously
trained layers," section 3.4, page 2243, right-hand column, lines 9-11)

and
during testing mode is configured to receive descriptions and predictions, and distribute descriptions to the child nodes corresponding to the predictions
(Venkatesh " If not, the sample gets routed to the appropriate branch of the network
based on its prediction label and the process is repeated either until the
prediction score is higher than the confidence value or until it reaches the leaf
node to get the final response," section 3.6, page 2244; Figure 1, page 2241).

It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply a deep decision network (DDN) as taught by Venkatesh, since it was known in the art that a DDN significantly improves classification performance. The DDN is a tree-like structured network built with NIN as the root node and with expert network branch nodes made up of mlpconv layers, and the DDN significantly improved over the then-current state-of-the-art results on publicly available datasets (Venkatesh, Sec. 5).
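
The test-time routing quoted above from Venkatesh section 3.6 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions only: the `Node` class, its `classify` callable, the `threshold` field, and the example labels and scores are hypothetical and are taken from neither the reference nor the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    # Hypothetical per-node classifier: maps a description to (label, confidence).
    classify: Callable[[List[float]], Tuple[str, float]]
    # Hypothetical confidence threshold (the reference says it is set during training).
    threshold: float = 0.9
    # Maps a predicted label to the child node that handles that branch.
    children: Dict[str, "Node"] = field(default_factory=dict)


def route(node: Node, description: List[float]) -> str:
    """Return a final label by descending the tree until the prediction is
    confident enough or there is no child for the predicted label."""
    while True:
        label, score = node.classify(description)
        child = node.children.get(label)
        if score > node.threshold or child is None:
            return label
        node = child  # low confidence: hand the sample to the matching branch


# Toy tree: the root is unsure about "airplane" and defers to an expert child.
expert = Node(classify=lambda d: ("ship", 0.95))
root = Node(
    classify=lambda d: ("airplane", 0.6) if d[0] < 0.5 else ("bird", 0.99),
    children={"airplane": expert},
)
early = route(root, [0.8])     # confident at the root, decided early
deferred = route(root, [0.1])  # routed to the expert child
```

In this sketch the classification step and the routing step are kept separate, which loosely mirrors the claimed split between a classifying module and a selector module; whether the reference draws the same line is a matter for the mapping above, not for the sketch.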

As to claim 2, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, further comprising a descriptor module, configured to receive the description from the neural network and generate a refined description corresponding to a classification task of the processing node (Venkatesh "each node is built on the parent node's feature
space to specifically handle a subset of classes," section 3.1, page 2242,
right-hand column, lines 13-15).

As to claim 3, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, wherein the selector module comprises:
a first input to receive the description;
(Venkatesh "each node is built on the parent node's feature space to specifically handle a subset of classes," section 3.1, page 2242, right-hand column, lines 13-15);
a second input to receive the annotations during training mode and the predictions during testing mode; an activation output, coupled to one or more child nodes,
wherein the selector module is configured to process the annotations corresponding to the depth of the processing node during training mode 
(Venkatesh "Firstly, all the layers in the previous levels are frozen while training the newly introduced network layers which forms a new node at the next level. Secondly, each node is built on the parent node's feature space to specifically handle a subset of classes." section 3.1, page 2242, right-hand column, lines 10-15; "when we train the new layers, we freeze the previously trained layers," section 3.4, page 2243, right-hand column, lines 9-11)

and the predictions from the respective classifying module during testing mode and send the description through the activation output to select one or more child nodes based on the received annotations or predictions, respectively
(Venkatesh "fine-tuning using the weighted contrastive loss (as explained in Section 3.3). After fine-tuning, the samples are split according to their cluster ID's," section 3.4, page 2243, right-hand column, lines 1-4; during training of subsequent nodes - "when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 9-11) or predictions ("If not, the sample gets routed to the appropriate branch of the network based on its prediction label and the process is repeated either until the prediction score is higher than the confidence value or until it reaches the leaf node to get the final response," section 3.6, page 2244; Figure 1, page 2241).

As to claim 4, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, wherein the classifying module, during training mode is configured to:
identify annotations relevant to the processing node, 
(Venkatesh "After fine-tuning, the
samples are split according to their cluster ID's. For each cluster, a node is
added to the decision network." section 3.4, page 2243, right-hand column,
lines 3-6)
update probabilities of classification predictions of the processing node based on the identified relevant annotations
(Venkatesh "a subsequent expert network is trained for data within each cluster to correctly classify the previously misclassified samples and/or the samples classified with low confidence,"
section 3.1, page 2242, left-hand column, line 50 - right-hand column, line 2; "Note that when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero." section 3.4, page 2243, right-hand column, lines 9-11).

As to claim 5, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, further comprising a mini-batch mode of operation wherein, during training mode, the selector module is configured to receive a mini-batch of descriptions, split the mini-batch of descriptions during forward passes and regroup the gradients during backward passes, according to the corresponding annotations
(Venkatesh sec. 3.4 right col. Lines 3-4 “After fine-tuning, the samples are split
according to their cluster ID’s.”).
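
The mini-batch language of claim 5 (splitting descriptions on the forward pass and regrouping gradients on the backward pass according to annotations) can be pictured with a short Python sketch. This is only one plausible reading, offered for illustration: the function names, the plain-list representation of descriptions and gradients, and the example annotations are all hypothetical and appear in neither the application nor Venkatesh.

```python
from collections import defaultdict


def split_by_annotation(descriptions, annotations):
    """Forward pass: group mini-batch items by annotation and remember
    each item's original position in the mini-batch."""
    groups = defaultdict(lambda: {"indices": [], "items": []})
    for i, (desc, ann) in enumerate(zip(descriptions, annotations)):
        groups[ann]["indices"].append(i)
        groups[ann]["items"].append(desc)
    return dict(groups)


def regroup_gradients(groups, child_grads, batch_size):
    """Backward pass: put per-child gradients back in mini-batch order."""
    merged = [None] * batch_size
    for ann, group in groups.items():
        for idx, grad in zip(group["indices"], child_grads[ann]):
            merged[idx] = grad
    return merged


# Hypothetical mini-batch of four descriptions with two annotation values.
descriptions = ["d0", "d1", "d2", "d3"]
annotations = ["top", "shoe", "top", "shoe"]
groups = split_by_annotation(descriptions, annotations)

# Pretend each child returned one gradient per item it received.
child_grads = {"top": [0.1, 0.3], "shoe": [0.2, 0.4]}
merged = regroup_gradients(groups, child_grads, batch_size=len(descriptions))
```

Keeping the original indices alongside each group is what lets the backward pass restore the mini-batch order without re-reading the annotations.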

As to claim 6, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, comprising an end-to-end data classifier
(Venkatesh "proposed network architecture can make early decisions thereby reducing computational time without compromising on the performance," section 1, page 2240, right-hand
column, lines 30-33).

As to claim 7, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, comprising interconnected processing nodes (Venkatesh Figure 1, page 2241).

As to claim 8, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, wherein the selector module, during training mode, is configured to process received annotations and send description and annotations to child processing nodes if the annotations processed correspond to the child processing nodes (Venkatesh "fine-tuning using the weighted contrastive loss (as explained in Section 3.3). After fine-tuning, the samples are split according to their cluster ID's," section 3.4, page 2243, right-hand column, lines 1-4; during training of subsequent nodes - "when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 9-11).

As to claim 9, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 1, comprising an image classifier, such as a garment image classifier (Venkatesh describes an image classifier and is applied to the CIFAR-10 dataset, see section 4.2, page 2244).

As to claim 10, Venkatesh discloses the taxonomy-based architecture data classifier according to claim 9, wherein the neural network is a convolutional neural network (Venkatesh discloses each node of a decision tree being a CNN, see section 2, page 2242, left-hand column, lines 17-18).


As to claim 11, Venkatesh discloses a computer implemented method of training a processing node of a taxonomy-based architecture data classifier, 
(Venkatesh "clusters are identified at each node of the Deep Decision Network using the spectral co-clustering algorithm," section 3.2, page 2242, right-hand column, lines 19-21; "we observed three clusters - Cluster-1: {0-airplane, 8-ship}, Cluster-2: {1-automobile, 9-truck}, Cluster-3: {2-bird, 3-cat, 4-deer, 5-dog, 6-frog, 7-horse}. This clustering can be interpreted as a data hierarchy automatically generated from the data," section 4.2, page 2244, left-hand column, line 49 - page 2245, right-hand column, line 1; "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent
expert node for the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38) operable in a training mode of operation (Piecewise training for DDN, section 3.4, page 2243) and in a testing mode of operation (Classification using DDN, section 3.6, page 2244),

comprising:
receiving from a neural network, descriptions and annotations associated to sample pieces of data;
(Venkatesh “when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 9-11;
See also "For each cluster, a node is added to the decision network. A node itself is a shallow
network (or expert network) trained to distinguish between a subset of classes belonging to that
cluster. Note that when we train the new layers, we freeze the previously trained layers by
setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 5-11), and
during testing mode descriptions of sample piece of data (" If not, the sample gets routed to the
appropriate branch of the network based on its prediction label and the process is repeated
either until the prediction score is higher than the confidence value or until it reaches the leaf
node to get the final response," section 3.6, page 2244, left-hand column, lines 14-18; Figure 1,
page 2241)

generating at a classifying module of the processing node classification predictions;
(Venkatesh "During testing, a sample is routed through DDN until its class is determined (via
early classification or at the leaf node)," section 3.1, page 2242, right-hand column, lines 6-8;
"Given an image, we feedforward it through the root node at level-1 of the DDN and obtain the
confidence score from the softmax layer. If the score is higher than the threshold value
(determined during the training process) then we declare it as the final output. If not, ... ," section
3.6, page 2244, left-hand column, lines 10-18; Figure 1, page 2241)

sending the generated classification predictions to an error calculator;
(Venkatesh "For each cluster, a node is added to the decision network. A node itself is a shallow
network (or expert network) trained to distinguish between a subset of classes belonging to that cluster," "it also helps in avoiding getting stuck in poor solutions during the gradient optimization process, and converges to network parameters that provide better generalization," section 3.4, page 2243, right-hand column, lines 5-24;
see also "Given a dataset, root (level 1) network is trained using the back propagation
algorithm," section 3.1, page 2242, left-hand column, lines 41-42; "For each cluster, a node is added to the
decision network. A node itself is a shallow network (or expert network) trained to distinguish
between a subset of classes belonging to that cluster. Note that when we train the new layers,
we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page
2243, right-hand column, lines 5-11;
see also section 3.2),

receiving at the selector module the descriptions and annotations;
(Venkatesh "For each cluster, a node is added to the decision network. A node itself is a shallow
network (or expert network) trained to distinguish between a subset of classes
belonging to that cluster," "it also helps in avoiding getting stuck in poor
solutions during the gradient optimization process, and converges to network
parameters that provide better generalization," section 3.4, page 2243, right-hand
column, lines 5-24);

distributing by the selector module the descriptions and the annotations to child processing nodes based on the annotations corresponding to the depth of the child processing node (Venkatesh "fine-tuning using the weighted contrastive loss (as explained in Section 3.3). After fine-tuning, the samples are split according to their cluster ID's," section 3.4, page 2243, right-hand column, lines 1-4; during training of subsequent nodes - "when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 9-11).


As to claim 12, Venkatesh discloses the computer implemented method of training a processing node of a taxonomy-based architecture data classifier according to claim 11, further comprising refining the received description by a descriptor module of the processing node to correspond to a classification task of the processing node and sending the refined description to the classifying module and to the selector module (Venkatesh "each node is built on the parent node's feature
space to specifically handle a subset of classes," section 3.1, page 2242,
right-hand column, lines 13-15).

As to claim 13, Venkatesh discloses a computer implemented method of training a plurality of processing nodes of a taxonomy-based architecture data classifier, the nodes interconnected in a tree-based architecture, each node trained according to claim 11 (Venkatesh "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent expert node for
the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38; Figure 1 - "we observe N levels in the DDN tree structured network and at each level there could be K clusters of confusion classes," Figure 1 caption, page 2241).

As to claim 14, Venkatesh discloses the computer implemented method of training a plurality of processing nodes of a taxonomy-based architecture data classifier according to claim 13, comprising end-to-end training of the nodes interconnected in the tree-based architecture (Venkatesh section 3.1, right-hand column, lines 9-17, “There are a few key differences between the DDN architecture and the traditional deep networks. Firstly, all the layers in the previous levels are frozen while training the newly introduced network layers which forms a new node
at the next level. Secondly, each node is built on the parent node’s feature space to specifically handle a subset of classes. Note that each node can be trained starting from any layer of the parent node, and this choice of the layer can be determined using a cross validation data set”).


As to claim 15, Venkatesh discloses a computer implemented method of testing a processing node of a taxonomy-based
(Venkatesh "clusters are identified at each node of the Deep Decision Network using the spectral co-clustering algorithm," section 3.2, page 2242, right-hand column, lines 19-21; "we observed three clusters - Cluster - 1 :{0- airplane, 8-ship}, Cluster - 2 {1-automobile, 9-truck}, Cluster - 3 {2-bird, 3-cat, 4-deer, 5-dog, 6-frog, 7-horse}. This clustering can be interpreted as a data hierarchy automatically generated from the data." section 4.2, page 2244, left-hand
column, line 49 - page 2245, right-hand column, line 1)

architecture data classifier, 
(Venkatesh "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent expert node for the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38) 
comprising:
receiving from a neural network, descriptions associated to sample pieces of data;
(Venkatesh "For each cluster, a node is added to the decision network. A node itself is a
shallow network (or expert network) trained to distinguish between a subset of
classes belonging to that cluster. Note that when we train the new layers, we
freeze the previously trained layers by setting their learning rate to zero,"
section 3.4, page 2243, right-hand column, lines 5-11)

generating at a classifying module of the processing node classification predictions;
(Venkatesh "Given a dataset, root (level 1) network is trained using the back propagation algorithm," section 3.1, page 2242, left-hand column, lines 41-42; "For each cluster, a node is
added to the decision network. A node itself is a shallow network (or expert network) trained to distinguish between a subset of classes belonging to that cluster. Note that when we train the new layers, we freeze the previously trained layers by setting their learning rate to zero," section 3.4, page 2243, right-hand column, lines 5-11)
sending the generated classification predictions to a selector module;
(Venkatesh "fine-tuning using the weighted contrastive loss
(as explained in Section 3.3). After fine-tuning, the samples are split
according to their cluster ID's," section 3.4, page 2243, right-hand column,
lines 1-4; during training of subsequent nodes - "when we train the new
layers, we freeze the previously trained layers by setting their learning rate to
zero," section 3.4, page 2243, right-hand column, lines 9-11).

receiving at the selector module the generated classification predictions;
(Venkatesh "During testing, a sample is routed through DDN until its class is determined (via early classification or at the leaf node)," section 3.1, page 2242, right-hand column, lines 6-8; "Given an image, we feedforward it through the root node at level-1 of the DDN and obtain the
confidence score from the softmax layer. If the score is higher than the threshold value (determined during the training process) then we declare it as the final output. If not, ... ," section 3.6, page 2244, left-hand column, lines 10-18; Figure 1, page 2241)

distributing by the selector module the descriptions to child processing nodes based on the received classification predictions
(Venkatesh "Given an image, we feedforward it through the root node at level-1 of the DDN and
obtain the confidence score from the softmax layer. If the score is higher than the threshold value (determined during the training process) then we declare it as the final output. If not, the sample gets routed to the appropriate branch of the network based on its prediction label and the process is repeated either until the prediction score is higher than the confidence value or until it reaches the leaf node to get the final response," section 3.6, page 2244, left-hand
column, lines 10-18; Figure 1, page 2241; "During testing, a sample is routed through DDN until its class is determined (via early classification or at the leaf node)." section 3.1, page 2242, right-hand column, lines 6-8).
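
For illustration only, the test-time routing quoted above from Venkatesh section 3.6 can be sketched as follows. This is a minimal sketch and not code from the reference: the names Node, predict, threshold, children, and route are assumptions introduced here, and predict is a hypothetical stand-in for a node's softmax output.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a node's softmax layer:
    # maps a sample to {class label: confidence score}.
    predict: Callable[[Any], Dict[Hashable, float]]
    # Confidence threshold for this node (assumed to be fixed during training).
    threshold: float
    # Maps a predicted label to the child (expert) node handling that label's cluster.
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def route(root: Node, sample: Any) -> Tuple[Hashable, int]:
    """Return (final label, level at which the decision was made)."""
    node, level = root, 1
    while True:
        scores = node.predict(sample)
        label = max(scores, key=scores.get)  # prediction label
        child = node.children.get(label)
        # Early classification when the score exceeds the threshold;
        # otherwise stop at a leaf node (no child for this label).
        if scores[label] > node.threshold or child is None:
            return label, level
        # Route the sample to the branch selected by the prediction label.
        node, level = child, level + 1
```

In this sketch the selection of the child node depends only on the predicted label, which mirrors the quoted statement that the sample "gets routed to the appropriate branch of the network based on its prediction label."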


As to claim 16, Venkatesh discloses the computer implemented method of testing a processing node of a taxonomy-based architecture data classifier, further comprising refining the received description by a descriptor module of the processing node to correspond to a classification task of the processing node and sending the refined description to the classifying module and to the selector module
(Venkatesh "each node is built on the parent node's feature space to specifically handle a subset of classes," section 3.1, page 2242, right-hand column, lines 13-15).
 
As to claim 17, Venkatesh discloses the computer implemented method of testing a processing node of a taxonomy-based architecture data classifier, wherein the processing node has been trained according to claim 11
(Venkatesh section 1, lines 24-25, “Our contributions are as follows: (a) proposed stagewise
training strategy for the DDN helps alleviate problems”; see also section 3.4, Piecewise
training for DDN;
see also Venkatesh "Firstly, all the layers in the previous levels are frozen while training the newly introduced network layers which forms a new node at the next level. Secondly, each node is built on the parent node's feature space to specifically handle a subset of classes." section 3.1, page 2242, right-hand column, lines 10-15; "when we train the new layers, we freeze the previously trained layers," section 3.4, page 2243, right-hand column, lines 9-11).

As to claim 18, Venkatesh discloses a computer implemented method of testing a plurality of processing nodes of a taxonomy-based architecture data classifier, the nodes interconnected in a tree-based architecture, each node tested according to claim 15
(Venkatesh "deep decision network is a tree structured deep neural network with decision stumps at each node to classify easy separable data earlier in the network and to determine the subsequent expert node for the difficult cases," section 3.1, page 2242, left-hand column, lines 35-38; Figure 1 - "we observe N levels in the DDN tree structured network and at each level there could be K clusters of confusion classes," Figure 1 caption, page 2241).

As to claim 19, Venkatesh discloses a computer program product comprising program instructions for causing a computing system to perform a method according to claim 11
(Venkatesh Abstract; see also page 2244, Figure 3 caption: "DDN method idea validation on classification of digit ’6’ and ’8’ of MNIST dataset. left image indicates some of the confusion classes at the level-1 and the right one indicates some confusion cases at level-2. One could observe that some of the confusion cases of level-1 are resolved at level-2.").

As to claim 20, Venkatesh discloses a computer program product according to claim 19, embodied on a storage medium or carried on a carrier signal
(Venkatesh section 4.2, CIFAR10, Experimental Setup: "The CIFAR-10 dataset [11] consists of 10 classes of natural images with a total of 50K training images and a total of 10K testing images. Each image is of size 32x32 and we follow the same pre-processing of global contrast normalization and ZCA whitening as in [5, 15]. For the validation dataset, we used the last 10K samples of the training to determine the confidence level threshold and data splits based on the confusion matrix. After determining the data-splits and the confidence level threshold, we combined the training and validation dataset to re-train the network before splitting."; see also Abstract: "In this paper, we present a novel Deep Decision Network (DDN) that provides an alternative approach towards building an efficient deep learning network. During the learning phase, starting from the root network node, DDN automatically builds a network that splits the data into disjoint clusters of classes which would be handled by the subsequent expert networks.").

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Li et al., US Pub. No. 2018/0157638 A1, teaches an improved processing unit that can operate an end-to-end recurrent neural network (RNN) with limited contextual dialogue memory that can be jointly trained by supervised signals (user slot tagging, intent prediction and/or system action prediction). The end-to-end RNN, or joint model, has shown advantages over separate models for natural language understanding (NLU) and dialogue management and can capture expressive feature representations beyond conventional aggregation of slot tags and intents, to mitigate effects of noisy output from NLU. The joint model can apply a supervised signal from system actions to refine the NLU model. By back-propagating errors associated with system action prediction to the NLU model, the joint model can use machine learning to predict user intent, and perform slot tagging, and make system action predictions based on user input, e.g., utterances across a number of domains; and

Sugaberry et al., US Pub. No. 2018/0268015 A1, teaches an improved method and apparatus for recognizing errors in documents which may comprise text and images and resolving recognized errors automatically comprise application of a search manager for analyzing parameters of a plurality of databases for a plurality of objects, the databases comprising a product database, a product provider database, a service database, a service provider database and an image database whereby the databases store data objects containing identifying features, source information and document properties and context including time and frequency varying data. Data acquisition and communication devices may comprise near field communication and camera devices for collecting document data. The method comprises application of multivariate statistical analysis and principal component analysis in combination with content-based image retrieval for providing two-dimensional attributes of three dimensional objects, for example, via preferential image segmentation using a tree of shapes to recognize document errors such as tax application errors and to resolve errors/issues by means of k-means clustering and related methods via a client/cloud-based server system. By way of example, an example of an erroneous application of sales tax to clothing/food which may/may not be taxed in a given jurisdiction (Delaware, Pa.) may be recognized and resolved by client/server/database query and issue escalation.






CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL whose telephone number is (571)270-7723. The examiner can normally be reached Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil can be reached on 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Evan Aspinwall/Primary Examiner, Art Unit 2152