DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.


Drawings
The applicant’s submitted drawings appear to be acceptable for examination purposes. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the drawings.


Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant's submission of the Information Disclosure Statements, dated 6 June 2019 and 17 November 2021, is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending.  As required by M.P.E.P. 609 C(2), copies of the PTOL-1449 forms, initialed and dated by the examiner, are attached to the instant Office action.

The listing of references in the specification is not a proper information disclosure statement.  37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper."  Therefore, unless the references have been cited by the examiner on form PTO-892 (or indicated on form PTOL-1449), they have not been considered.


Claim Rejections - 35 USC § 101
Examiner’s Note: according to paragraph 141 of the specification as filed, the examiner has interpreted the “computer readable storage medium” of claims 10-17 to be defined as non-transitory.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5-7, 14-16, and 22-24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claim 5 recites the limitations “the hierarchy” and “the second label” and “the hierarchical classification ontology data structure T” in lines 2-6.  There is insufficient antecedent basis for these limitations in the claim.
Claims 6-7 depend upon claim 5, and thus include the aforementioned limitation(s).
Claim 6 further recites the limitations “the lower leaf tier level_l” and “the parent tier level_p” in lines 1-2.  There is insufficient antecedent basis for these limitations in the claim.
Claim 7 depends upon claim 6, and thus includes the aforementioned limitation(s).
Claim 7 further recites the limitations “the value in the indicative layer” and “the relevant node” in lines 5-6. There is insufficient antecedent basis for these limitations in the claim.

Claim 6 also recites two equations.  The intended scope of the claim is not clear because it is not clear what all of the variables in the equations are meant to represent, including the y and ŷ variables.
Claim 7 depends upon claim 6, and thus includes the aforementioned limitation(s).

As per claims 14-16, see the rejections of claims 5-7, above.

As per claims 22-24, see the rejections of claims 5-7, above.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-5, 8-14, 17-22, and 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Perera et al. (Optimizing Hierarchical Classification with Adaptive Node Collapses, Feb 2018, pgs. 372-375 – cited in an attached IDS) in view of Duncan (US 2017/0213127).

As per claim 1, Perera teaches a method to implement a training system for finding an optimal surface for hierarchical classification on an ontology [a system for finding an optimal surface of nodes, of an ontology or labelling hierarchy, over which to train a machine learning model (pg. 373, “methodology”; pg. 375, “experiments and results”; etc.)], the method comprising: receiving, by the training system, a training data set and a hierarchical classification ontology data structure [a system for finding an optimal surface of nodes, of an ontology or labelling hierarchy, over which to train a machine learning model (pg. 373, “methodology”; pg. 375, “experiments and results”; etc.) using a training data set (pgs. 372-373, etc.)]; generating, by the training system, a neural network architecture based on the training data set and the hierarchical classification ontology data structure [the machine learning model generated and trained may be a convolutional neural network (CNN) (pg. 374, “initial results” and “experiments and results”; etc.)], wherein the neural network architecture comprises an indicative layer, a parent tier (PT) output and a lower leaf tier (LLT) output [the machine learning model generated and trained may be a convolutional neural network (CNN – where a neural network has at least one layer, which in this case is an indicative layer) (pg. 374, “initial results” and “experiments and results”; etc.) which determines a leaf node class (LLT output) and parent class (PT output) (pg. 373, “introduction” and “methodology”; etc.)]; training, by the training system, the neural network architecture to classify the training data set to leaf nodes at the LLT output and parent nodes at the PT output [the machine learning model generated and trained may be a convolutional neural network (CNN) (pg. 374, “initial results” and “experiments and results”; etc.) which determines a leaf node class (LLT output) and parent class (PT output) (pg. 373, “introduction” and “methodology”; etc.) using a training data set (pgs. 372-373, etc.)]; determining, by the indicative layer in the neural network architecture, a surface that passes through each path from a root to a leaf node in the hierarchical ontology data structure [the neural network (pg. 374, “initial results” and “experiments and results”; etc.) is used by the system to determine a surface over the hierarchy that is a set of nodes that intersect once each path from the root to the leaves of the hierarchy (pg. 372, “introduction”; pg. 373, “methodology”; etc.)]; and training, by the training system, a classifier model for a cognitive system using the surface and the training data set [training a classifier using the surface and training data (pgs. 372-373, “introduction” and “methodology”; etc.)].
While Perera teaches training and using a machine learning system (see above), it does not explicitly teach the physical implementation, and thus does not teach a data processing system having a processor and a memory, wherein the memory comprises instructions which are executed by the processor to cause the processor to implement the system.
Duncan teaches a data processing system having a processor and a memory, wherein the memory comprises instructions which are executed by the processor to cause the processor to implement the system [the system may be implemented as a program stored on computer readable storage media (memory) to be executed by the computer (processor) (para. 0174, claim 6, etc.)].
Perera and Duncan are analogous art, as they are within the same field of endeavor, namely using machine learning models including deleting/collapsing nodes.
It would have been obvious to one of ordinary skill, before the effective filing date of the claimed invention, to utilize a processor and memory to implement the machine learning system, as taught by Duncan, for the implementation of the system taught by Perera.
Because both Perera and Duncan teach machine learning systems, but Perera does not disclose how the system is actually physically implemented, it would have been obvious to one of ordinary skill in the art, to utilize a processor and memory to implement the machine learning system, as taught by Duncan, for the implementation of the system taught by Perera, to achieve the predictable result of allowing an actual implementation of the system, including training and testing the model(s) and using the model on further input data.
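
For illustration only, the "surface" discussed in the claim 1 mapping above (a set of nodes that meets each root-to-leaf path once) can be sketched as a simple check. This is a minimal sketch, not the method of Perera or of the application; the toy ontology `TREE` and the helper names `root_to_leaf_paths` and `is_surface` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (illustrative names only): node -> list of children.
TREE: Dict[str, List[str]] = {
    "root": ["animal", "vehicle"],
    "animal": ["cat", "dog"],
    "vehicle": ["car", "truck"],
    "cat": [], "dog": [], "car": [], "truck": [],
}


def root_to_leaf_paths(tree: Dict[str, List[str]], node: str = "root") -> List[List[str]]:
    """Enumerate every path from `node` down to a leaf."""
    children = tree[node]
    if not children:
        return [[node]]
    return [[node] + tail
            for child in children
            for tail in root_to_leaf_paths(tree, child)]


def is_surface(tree: Dict[str, List[str]], candidate: Set[str]) -> bool:
    """Return True if `candidate` meets every root-to-leaf path exactly once."""
    return all(sum(n in candidate for n in path) == 1
               for path in root_to_leaf_paths(tree))
```

Under these assumptions, `{"animal", "car", "truck"}` is a surface, while `{"animal", "cat", "vehicle"}` is not, because the path root, animal, cat would be met twice.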

As per claim 2, Perera/Duncan teaches wherein the neural network architecture comprises a convolutional neural network architecture [the machine learning model generated and trained may be a convolutional neural network (CNN) (Perera: pg. 374, “initial results” and “experiments and results”; etc.)].

As per claim 3, Perera teaches the method of claim 1, as described above.
While Perera teaches the indicative layer of the NN determining whether the surface contains each node, and utilizing an accuracy determination for adding nodes to the surface (see, e.g., Perera: pg. 373), it does not explicitly teach wherein the indicative layer generates for each given node in the lower leaf tier a probability value representing a probability that the surface contains the given node or its parent.
Duncan teaches wherein the indicative layer generates for each given node in the lower leaf tier a probability value representing a probability that the surface contains the given node or its parent [a confidence (probability) is determined for the output of the model which can include propagation from parent to child nodes, which can be compared to a threshold to determine which nodes to prune from the hierarchy (creating the surface) (see paras. 0014-27, 0039, 108-113 regarding the confidence/probability and hierarchical clusters; see also para 0638 regarding pruning; etc.)].
Perera and Duncan are analogous art, as they are within the same field of endeavor, namely using machine learning models including deleting/collapsing nodes.
It would have been obvious to one of ordinary skill, before the effective filing date of the claimed invention, to utilize a confidence/probability value for removing nodes (creating a surface in the hierarchy), as taught by Duncan, for the creation of the surface in the hierarchy by the neural network layer in the system taught by Perera.
Duncan provides motivation as [using machine learning confidence to create and shape the hierarchy nodes provides more accurate paths, handles conflicts in classification, and requires less user input (paras. 0005, 0053-55, …)].

As per claim 4, Perera/Duncan teaches wherein determining the surface comprises: comparing the probability value for each given node in the lower leaf tier to a threshold; adding the given node to the surface responsive to the probability value being greater than or equal to the threshold; and adding a parent node of the given node to the surface responsive to the probability value being less than the threshold [a confidence (probability) is determined for the output of the model, which can be compared to a threshold to determine which nodes to prune from the hierarchy (creating the surface) (see paras. 0014-27, 0039, 108-113 regarding the confidence/probability and hierarchical clusters; see also para 0638 regarding pruning; etc.); where the threshold comparison determines whether the child node will be deleted (thus adding the parent to the surface of the hierarchy) or whether the child remains (adding the child to the surface)].
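
For illustration only, the threshold comparison recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the `PARENT` mapping, the example probabilities, the default threshold of 0.5, and the function name `surface_from_probabilities` are all hypothetical and are not taken from Perera, Duncan, or the application.

```python
from typing import Dict, Set

# Hypothetical leaf -> parent mapping for a two-tier toy ontology.
PARENT: Dict[str, str] = {
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "truck": "vehicle",
}


def surface_from_probabilities(leaf_probs: Dict[str, float],
                               parent_of: Dict[str, str],
                               threshold: float = 0.5) -> Set[str]:
    """Keep a leaf when its probability meets the threshold; otherwise add its parent."""
    surface: Set[str] = set()
    for leaf, prob in leaf_probs.items():
        if prob >= threshold:
            surface.add(leaf)             # leaf stays on the surface
        else:
            surface.add(parent_of[leaf])  # leaf is collapsed into its parent
    return surface


example = surface_from_probabilities(
    {"cat": 0.9, "dog": 0.8, "car": 0.2, "truck": 0.4}, PARENT)
```

Here `example` contains `cat`, `dog`, and `vehicle`. The sketch does not resolve the case where one sibling is kept and another is collapsed; the claim language quoted above does not say how that case is handled.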

As per claim 5, Perera/Duncan teaches wherein the training data set is labeled training data set D = {(x_1, y_1^l), …, (x_n, y_n^l)}, where y_i^l is a leaf node in the hierarchy, the method further comprising obtaining a parent of each y_i^l and adding the parent as the second label for each training instance to form a modified dataset D_m = {(x_1, y_1^l, y_1^p), …, (x_n, y_n^l, y_n^p)}, where y_i^p is the parent of y_i^l according to the hierarchical classification ontology data structure T [the system takes the labeled training data corresponding to the leaf nodes and re-labels the training set using the surface (Perera: pgs. 373-374, “methodology”; etc.) and the user nodes may be relabeled with ancestor node data (Duncan: paras. 0984-988, 1024-1028, etc.)].
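
For illustration only, the formation of the modified dataset recited in claim 5 (each leaf-labeled instance gains its parent as a second label) can be sketched as follows. The `PARENT` mapping, the sample instances, and the function name `add_parent_labels` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical leaf -> parent mapping derived from the ontology T.
PARENT: Dict[str, str] = {
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "truck": "vehicle",
}


def add_parent_labels(dataset: List[Tuple[str, str]],
                      parent_of: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn (x, leaf_label) pairs into (x, leaf_label, parent_label) triples."""
    return [(x, leaf, parent_of[leaf]) for x, leaf in dataset]


D = [("img1", "cat"), ("img2", "truck")]
D_m = add_parent_labels(D, PARENT)
```

Under these assumptions, `D_m` is `[("img1", "cat", "animal"), ("img2", "truck", "vehicle")]`.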

As per claim 8, Perera/Duncan teaches wherein training the classifier model comprises relabeling the training data set based on the surface and one or more adaptive node collapses in the hierarchical classification ontology data structure [the system takes the labeled training data corresponding to the leaf nodes and re-labels the training set using the surface (Perera: pgs. 373-374, “methodology”; etc.) where the user nodes may be relabeled with ancestor node data (Duncan: paras. 0984-988, 1024-1028, etc.) using node collapses to create the surface (Perera: pg. 372; Duncan: para. 0638; etc.)].
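
For illustration only, relabeling a training set based on a surface, as discussed for claim 8, can be sketched by walking each leaf label up the hierarchy until a surface node is reached. The `PARENT` mapping, the example surface, and the function name `relabel_to_surface` are hypothetical, and the sketch assumes the surface covers every path.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical child -> parent mapping (root has no entry).
PARENT: Dict[str, str] = {
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "truck": "vehicle",
    "animal": "root", "vehicle": "root",
}


def relabel_to_surface(dataset: List[Tuple[str, str]],
                       parent_of: Dict[str, str],
                       surface: Set[str]) -> List[Tuple[str, str]]:
    """Replace each label with its nearest ancestor (or itself) on the surface."""
    relabeled = []
    for x, label in dataset:
        node = label
        while node not in surface:
            # Raises KeyError if the surface does not cover this path.
            node = parent_of[node]
        relabeled.append((x, node))
    return relabeled


surface = {"cat", "dog", "vehicle"}  # "car" and "truck" collapsed into "vehicle"
relabeled = relabel_to_surface(
    [("img1", "cat"), ("img2", "truck"), ("img3", "car")], PARENT, surface)
```

Under these assumptions, `img2` and `img3` are both relabeled to `vehicle`, while `img1` keeps the label `cat`.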

As per claim 9, Perera/Duncan teaches wherein receiving the hierarchical classification ontology data structure comprises receiving a single level of labels and generating the hierarchical classification ontology data structure based on the single level of labels [the labeling hierarchy (hierarchical classification ontology) includes using given labels to create the hierarchy (Perera: pgs. 373-374, “methodology”; etc.)].

As per claim 10, see the rejection of claim 1 above, wherein Perera/Duncan also teaches a computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on at least one processor of a computing device, causes the at least one processor to implement the training system [the system may be implemented as a program stored on computer readable storage media (memory) to be executed by the computer (processor) (Duncan: para. 0174, claim 6, etc.)].

As per claim 11, see the rejection of claim 2, above.

As per claim 12, see the rejection of claim 3, above.

As per claim 13, see the rejection of claim 4, above.

As per claim 14, see the rejection of claim 5, above.

As per claim 17, see the rejection of claim 8, above.

As per claim 18, see the rejection of claim 1, above, wherein Perera/Duncan teaches at least one processor; and a memory coupled to the at least one processor, wherein the memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to implement the training system [the system may be implemented as a program stored on computer readable storage media (memory) to be executed by the computer (processor) (Duncan: para. 0174, claim 6, etc.)].

As per claim 19, see the rejection of claim 2, above.

As per claim 20, see the rejection of claim 3, above.

As per claim 21, see the rejection of claim 4, above.

As per claim 22, see the rejection of claim 5, above.

As per claim 25, see the rejection of claim 8, above.


Claim(s) 6, 15 and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Perera and Duncan as applied to claims 5, 14, and 22, above, and further in view of Yao (Negative Log Likelihood Ratio Loss for Deep Neural Network Classification, April 2018, pgs. 1-4).

As per claim 6, Perera/Duncan teaches the method of claim 5, as described above.
While Perera/Duncan also teaches using a cost function (see, e.g., Perera: pg. 374), it does not explicitly teach wherein a loss at the lower leaf tier level_l and a loss at the parent tier level_p are determined according to the following equations:

[equations presented as image media_image1.png; not reproduced in this text]

where n and m are the number of nodes at level_l and level_p in T respectively.
Yao teaches wherein a loss at the lower leaf tier level_l and a loss at the parent tier level_p are determined according to the following equations:

[equations presented as image media_image1.png; not reproduced in this text]

where n and m are the number of nodes at level_l and level_p in T respectively [the cross entropy loss function may be used to determine the loss function for the network using the outputs of the output nodes (pgs. 1-2, section II; where equation (4) shows the claimed loss function equation)].
Perera/Duncan and Yao are analogous art, as they are within the same field of endeavor, namely optimizing a machine learning model using a cost/loss function.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the cross-entropy loss function taught by Yao as the cost (loss) function in the system of Perera/Duncan.
Yao provides motivation as [cross entropy provides a reasonable, efficient loss function to optimize a neural network based classifier (pg. 1, abstract and section I; etc.)].
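
For illustration only, a cross-entropy loss computed separately at a leaf tier and a parent tier can be sketched as follows. The claimed equations appear only as an image in this action, so this sketch assumes the standard cross-entropy form (the negative sum of each true label times the log of its predicted probability), which is consistent with the cross-entropy mapping above but is not confirmed to match the claimed equations term for term. The function name `cross_entropy`, the `eps` guard, and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Standard cross-entropy: -sum_i y_i * log(y_hat_i), with a small floor on y_hat_i."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Leaf tier: n = 4 nodes, one-hot true label and predicted probabilities.
y_leaf = [0.0, 1.0, 0.0, 0.0]
y_hat_leaf = [0.1, 0.7, 0.1, 0.1]

# Parent tier: m = 2 nodes.
y_parent = [1.0, 0.0]
y_hat_parent = [0.8, 0.2]

loss_leaf = cross_entropy(y_leaf, y_hat_leaf)        # equals -log(0.7)
loss_parent = cross_entropy(y_parent, y_hat_parent)  # equals -log(0.8)
```

With one-hot labels, each loss reduces to the negative log of the probability assigned to the true node, so a more confident correct prediction yields a smaller loss.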

As per claim 15, see the rejection of claim 6, above.

As per claim 23, see the rejection of claim 6, above.


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-25 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hahm (US 2018/0121601), Milton (US 2016/0239857), and Mammone (US 5,634,087) – disclose various systems which prune nodes from a hierarchy/tree structure based upon a threshold (creating a surface).
Berrada et al. (Smooth Loss Functions for Deep Top-K Classification, Feb 2018, pgs. 1-25) – discloses different kinds of loss functions that can be used for classifiers, including cross-entropy.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE GIROUX/Primary Examiner, Art Unit 2128