DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s Amendments, filed 01-June-2022, have been entered. Claims 1, 8, 10, 18 and 20 have been amended, claims 2 and 12 have been canceled, and claims 1, 3-11 and 13-20 are currently pending. The 112(b) rejections directed to claims 10 and 20 are withdrawn due to amendments to the claims. The 112(b) rejection directed to claim 1 and the claims that depend from claim 1 is sustained as set forth below (see Non-Final Rejection dated 08-March-2022, pp. 3-4). The claim objections directed to claims 8 and 18 are withdrawn due to amendments to the claims.
Response to Arguments
Applicant's arguments, see Remarks pp. 6-8, filed 01-June-2022, have been fully considered but they are not persuasive. Applicant argues that Shah et al. (Pub. No. US 2018/0218374 A1, hereinafter “Shah”) does not teach or suggest that “the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty…” as recited in amended claim 1 (Remarks p. 7). Examiner agrees with this argument; however, examiner respectfully submits that cited prior art Yamagami et al. (Pub. No. US 2018/0005126 A1, hereinafter “Yamagami”) teaches the amended claim 1 language (see below).
Applicant argues that Shah does not teach the tree structure as presently claimed, specifically arguing that Shah does not teach that the labels are edges of a graph (Remarks pp. 7-8). In response, examiner respectfully submits that the information stored in the knowledge base configures a knowledge graph (i.e. decision tree) including a network of interconnected nodes and branches (i.e. edges). The knowledge graph may be systematically pruned to match a user query to a system query [0034]. See Fig. 3, where the user inputs the query 306b “How do I access the printer?”, the virtual agent may respond with a corresponding output message 308b, which includes system queries related to the query 306b. The system queries may be retrieved from the knowledge base 210. The system may be caused to parse the query 306b to generate n-grams and match the n-grams with the tagged keywords corresponding to the system queries stored in the knowledgebase system 210 of the system to retrieve the system queries related to the query 306b [0056]. If the user’s response to the displayed system queries indicates no match with the system query, then the processor may be configured to prune the knowledge graph (i.e. remove parts of the knowledge graph associated with the previous set of system queries from a current search domain) to identify a different set of system queries that are more likely to match the user query [0045]. Examiner interprets that query 306b discloses a node corresponding to the indicator variable and the answer 308b discloses edges corresponding to the labels.
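For illustration only, the n-gram matching summarized above from Shah [0056] can be sketched as follows. This is a minimal sketch and is not taken from Shah: the function names `ngrams` and `match_system_queries`, the dictionary `KB`, its tagged keywords, and the overlap-count ranking are all assumptions introduced here.

```python
from typing import Dict, List, Set


def ngrams(text: str, max_n: int = 2) -> Set[str]:
    """Return all word n-grams of length 1..max_n from a lower-cased query."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    words = [w for w in words if w]
    grams: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def match_system_queries(user_query: str,
                         knowledge_base: Dict[str, Set[str]]) -> List[str]:
    """Rank stored system queries by how many tagged keywords overlap the n-grams."""
    grams = ngrams(user_query)
    scored = [(len(grams & tags), query) for query, tags in knowledge_base.items()]
    # Highest overlap first; ties broken alphabetically; zero-overlap entries dropped.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [query for score, query in ranked if score > 0]


# Hypothetical knowledge base: system query -> tagged keywords (illustrative only).
KB = {
    "How do I connect to a network printer?": {"printer", "access", "network printer"},
    "How do I reset my password?": {"password", "reset"},
    "How do I install a printer driver?": {"printer", "driver"},
}

matches = match_system_queries("How do I access the printer?", KB)
```

Under these assumptions, the two printer-related system queries are returned and the password query is excluded, which corresponds to the output message 308b listing system queries related to query 306b.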
Applicant argues that Boyan et al. (Pub. No. US 2010/0145902 A1, hereinafter “Boyan”) does not teach pruning the graph in order to maximize information gain (Remarks pp. 8-9). Examiner agrees with this argument. However, examiner submits that Shah teaches pruning the decision tree (see Shah [0034, 0045], where the knowledge graph is pruned to identify system queries more likely to match the user query).
Applicant argues that Yamagami does not teach the determination of information gain as the difference between an initial uncertainty and an average uncertainty and that Yamagami does not describe pruning the graph (Remarks p. 9). In response, examiner respectfully submits that Yamagami teaches that the user’s answer reliability calculator collects the user’s answer instance data stored on the user’s answer instance data memory and calculates, as reliability, a percentage of the user’s correct answers to inquiries asking about attributes (correct answer rate). The reliability is an index that represents the user’s correct answer rate to an inquiry asking about an attribute [0074]. The information gain is calculated by subtracting from the amount of reduction in entropy the conditional entropy (i.e. initial uncertainty) obtained when the specific attribute value that is the answer to the inquiry asking about the specific attribute is obtained. The information gain thus obtained precisely reflects the uncertainty of the user’s answer using the conditional entropy [0052].
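For reference, the claim language at issue, information gain as a difference between an initial uncertainty and an average uncertainty, matches the textbook definition of information gain as entropy minus conditional entropy. The following is a minimal sketch of that textbook definition only. It does not reproduce the formula of Yamagami [0052] or the calculation of the present application, and the names `entropy`, `information_gain`, `TARGETS` and `ANSWERS` are assumptions introduced here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of items."""
    total = len(items)
    if total == 0:
        return 0.0
    counts = Counter(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(targets: Sequence[Hashable],
                     answers: Sequence[Hashable]) -> float:
    """Initial uncertainty minus the answer-weighted average uncertainty."""
    initial = entropy(targets)
    groups: Dict[Hashable, List[Hashable]] = {}
    for target, answer in zip(targets, answers):
        groups.setdefault(answer, []).append(target)
    total = len(targets)
    # Average uncertainty remaining after the answer is known.
    average = sum(len(g) / total * entropy(g) for g in groups.values())
    return initial - average


# Hypothetical data: four candidate results and the answer each would elicit.
TARGETS = ["doc1", "doc2", "doc3", "doc4"]
ANSWERS = ["yes", "yes", "no", "no"]

gain = information_gain(TARGETS, ANSWERS)
```

In this example the initial uncertainty over four equally likely results is 2 bits, each answer leaves two candidates (1 bit), and the gain is therefore 1 bit.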
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 1 limitations “a labelling engine to apply to each datum within the corpus…a clarification engine to: generate a decision tree…” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Claim 1 recites the generic placeholder “engine” coupled with functional language “to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables” and “generate a decision tree using the set of search results…prune the decision tree in response to a question posed to a user…” without reciting sufficient structure to achieve the function. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Claims 3-10 depend from claim 1, include all the limitations of claim 1, and are rejected accordingly.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-5, 8-11, 13-15 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shah in view of Yamagami.
Regarding claim 1, Shah teaches:
a labelling engine to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables, each indicator variable relating to context of the respective data (Shah – the memory 204 in Fig. 2 includes a knowledge base 210 (i.e. corpus) that serves as a store of user queries (i.e. indicator variables) that are anticipated at the service desk of the enterprise. The system queries are stored along with corresponding answers (i.e. labels) in the knowledge base 210 [0034]. The queries in the knowledge base are tagged [0056]. The processor may be configured to communicate with public data sources (for example, sources like Wikipedia, technical community forums, etc.) and private data sources (for example, online technical libraries) to augment information stored in the knowledge base [0034]. Also see [0049], where the knowledge base can be updated by a human agent to identify an information gap in the knowledge base.)
and a clarification engine to: generate a decision tree using the set of search results, the decision tree comprising nodes corresponding to the indicator variables and edges corresponding to the labels, the decision tree generated to maximize information gain based on pruning the decision tree in response to obtaining a desired label for a selected indicator variable, each indicator variable corresponds to a question and each label of associated edges corresponds to an answer associated with the question, (Shah – the information stored in the knowledge base configures a knowledge graph (i.e. decision tree) including a network of interconnected nodes and branches (i.e. edges). The knowledge graph may be systematically pruned to match a user query to a system query [0034]. See Fig. 3, where the user inputs the query 306b “How do I access the printer?”, the virtual agent may respond with a corresponding output message 308b, which includes system queries related to the query 306b. The system queries may be retrieved from the knowledge base 210. The system may be caused to parse the query 306b to generate n-grams and match the n-grams with the tagged keywords corresponding to the system queries stored in the knowledgebase system 210 of the system to retrieve the system queries related to the query 306b [0056]. If the user’s response to the displayed system queries indicates no match with the system query, then the processor may be configured to prune the knowledge graph (i.e. remove parts of the knowledge graph associated with the previous set of system queries from a current search domain) to identify a different set of system queries that are more likely to match the user query [0045]. Examiner interprets that query 306b discloses a node corresponding to the indicator variable and the answer 308b discloses edges corresponding to the labels. 
Examiner also interprets that pruning the knowledge graph by removing parts of the knowledge graph discloses maximizing information gain.)
and prune the decision tree in response to a question posed to a user to obtain a label for an indicator variable (Shah – the information stored in the knowledge base configures a knowledge graph (i.e. decision tree) including a network of interconnected nodes and branches. The knowledge graph may be systematically pruned to match a user query to a system query [0034]. More specifically, if the user’s response to the displayed system queries indicates no match with the system query, then the processor may be configured to prune the knowledge graph (i.e. remove parts of the knowledge graph associated with the previous set of system queries from a current search domain) to identify a different set of system queries that are more likely to match the user query [0045].)
Shah does not appear to teach:
the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer
However, Yamagami teaches:
the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer (Yamagami – the user’s answer reliability calculator collects the user’s answer instance data stored on the user’s answer instance data memory and calculates, as reliability, a percentage of the user’s correct answers to inquiries asking about attributes (correct answer rate). The reliability is an index that represents the user’s correct answer rate to an inquiry asking about an attribute [0074]. The information gain is calculated by subtracting from the amount of reduction in entropy the conditional entropy (i.e. initial uncertainty) obtained when the specific attribute value that is the answer to the inquiry asking about the specific attribute is obtained. The information gain thus obtained precisely reflects the uncertainty of the user’s answer using the conditional entropy [0052].)
Accordingly, it would have been obvious to a person of ordinary skill in the art at the time the invention was effectively filed, having the teachings of Shah and Yamagami before them, to modify the system of Shah of a labelling engine to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables, each indicator variable relating to context of the respective data, and a clarification engine to: generate a decision tree using the set of search results, the decision tree comprising nodes corresponding to the indicator variables and edges corresponding to the labels, the decision tree generated to maximize information gain based on pruning the decision tree in response to obtaining a desired label for a selected indicator variable, each indicator variable corresponds to a question and each label of associated edges corresponds to an answer associated with the question, and prune the decision tree in response to a question posed to a user to obtain a label for an indicator variable with the teachings of Yamagami of the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer. One would have been motivated to make such a modification to generate a decision tree that is used to determine an order of inquiries when candidates of classification results from a user’s answer to an inquiry made to the user in dialog are narrowed (Yamagami - [0001]).
Claim 11 corresponds to claim 1 and is rejected accordingly.
Regarding claim 3, Shah teaches:
wherein each indicator variable represents a category of interest to a particular field represented by the corpus of data (Shah – system queries (i.e. indicator variables) are associated with a search domain (i.e. category of interest) [0045].)  
Claim 13 corresponds to claim 3 and is rejected accordingly.
Regarding claim 4, Shah teaches:
wherein at least one of the labels is unknown (Shah – if a user query is determined to be an incident, then an answer of a matching system query (a system query matching the user query) may be provided as a reply to the user. In some embodiments, the reply provided to the user may be deemed to be unsatisfactory by the user [0047]. The processor is configured to identify an information gap (i.e. unknown label) in the knowledge base subsequent to the provisioned reply being deemed unsatisfactory by the user [0049]. The system may further be caused to update the knowledge base with the resolution to the query provided by the human agent in response to the identification of the knowledge gap. The updating of the knowledge base may address the identified information gap in the knowledge base [0049].)
Claim 14 corresponds to claim 4 and is rejected accordingly.
Regarding claim 5, Shah teaches:
wherein each datum in the corpus comprises one or more webpages (Shah – the processor may be configured to communicate, using the communication interface, with public data sources (for example, sources like Wikipedia, technical community forums, etc.) and private data sources (for example, online technical libraries) to augment information stored in the knowledge base [0034].)
Claim 15 corresponds to claim 5 and is rejected accordingly.
Regarding claim 8, Shah teaches:
wherein maximizing information gain comprises determining the information gained in knowing a value of each indicator variable, the indicator variable with a largest potential information gain being used to split the search results into subsets according to its value to produce a node in the decision tree, and wherein the question posed to the user results in obtaining a label or value for the indicator variable (Shah – the information stored in the knowledge base configures a knowledge graph (i.e. decision tree) including a network of interconnected nodes and branches. The knowledge graph may be systematically pruned to match a user query to a system query [0034]. More specifically, if the user’s response to the displayed system queries indicates no match with the system query, then the processor may be configured to prune the knowledge graph (i.e. remove parts of the knowledge graph associated with the previous set of system queries from a current search domain) to identify a different set of system queries that are more likely to match the user query [0045].)  
Claim 18 corresponds to claim 8 and is rejected accordingly.
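For illustration only, the selection step recited in claim 8 (using the indicator variable with the largest potential information gain to split the search results into subsets according to its value) can be sketched as follows. This is a minimal sketch under a stated assumption that every remaining search result is equally likely to be the desired one, so that uncertainty is the base-2 logarithm of the number of remaining results. It is not taken from Shah, Yamagami, or the present application, and the names `information_gain`, `best_variable`, `split` and `RESULTS` are assumptions introduced here.

```python
import math
from collections import Counter
from typing import Dict, List

Result = Dict[str, str]  # hypothetical: one search result, indicator variable -> label


def information_gain(results: List[Result], variable: str) -> float:
    """Initial uncertainty minus the average uncertainty after the label is known."""
    n = len(results)
    if n == 0:
        return 0.0
    counts = Counter(r[variable] for r in results)
    initial = math.log2(n)
    # Each label value leaves `c` equally likely candidates with probability c / n.
    average = sum((c / n) * math.log2(c) for c in counts.values())
    return initial - average


def best_variable(results: List[Result], variables: List[str]) -> str:
    """Pick the indicator variable whose label would be most informative."""
    return max(variables, key=lambda v: information_gain(results, v))


def split(results: List[Result], variable: str) -> Dict[str, List[Result]]:
    """Partition the results by label; each subset is a child of the new node."""
    subsets: Dict[str, List[Result]] = {}
    for r in results:
        subsets.setdefault(r[variable], []).append(r)
    return subsets


# Hypothetical labelled search results (illustrative only).
RESULTS = [
    {"id": "r1", "device": "printer", "os": "windows"},
    {"id": "r2", "device": "printer", "os": "mac"},
    {"id": "r3", "device": "laptop", "os": "linux"},
    {"id": "r4", "device": "printer", "os": "windows"},
]

chosen = best_variable(RESULTS, ["device", "os"])
children = split(RESULTS, chosen)
```

With this data, the `os` variable yields 1.5 bits against roughly 0.81 bits for `device`, so `os` is chosen as the node and its three label values become the outgoing edges.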
Regarding claim 9, Shah teaches:
wherein the clarification engine iteratively performs, to prune the decision tree, determining the information gained and posing the question to the user that will provide the largest information gain (Shah – if the user’s response to the displayed system queries indicates no match with the system query, then the processor may be configured to prune the knowledge graph (i.e. remove parts of the knowledge graph associated with the previous set of system queries from a current search domain) to identify a different set of system queries that are more likely to match the user query. The system may be caused to repeat the steps (i.e. iteratively perform) of displaying system queries and causing identification of a matching system query, a predefined number of times, if a user input in response to the displaying of the system queries is indicative of no match between the system queries and the query provided by the user [0045-0046].)
Claim 19 corresponds to claim 9 and is rejected accordingly.
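For illustration only, the iterative behavior recited in claim 9 (repeatedly determining the information gained, posing the question with the largest gain, and pruning on the answer) can be sketched as a bounded loop. This is a minimal sketch and is not taken from Shah or from the present application: it assumes equally likely results, a hypothetical `ask` callback that returns the user's label for a variable, and a `max_rounds` limit that loosely mirrors the "predefined number of times" cited from Shah [0045-0046].

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Result = Dict[str, str]  # hypothetical: one search result, indicator variable -> label


def gain(results: List[Result], variable: str) -> float:
    """Information gain of learning `variable`, assuming equally likely results."""
    n = len(results)
    if n == 0:
        return 0.0
    counts = Counter(r[variable] for r in results)
    return math.log2(n) - sum((c / n) * math.log2(c) for c in counts.values())


def clarify(results: List[Result], variables: List[str],
            ask: Callable[[str], str], max_rounds: int = 5) -> List[Result]:
    """Ask the most informative question, prune on the answer, and repeat."""
    remaining = list(results)
    unused = list(variables)
    for _ in range(max_rounds):
        if len(remaining) <= 1 or not unused:
            break
        best = max(unused, key=lambda v: gain(remaining, v))
        if gain(remaining, best) <= 1e-12:
            break  # no remaining question can narrow the results further
        answer = ask(best)
        # Prune: discard every branch whose label differs from the user's answer.
        remaining = [r for r in remaining if r[best] == answer]
        unused.remove(best)
    return remaining


# Hypothetical labelled search results and scripted user answers (illustrative only).
RESULTS = [
    {"id": "r1", "device": "printer", "os": "windows"},
    {"id": "r2", "device": "printer", "os": "mac"},
    {"id": "r3", "device": "laptop", "os": "linux"},
    {"id": "r4", "device": "printer", "os": "windows"},
]
SCRIPTED = {"os": "windows", "device": "printer"}
asked: List[str] = []


def ask(variable: str) -> str:
    asked.append(variable)
    return SCRIPTED[variable]


remaining = clarify(RESULTS, ["device", "os"], ask)
```

Here only the `os` question is posed: after the answer "windows", both remaining results share the same `device` label, so a second question would yield no gain and the loop stops.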
Regarding claim 10, Shah does not appear to teach:
wherein information gain further comprises determining a probability that the answer provided by the user to each question is accurate, and that a desired search result to the search query will be found in the set of documents represented within that answer
However, Yamagami teaches:
wherein information gain further comprises determining a probability that the answer provided by the user to each question is accurate, and that a desired search result to the search query will be found in the set of documents represented within that answer (Yamagami – the user’s answer reliability calculator collects the user’s answer instance data stored on the user’s answer instance data memory and calculates, as reliability, a percentage of the user’s correct answers to inquiries asking about attributes (correct answer rate). The reliability is an index that represents the user’s correct answer rate to an inquiry asking about an attribute [0074]. The information gain is calculated by subtracting from the amount of reduction in entropy (i.e. initial uncertainty) the conditional entropy obtained when the specific attribute value that is the answer to the inquiry asking about the specific attribute is obtained. The information gain thus obtained precisely reflects the uncertainty of the user’s answer using the conditional entropy [0052]. Also see [0076], where the information gain calculator calculates, on a per attribute basis of the classification target data included in the pre-segmentation data set, an amount of reduction in the entropy of the data set caused by the segmentation.)  
Accordingly, it would have been obvious to a person of ordinary skill in the art at the time the invention was effectively filed, having the teachings of Shah and Yamagami before them, to modify the system of Shah of a labelling engine to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables, each indicator variable relating to context of the respective data, and a clarification engine to: generate a decision tree using the set of search results, the decision tree comprising nodes corresponding to the indicator variables and edges corresponding to the labels, the decision tree generated to maximize information gain based on pruning the decision tree in response to obtaining a desired label for a selected indicator variable, each indicator variable corresponds to a question and each label of associated edges corresponds to an answer associated with the question, the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer, and prune the decision tree in response to a question posed to a user to obtain a label for an indicator variable with the teachings of Yamagami of wherein information gain further comprises determining a probability that the answer provided by the user to each question is accurate, and that a desired search result to the search query will be found in the set of documents represented within that answer. One would have been motivated to make such a modification to generate a decision tree that is used to determine an order of inquiries when candidates of classification results from a user’s answer to an inquiry made to the user in dialog are narrowed (Yamagami - [0001]).
Claim 20 corresponds to claim 10 and is rejected accordingly.
Claims 6, 7, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Shah in view of Yamagami further in view of Boyan.
Regarding claim 6, Shah modified by Yamagami does not appear to teach:
wherein at least a portion of the data are manually labelled and the labelling engine applies inheritance of the labels to webpages associated with the manually labelled data
However, Boyan teaches:
wherein at least a portion of the data are manually labelled and the labelling engine applies inheritance of the labels to webpages associated with the manually labelled data (Boyan – source training documents may be tagged in response to user input. In response to tagging, a system may construct an automated agent to traverse an information source to extract its structured data into a database with a given schema. For example, tagging a few pages of a website may allow the system to construct a web-scraping agent that traverses the entire website to acquire and restructure its data [0077]. Also see [0080] where data tags are applied and a root type is defined (i.e. inheritance). A root type can be chosen arbitrarily from amongst the various entity types defined in the domain model, wherein a given root type can apply throughout a domain, or for each individual bucket [0080].)   
Accordingly, it would have been obvious to a person of ordinary skill in the art at the time the invention was effectively filed, having the teachings of Shah, Yamagami and Boyan before them, to modify the teachings of Shah and Yamagami of a labelling engine to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables, each indicator variable relating to context of the respective data, and a clarification engine to: generate a decision tree using the set of search results, the decision tree comprising nodes corresponding to the indicator variables and edges corresponding to the labels, the decision tree generated to maximize information gain based on pruning the decision tree in response to obtaining a desired label for a selected indicator variable, each indicator variable corresponds to a question and each label of associated edges corresponds to an answer associated with the question, the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer, and prune the decision tree in response to a question posed to a user to obtain a label for an indicator variable, wherein each datum in the corpus comprises one or more webpages with the teachings of Boyan of wherein at least a portion of the data are manually labelled and the labelling engine applies inheritance of the labels to webpages associated with the manually labelled data. One would have been motivated to make such a modification to integrate scraped information into a comprehensive, consistently structured database (Boyan - [0005, 0007]).
Claim 16 corresponds to claim 6 and is rejected accordingly.
Regarding claim 7, Shah modified by Yamagami does not appear to teach:
wherein the labelling engine uses a trained supervised learning classifier for each of the indicator variables to label the data, the supervised learning classifier trained using a set of manually labelled data for training and testing
However, Boyan teaches:
wherein the labelling engine uses a trained supervised learning classifier for each of the indicator variables to label the data, the supervised learning classifier trained using a set of manually labelled data for training and testing (Boyan – each page in a tree may be manually or automatically assigned to a bucket of similarly formatted pages. For websites and information sources where the type of page returned by following a navigational element varies dynamically, the agent may use a classifier to automatically determine which bucket each page belongs to. To establish training data for such classifier, bucket identities can be assigned manually during the hand-tagging process, or can be inferred during that stage by an unsupervised clustering algorithm based on the features of the page [0088-0089].)  
Accordingly, it would have been obvious to a person of ordinary skill in the art at the time the invention was effectively filed, having the teachings of Shah, Yamagami and Boyan before them, to modify the teachings of Shah and Yamagami of a labelling engine to apply to each datum within the corpus a label corresponding to each one of a plurality of predetermined indicator variables, each indicator variable relating to context of the respective data, and a clarification engine to: generate a decision tree using the set of search results, the decision tree comprising nodes corresponding to the indicator variables and edges corresponding to the labels, the decision tree generated to maximize information gain based on pruning the decision tree in response to obtaining a desired label for a selected indicator variable, each indicator variable corresponds to a question and each label of associated edges corresponds to an answer associated with the question, the information gain of obtaining answers corresponding to one of the indicator variables is a difference between an initial uncertainty and an average uncertainty that a user would give the answer, and prune the decision tree in response to a question posed to a user to obtain a label for an indicator variable with the teachings of Boyan of wherein the labelling engine uses a trained supervised learning classifier for each of the indicator variables to label the data, the supervised learning classifier trained using a set of manually labelled data for training and testing. One would have been motivated to make such a modification to integrate scraped information into a comprehensive, consistently structured database (Boyan - [0005, 0007]).
Claim 17 corresponds to claim 7 and is rejected accordingly.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANJIT P DORAISWAMY whose telephone number is (571)270-5759. The examiner can normally be reached Monday-Friday 9:00 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached at (571) 270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.P.D./Examiner, Art Unit 2166
/MARK D FEATHERSTONE/Supervisory Patent Examiner, Art Unit 2166