Response to Amendment
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the amendment filed on January 11, 2021.  Claims 1-12 and 14-29 are presented for examination.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/11/21 has been entered.
 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-7, 11, 12 and 14-29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Guha (USPN. 2009/0164416).

1 and 19.    A computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising (fig. 1, Guha):
Obtaining a labeled dataset from a plurality of data sources including at least one structured data source, wherein each instance in the labeled dataset comprising one or more attributes and representing an entity having an associated entity identifier (figs. 1 and 2, info repository 104 of unstructured data to loosely search data, pars. 4, 6, 24 and 27);
(Note: initial set of unstructured data is used to create an initial classification using combination of keywords and the like, see par. 20, “unstructured data”, “initial classification”, “keywords” i.e., products or class, and any document such as structured emails or other unstructured word documents are mapped to index to a relevance class, pars. 4 and 27)
Automatically and without user intervention generating a query based on at least one attribute of at least one instance in the dataset and the entity identifier (pars. 6, 24, 5 and 27, query based on known attribute of document/keyword; note that if the user does not provide explicit feedback, then implicit feedback is collected by the system by reviewing the user’s search history and the like, par. 25);
(Note: query is generated once the initial classification is preliminary or complete, see par. 23, the user query is submitted comprising attribute such as “name of a data object… or keyword”, see par. 41 and fig. 5A)
providing the query to a search engine, wherein the search engine is configured to provide one or more results from an unstructured data corpus that match the query (fig. 4, item 408, pars. 38 and 
wherein the query is restricted to search within entities of specific types, or the query has a concatenated domain restrictor, wherein the domain restrictor is a search engine operator  configured to limit a domain from the unstructured data corpus from which results of the query are obtained (par. 41, pharma domain comprising pharma class is searched thus reducing the search within entities of specific type (pharma));
obtaining the one or more results (fig. 5 par. 41, search results and second set of unstructured data);
automatically determining a hypothesis for the dataset, wherein the hypothesis is based on a new attribute whose value is defined based on the one or more results (fig. 6, items 604-612, updated classifier data set based on user explicit/implicit feedback, wherein explicit and implicit encompasses manual and automated steps, pars. 46-47); 
and outputting an output, wherein the output is based on the hypothesis (par. 47, query results are changed based on updated relevance classes to adapt to classification mapping).
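
(Note, illustrative only: the query-generation limitation above, an entity identifier and an attribute combined with a concatenated domain restrictor, might be sketched as follows. The function name build_query, the instance fields, and the `site:` operator are assumptions made for this example; neither the claim language nor Guha specifies them.)

```python
def build_query(entity_id, attribute_value, domain=None):
    """Build a search string from an entity identifier and one attribute.

    The `site:` operator is a hypothetical stand-in for the claimed
    "domain restrictor"; it is not taken from the claims or from Guha.
    """
    # Quote each term so multi-word values are kept together.
    terms = [f'"{entity_id}"', f'"{attribute_value}"']
    if domain:
        # Concatenate the restrictor to the end of the query.
        terms.append(f"site:{domain}")
    return " ".join(terms)


# Hypothetical labeled instance: entity identifier, one attribute, a label.
instance = {"entity_id": "Acme Corp", "industry": "pharma", "label": 1}
query = build_query(instance["entity_id"], instance["industry"], "example.com")
print(query)
```

In this sketch the restrictor is optional, which mirrors the claim's alternative wording ("restricted to search within entities of specific types, or ... a concatenated domain restrictor").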

2.    The computer program product of Claim 1, wherein said generating comprises generating a query based on a single instance, whereby auxiliary unstructured data relating to an entity represented by the 

3.    The computer program product of Claim 2, wherein the hypothesis is that the one or more results comprise at least one result that comprises a term, wherein the new attribute is indicative of an existence of a result that comprises the term (pars. 46-47, query results are changed based on updated relevance classes to adapt to classification mapping).

4.    The computer program product of Claim 1, wherein said generating comprises generating a query based on a plurality of instances, whereby auxiliary unstructured data relating to a plurality of entities represented by the plurality of instances is obtained and used for determining a value for the new attribute for the plurality of instances (pars. 6, 24 and 27, query based on known attribute of document/keyword, thus comprising many instances).

5.    The computer program product of Claim 4, wherein the hypothesis is an inclusion of an entity identifier of an instance within a document in the auxiliary unstructured data is indicative of the instance having a property, wherein the new attribute is indicative whether the entity identifier of an instance is included within a document in the auxiliary unstructured data (par. 41, domain searching).

6.    The computer program product of Claim 4, wherein the dataset is a labeled dataset, wherein the plurality of instances are instances sharing a same label, whereby the auxiliary unstructured data is potentially indicative of features relating to the same label (par. 18, mapping indexed attributes of the one or more data objects to the class label and user interaction).


determining one or more potential hypotheses, wherein each of the one or more potential hypotheses is based on a different new attribute whose value is based on at least a portion of the unstructured data corpus (pars. 38 and 40, mapping attributes of the sample set of unstructured data to class labels); and
for each of the potential hypotheses, validating or refuting the potential hypothesis based on the labeled dataset, whereby determining the hypothesis by identifying a potential hypothesis that is validated (fig. 4, user feedback of data validation based on interaction).
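
(Note, illustrative only: validating or refuting potential hypotheses against a labeled dataset might be sketched as follows. The agreement-rate criterion, the 0.8 threshold, and all names are assumptions made for this example; they are not drawn from the claims or from Guha.)

```python
def validate_hypothesis(instances, attribute, threshold=0.8):
    """Return True when the attribute agrees with the label often enough.

    Agreement rate against a fixed threshold is an assumed criterion
    chosen only to make the sketch concrete.
    """
    if not instances:
        return False
    agree = sum(
        1 for inst in instances
        if bool(inst[attribute]) == bool(inst["label"])
    )
    return agree / len(instances) >= threshold


# Hypothetical labeled dataset with two candidate new attributes.
dataset = [
    {"label": 1, "mentions_recall": True, "mentions_award": True},
    {"label": 1, "mentions_recall": True, "mentions_award": False},
    {"label": 0, "mentions_recall": False, "mentions_award": True},
    {"label": 0, "mentions_recall": False, "mentions_award": False},
]
candidates = ["mentions_recall", "mentions_award"]
validated = [c for c in candidates if validate_hypothesis(dataset, c)]
print(validated)
```

Here each candidate attribute corresponds to one potential hypothesis, and a candidate that falls below the threshold is treated as refuted.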

11.    The computer program product of Claim 1, wherein said generating comprises generating a query based on an entity identifier of at least one instance in the dataset, wherein the entity identifier is extracted, at least partially, from the at least one instance (par. 41, query based on known attribute of document/keyword, i.e., stock, thus comprising many instances; see also pars. 46-47).

12.    The computer program product of Claim 1, wherein said generating the query comprises concatenating a type restrictor, wherein the type restrictor is a search engine operator configured to limit a type of unstructured data which can match the query (pars. 41, 45-47, query executed against a set of data such as pharma).

14.    The computer program product of Claim 1, wherein said obtaining the one or more results comprises obtaining, from the search engine, a results page comprising one or more lists of links to the one or more results and traversing the links of the one or more lists of links to obtain the one or more results (pars. 41-42, result set and feedback).

15.    An apparatus comprising a processor and a memory, wherein said memory retaining the computer program product of Claim 1 (figs. 1-3).

16.  A computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising (fig. 1):
Obtaining a labeled dataset from a plurality of data sources including at least one structured data source, wherein each instance in the labeled dataset comprising one or more attributes and representing an entity having an associated entity identifier (figs. 1 and 2, info repository 104 of unstructured data to loosely search data, pars. 4, 6, 24 and 27);
(Note: initial set of unstructured data is used to create an initial classification using combination of keywords and the like, see par. 20, “unstructured data”, “initial classification”, “keywords” i.e., products or class, and any document such as structured emails or other unstructured word documents are mapped to index to a relevance class, pars. 4 and 27)
obtaining from the dataset, a plurality of hypotheses (figs. 5A-5B and 6, items 604-612, updated classifier data set based on user explicit/implicit feedback, pars. 46-47);
obtaining a set of keywords from the plurality dataset (fig. 6, item 610, all queries are responded to based on updated information, pars. 41, 45-47); 
(Note: query is generated once the initial classification is preliminary or complete, see par. 23, the user query is submitted comprising attribute such as “name of a data object… or keyword”, see par. 41 and fig. 5A)
automatically and without user intervention generating a query based on the keywords and the entity identifier  (pars. 6, 24, 5 and 27, query based on known attribute of document/keyword, note that 
providing the query to a search engine, wherein the search engine is configured to provide one or more results from an unstructured data corpus that match the query (fig. 4, item 408, pars. 38 and 41, search results and second set of unstructured data), wherein the unstructured data corpus is other than the dataset (par. 21, “unstructured data can be performed by preparing a sample random small set, for example, 5 %”, which implies that the initial set used to classify data is very small; par. 24, “Once the initial classification 114 is complete, the user 102 inputs his/her search query to retrieve information from the unstructured data”, which implies that the unstructured data set is other than the initial classification; see further fig. 5A, elements 502 and 504, and par. 41, which clearly differentiate the initial classification of data from the subsequent user query), and
wherein the query is restricted to search within entities of specific types, or within specific domains (par. 41, pharma domain comprising pharma class is searched thus reducing the search within entities of specific type (pharma));
obtaining the one or more results (fig. 5A, result);
augmenting at least one instance with a new attribute, wherein a value of the new attribute is computed based on the one or more results (par. 47, query results are changed based on updated relevance classes to adapt to classification mapping).
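
(Note, illustrative only: the augmenting limitation above, a new attribute whose value is computed from the search results, might be sketched as follows. The function name augment_instance, the attribute naming scheme, and the case-insensitive substring test are assumptions made for this example; they are not drawn from the claims or from Guha.)

```python
def augment_instance(instance, results, term):
    """Return a copy of the instance with one new boolean attribute.

    The new attribute is True when at least one result contains the
    term. Case-insensitive substring matching is an assumed rule.
    """
    augmented = dict(instance)  # leave the original instance untouched
    key = f"mentions_{term.lower()}"
    augmented[key] = any(term.lower() in text.lower() for text in results)
    return augmented


# Hypothetical instance and hypothetical result texts from a search.
instance = {"entity_id": "Acme Corp", "label": 1}
results = ["Acme Corp announces recall of product", "Quarterly report"]
augmented = augment_instance(instance, results, "recall")
print(augmented)
```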



28 and 29.  The method of claim 1, wherein said operations are performed iteratively for each of the one or more attributes (par. 41, interactive querying and results in combination with learning classifying).

Regarding method claims 20-26, they comprise substantially the same subject matter as rejected product claims 2-15, and are therefore likewise rejected on the merits.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Guha (USPN. 2009/0164416) in view of Wilson et al. (USPN. 2018/0144269).


encoding features for the labeled instances of the labeled dataset, wherein the encoded features comprise at least one feature encoded based on the new attribute, training a predictive model using the encoded features of the labeled dataset, obtaining an unlabeled instance (par. 18, mapping indexed attributes to class label and updating the sample set, note that the updated sample set is unlabeled);
encoding features for the unlabeled instance, wherein the encoded features comprise the at least one feature encoded based on the new attribute and applying the predictive model on the encoded features of the unlabeled instance to predict a label thereof (par. 40, adaptive classification adjusts the data set and performs reclassification of the data set and/or attributes).
To the degree that Guha’s encoding of features differs from that claimed, it is well known in the art that labeling data comprises encoding data.  In one such system, Wilson teaches encoding features by adding labels such as “0” or “1” (par. 41, encoding features by adding labels such as “0” and “1” to convert data, Wilson).
It would have been obvious to one of ordinary skill in the art at the effective filing time of the application to encode labeled and unlabeled data in Guha’s system by assigning labels including zeros and ones (par. 41, assign labels including zeros and ones, Wilson).  One would have been motivated to assign labels to any data set in order to convert and manage data (par. 41, converting data, Wilson).
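
(Note, illustrative only: encoding an attribute as “0” or “1” labels, as the rejection characterizes Wilson par. 41, might be sketched as follows. The function name encode_binary and the choice to encode a missing attribute as 0 are assumptions made for this example; they are not taken from Wilson or Guha.)

```python
def encode_binary(instances, attribute):
    """Encode one attribute as 0 or 1 for every instance.

    A missing or falsy attribute is encoded as 0 (an assumed rule).
    """
    return [1 if inst.get(attribute) else 0 for inst in instances]


# Hypothetical instances: two labeled, one unlabeled and lacking the attribute.
instances = [
    {"entity_id": "A", "mentions_recall": True, "label": 1},
    {"entity_id": "B", "mentions_recall": False, "label": 0},
    {"entity_id": "C"},
]
encoded = encode_binary(instances, "mentions_recall")
print(encoded)
```

The same function is applied to labeled and unlabeled instances alike, so the encoded feature has the same meaning in both sets.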

Regarding claim 9, Guha/Wilson teach wherein said encoding features for the unlabeled instance comprises:

obtaining from the search engine, at least one result that matches the second query (item 612 and par. 47, results are changed, Guha); and
determining a value for the new attribute for the unlabeled instance, based on the at least one result (fig. 6, item 612, par. 47, collect feedback on new result set to adapt classification mapping, Guha).

Regarding claim 10, Guha/Wilson teach wherein said encoding features for the unlabeled instance comprises determining a value for the new attribute for the unlabeled instance based on the one or more results of the query, whereby said encoding for the unlabeled instance is performed without an invocation of the search engine  (par. 41, encoding features by adding labels such as “0” and “1” to convert data, Wilson and fig. 6, Guha).

Response to Arguments
Applicant's arguments filed 1/11/21 have been fully considered but they are not persuasive.  See comments below:

Applicant alleges that the currently amended claim requires a plurality of data sources and structured data.

Examiner disagrees.  
Initial set of unstructured data is used to create an initial classification using combination of keywords and the like, see par. 20, “unstructured data”, “initial classification”, “keywords” i.e., products 

Applicant alleges that the amended claim feature of a domain restrictor/search within entities of specific types is not taught by the specific pharma domain of par. 41.
Examiner disagrees.
Pharma domain comprising pharma class is searched thus reducing the search within entities of specific type (pharma).  Other entity types are not searched as they do not meet the specific entity class/domain.

Applicant alleges the automated function without user intervention is not taught.
Examiner disagrees.
Updated classifier data set based on user explicit/implicit feedback, wherein explicit and implicit encompasses manual and automated steps of classifying and querying, see pars. 46-47.  When the user does not directly make selections, the system reviews the search history and analyzes data for classification and other purposes.
	All allegations are believed moot.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCIN R FILIPCZYK whose telephone number is (571)272-4019.  The examiner can normally be reached on M-F 7-4 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BORIS GORNEY, can be reached on 571-270-5626.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






March 19, 2021
/MARCIN R FILIPCZYK/Primary Examiner, Art Unit 2158