Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Compact Prosecution
The Examiner proposes amending the independent claims to include the following limitations:
      a label index, wherein the label index indicates which fields in the dataset include PII, so that the application can mask just those fields with PII as needed, the application can access the data store storing the dataset fewer times, and less data is transmitted, reducing bandwidth usage; and
      a user interface that provides feedback to a user by displaying reports about which data fields are labeled and with what probability each dataset is classified.
This amendment will overcome the current rejection.
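
By way of illustration only, the proposed label index limitation might operate as in the following minimal Python sketch. The field names, the `label_index` mapping, and the `mask_pii` and `pii_fields` helpers are hypothetical; they appear in neither the application nor the cited references. The sketch only shows how a per-field PII label lets an application mask the labeled fields without inspecting the remaining fields.

```python
from typing import Dict, List

# Hypothetical label index: maps each field name to whether the field holds PII.
label_index: Dict[str, bool] = {
    "name": True,
    "ssn": True,
    "zip": False,
    "balance": False,
}


def pii_fields(index: Dict[str, bool]) -> List[str]:
    """List the fields the index marks as PII, in sorted order."""
    return sorted(field for field, is_pii in index.items() if is_pii)


def mask_pii(record: Dict[str, str], index: Dict[str, bool], mask: str = "***") -> Dict[str, str]:
    """Return a copy of the record with only the PII-labeled fields masked.

    Fields absent from the index are treated as non-PII (an assumption).
    """
    return {
        field: (mask if index.get(field, False) else value)
        for field, value in record.items()
    }


record = {"name": "Ada", "ssn": "123-45-6789", "zip": "20001", "balance": "10.50"}
masked = mask_pii(record, label_index)
print(pii_fields(label_index))
print(masked)
```

Because the index is consulted instead of the stored values, the application in this sketch decides which fields to mask without an additional read of the dataset.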

Claim Objections
Claim 5 is objected to because of the following informalities:
           Line 4 reads “Selecting a label proposal test of the plurality of label proposal tests that are that is related to the primary key”. The repeated phrase “that are that is” makes the claim unclear. Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (US10459954) in view of Procops et al. (US 20140108357), referred to hereinafter as Walters in view of Procops.
Claim 1, Walters discloses a method implemented by a data processing system (Fig. 1; Col. 4, lines 52-55- system 100) for discovering a semantic meaning of data of a field included in one or more data sets, (Col. 13, lines 48-50- based on the schema (label or field), the data element (semantic meaning) is discovered/indicated) the method including:
identifying a field included in one or more data sets, (Col. 14, lines 13-22- thus the data schema (label or field) is identified in the dataset) with the field associated with an identifier; (Col. 9, lines 49-50- “indicator of whether data element is actual data or synthetic”) and for that field profiling by a data processing system one or more data values of the field to generate a data profile; (Col. 14, lines 19-22- the data profiling module generates a data profile by identifying the data schema from the received dataset)
accessing a plurality of label proposal tests; based on applying at least the plurality of label proposal tests to the data profile, (Col. 14, lines 13-19- thus “data profile module may identify a data schema.. where the data schema may include at least one of a label, a field or an index”; thus, of a plurality of data schemas, at least one is applied) generating a set of label proposals; (Col. 13, lines 9-12- thus labels such as actual data, fully composed of synthetic data, or partially composed of synthetic data are generated as the set of label proposals)


Figure 1: The label proposals into which a dataset can be classified based on similar features.

determining a similarity among the label proposals in the set of label proposals; based at least on the similarity among the label proposals in the set, (Col. 12, lines 1-4- thus similarities between a dataset and a previously classified dataset are used to classify the dataset based on the labels shown in Fig. 1 above) selecting a classification; (Col. 13, lines 5-12- thus “dataset connector system classify dataset based on the data schema… or edges”; classifying a dataset means a classification is selected for that dataset)
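based on the classification, rendering a graphical user interface that requests input in identifying a label proposal that identifies the semantic meaning or determining that no input is required; (Col. 9, lines 56-58- thus “Clustered datasets (Classified data) may include graphical data”; this means the clustered datasets can be represented using graphical data)
(Regarding the graphical user interface, the secondary reference also addresses this limitation; please see Fig. 3 and Section 0032, lines 10-14 in Procops.)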
identifying one of the label proposals as identifying the semantic meaning; (identifying one of the labels set out in Fig. 1 above) and grouping (Col. 16, lines 35-36- thus storing the segmented clusters of datasets in a data storage) the identifier of the field with the identified one of the label proposals that identifies the semantic meaning. (Col. 13, lines 9-12- thus, based on the similarity in the schema of the datasets, the datasets may be classified as one of the labels, where the labels are either fully composed of actual data, fully composed of synthetic data, or partially composed of synthetic data; see Fig. 1 above)
(The labels are understood to describe whether the datasets are actual data or fully or partially synthetic data.)
Walters does not disclose storing the classification of the datasets in a storage/database.
Procops discloses storing the classification of the datasets based on fields in a storage/database. (Section 0032, lines 5-8- thus the source datasets are stored in a database table based on fields with similar features)
 

Claim 2, Walters in view of Procops discloses wherein profiling the one or more data values of the field (Walters: Col. 9, lines 37-40- data such as Social Security data are profiled) includes determining a format of a data value of the field. (Procops: Section 0044- thus the format for the data value is that the data must be an integer)
Claim 3, Walters in view of Procops discloses wherein profiling the data values of the field includes determining a statistical value representing the data values included in the field. (Walters: Col. 10, lines 19-21- thus the statistical profile of the dataset reads on the statistical value of the field)
Claim 4, Walters in view of Procops discloses wherein the statistical value (Walters: Col. 10, lines 19-21) comprises at least one of a minimum length of the data values of the field, a maximum length of the data values of the field, (Procops: Sections 0047-0048- thus the maximum length of the data should be as specified by the user) a most common data value of the field, a least common data value of the field, a maximum data value of the field, and a minimum data value of the field.
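
For reference, the statistical values recited in claim 4 can be illustrated with the following minimal Python sketch. The `profile_field` function and the sample values are hypothetical and are not drawn from Walters or Procops. The sketch assumes a non-empty list of string values, so the maximum and minimum data values are compared lexicographically.

```python
from collections import Counter
from typing import Dict, List


def profile_field(values: List[str]) -> Dict[str, object]:
    """Compute the statistical values recited in claim 4 for one field.

    Assumes `values` is non-empty; an empty list would raise ValueError.
    """
    counts = Counter(values)
    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = counts.most_common()
    lengths = [len(value) for value in values]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "most_common": ranked[0][0],
        "least_common": ranked[-1][0],
        "max_value": max(values),  # lexicographic comparison for strings
        "min_value": min(values),
    }


profile = profile_field(["Ann", "Robert", "Ann", "Li", "Ann", "Li"])
print(profile)
```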
Claim 5, Walters in view of Procops discloses wherein applying the plurality of label proposal tests includes determining that the field includes a primary key for a data set of the one or more data sets; (Walters: Col. 3, lines 10-14- “foreign key”) and
selecting a label proposal test of the plurality of label proposal tests that is related to the primary key. (Walters: Col. 11, lines 7-10- thus labels associated with the candidate foreign key are selected for the dataset)
Claim 6, Walters in view of Procops discloses wherein applying the plurality of label proposal tests includes performing a metadata (Walters: Col. 4, lines 26-28- “metadata”) comparison of data values of the field to terms in a glossary of terms. (Walters: Col. 3, lines 35-36- data mapping reads on the metadata comparison; the directory disclosed in Col. 4, lines 26-28 reads on the glossary of terms)
Claim 7, Walters in view of Procops discloses wherein applying the plurality of label proposal tests includes determining from the data profile a pattern represented by the data values stored of the field; (Walters: Col. 11, lines 25-27- thus datasets stored in a clustered dataset read on “stored of the field”) determining a particular label that is mapped to the pattern; and labeling the field with the particular label. (Walters: Col. 11, lines 20-27- thus data mapping maps the received dataset based on similarities such as parent-child relationships (pattern represented by data stored in the field))
Claim 8, Walters in view of Procops discloses wherein applying the plurality of label proposal tests includes retrieving a list of values that are representative of a data collection; (Walters: Col. 11, lines 35-40- retrieving a data mapping model used for a dataset reads on the data collection) comparing the data values of the field to the list of values; determining, in response to the comparing, (Walters: Col. 11, lines 43-46- mapping reads on comparing) that a threshold number of the data values match the values of the list; and in response to the determining, labeling the field with a particular (Walters: Col. 11, lines 43-46- thus “…the statistical similarity metric with one of the received datasets that meets a threshold criterion”; this means the mapping module maps/compares the received dataset against a threshold value to cluster the dataset in that group or assign that label)
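
The list-of-values test recited in claim 8 can be illustrated with the following minimal Python sketch. The `label_by_reference_list` function, the `STATE_CODES` list, and the `state_code` label are hypothetical and do not come from the cited references. The sketch counts how many field values appear in a reference list and proposes the label only when the count reaches the threshold.

```python
from typing import Iterable, Optional, Set

# Hypothetical, deliberately short reference list standing in for a data collection.
STATE_CODES: Set[str] = {"CA", "NY", "TX", "WA"}


def label_by_reference_list(
    values: Iterable[str],
    reference: Set[str],
    label: str,
    threshold: int,
) -> Optional[str]:
    """Return `label` if at least `threshold` values match the reference list."""
    matches = sum(1 for value in values if value in reference)
    return label if matches >= threshold else None


labeled = label_by_reference_list(["CA", "NY", "ZZ", "TX"], STATE_CODES, "state_code", 3)
unlabeled = label_by_reference_list(["CA", "ZZ", "YY"], STATE_CODES, "state_code", 3)
print(labeled, unlabeled)
```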
Claim 9, Walters in view of Procops discloses wherein applying the plurality of label proposal tests includes generating at least two labels for the field; (Walters: Col. 8, lines 40-42- thus a label can have at least an actual data field, a synthetic data field, and a relevant data field, so at least two fields can be generated from the labels) and
determining whether the at least two labels are exclusive or inclusive of one another. (Walters: Fig. 4, element 430 shows that one dataset can have a plurality of relationships, meaning one dataset can be inclusive)
Claim 10, Walters in view of Procops discloses further including determining, in response to applying the plurality of label proposal tests, a relationship between the field and another field of the one or more data sets. (Walters: Col. 12, lines 25-31- thus the arrows and distances between discs shown in Fig. 4 represent data or field relationships between the datasets where the datasets are connected)
Claim 11, Walters in view of Procops discloses wherein the relationship includes one of an indication that a first data value of the field determines a second data value stored in the other field, (Walters: the shading shown in Fig. 4 represents data that are stored in fields) an indication that the first data value correlates to the second data value, or an indication that the first data value is identical to the second data value. (Walters: Col. 12, lines 27-32- thus “Arrows and distance between discs represents aspects of data relationships between the dataset and shading represents classification of the datasets”; this means the arrows indicate that the connected datasets are identical, and therefore that the data values are identical)
Claim 12, Walters in view of Procops discloses wherein the plurality of label proposal tests are each associated with at least one weight value, (Walters: Col. 15, lines 1-3- foreign key scores for the dataset) the method further including updating a weight value associated with at least one label proposal test; and
reapplying the label proposal test to the data profile using the updated weight value. (Walters: Col. 2, lines 10-12- discloses using neural network models, such as recurrent neural network and deep learning models, which teaches constantly updating the parameters for classification; in this case the parameters are, as an example of the schema, the foreign key scores of Col. 12, lines 20-25)
Claim 13, Walters in view of Procops discloses further including training the plurality of label proposal tests using a machine learning process. (Walters: Col. 10, lines 53-58- thus the mapping module includes machine learning models)
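
The weighting limitation recited in claim 12 can be illustrated with the following minimal Python sketch. The two tests, their names, the weights, and the sample profile are hypothetical and are not taken from Walters or Procops. Each test proposes a label with a confidence, the weight scales that confidence, and the tests are reapplied after one weight is updated.

```python
from typing import Callable, Dict, Tuple

Profile = Dict[str, str]
# Each hypothetical test maps a data profile to (proposed label, confidence in [0, 1]).
LabelTest = Callable[[Profile], Tuple[str, float]]


def pattern_test(profile: Profile) -> Tuple[str, float]:
    # Proposes "ssn" when the profiled value pattern is 999-99-9999.
    return ("ssn", 1.0) if profile.get("pattern") == "999-99-9999" else ("unknown", 0.0)


def field_name_test(profile: Profile) -> Tuple[str, float]:
    # Proposes "phone" when the field name mentions "phone".
    return ("phone", 0.5) if "phone" in profile.get("field_name", "") else ("unknown", 0.0)


def apply_tests(
    profile: Profile,
    tests: Dict[str, LabelTest],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Apply every test and accumulate weight * confidence per proposed label."""
    scores: Dict[str, float] = {}
    for name, test in tests.items():
        label, confidence = test(profile)
        scores[label] = scores.get(label, 0.0) + weights[name] * confidence
    return scores


tests = {"pattern": pattern_test, "field_name": field_name_test}
weights = {"pattern": 0.5, "field_name": 1.0}
profile = {"field_name": "phone_or_id", "pattern": "999-99-9999"}

before = apply_tests(profile, tests, weights)
weights["field_name"] = 0.25  # update the weight of one test
after = apply_tests(profile, tests, weights)  # reapply with the updated weight
print(before, after)
```

In this sketch the two labels tie before the update, and the reapplied tests favor the pattern-based label once the field-name test is down-weighted.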
Claim 14, Walters in view of Procops discloses the method further comprising retrieving from a data quality rules environment one or more data quality rules that are assigned to the label proposal specifying the semantic meaning (Walters: Col. 13, lines 51-53- actual data or synthetic data reads on the semantic meaning of the data) and assigning a data quality rule of the one or more data quality rules to the field. (Procops: Section 0078, lines 4-8- thus a list of fields has one or more validation rules which specify which labels can be stored)
Claim 15, Walters in view of Procops discloses wherein comparing the label proposals generated from the label proposal tests includes applying a score value to each label proposal; for each label of the label proposals, combining the score values associated with that label; (Walters: Col. 15, lines 21-25- thus “generating a plurality of edges between the selected dataset and the received dataset based on foreign key scores..”; this means each clustered dataset has an assigned key score/value) and ranking the labels according to the score value associated with each label. (Walters: a plurality of edges generated between the datasets based on scores means the datasets are ranked or classified based on the scores)
Claim 16, Walters in view of Procops discloses receiving validation of the label proposals from the plurality of label proposal tests (Procops: Section 0036, lines 4-7- validation rules to store the dataset) and responsive to receiving the validation weighting the plurality of label proposal tests with the label proposals. (Walters: Col. 14, lines 9-12- “label classifying the dataset as relevant to an analysis goal or topic”; this means the labeled datasets are scored)
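
The scoring and ranking limitation recited in claim 15 can be illustrated with the following minimal Python sketch. The `rank_labels` function and the sample proposals are hypothetical and do not come from the cited references. The sketch assumes that score values for the same label are combined by summation, which is one possible reading of "combining".

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_labels(proposals: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sum the score values per label, then rank labels by combined score."""
    totals: Dict[str, float] = defaultdict(float)
    for label, score in proposals:
        totals[label] += score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Each tuple is one label proposal from one (hypothetical) label proposal test.
ranked = rank_labels([("ssn", 0.5), ("phone", 0.25), ("ssn", 0.25), ("zip", 0.125)])
print(ranked)
```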
Claim 17, Walters in view of Procops discloses wherein the data store includes a data dictionary. (Walters: Col. 14, lines 17-19- thus the directory of words or data reads on the dictionary)
Claim 18, Walters in view of Procops discloses outputting the label proposals to a data quality rules environment. (Procops: Section 0056, lines 3-5- thus user-specified validation rules of the dataset)
Claim 19, Walters in view of Procops discloses reducing based on the identified one of the label proposals a number of errors for processing data for the field using data (Procops: Section 0064- thus the validation rule checks the correctness of the dataset being entered into the field and thus reduces the entry of wrong datasets into the wrong data field) from the data quality environment relative to another number of errors for processing the data for the field without using the identified one of the label proposals. (Walters: Col. 6, lines 12-15- thus the classification errors are measured)
Claim 20, Walters discloses a data processing system (Fig. 1; Col. 4, lines 52-55- system 100) for discovering a semantic meaning of a field included in one or more data sets, (Col. 13, lines 48-50- based on the schema (label or field), the data element (semantic meaning) is discovered/indicated) the system including:
a data storage storing instructions; and at least one processor configured to execute the instructions stored by the data storage (Col. 3, lines 44-47- thus non-transitory computer readable storage media that store program instructions) to perform operations including identifying a field included in one or more data sets, (Col. 14, lines 13-22- thus the data schema (label or field) is identified in the dataset) with the field having an identifier; (Col. 9, lines 49-50- “indicator of whether data element is actual data or synthetic”) and for the field profiling, by a data processing system, one or more data values of the field to generate a data profile; (Col. 14, lines 19-22- the data profiling module generates a data profile by identifying the data schema from the received dataset)
accessing a plurality of label proposal tests; based on applying at least the plurality of label proposal tests to the data profile, (Col. 14, lines 13-19- thus “data profile module may identify a data schema.. where the data schema may include at least one of a label, a field or an index”; thus, of a plurality of data schemas, at least one is applied) generating a set of label proposals; (Col. 13, lines 9-12- thus labels such as actual data, fully composed of synthetic data, or partially composed of synthetic data are generated as the set of label proposals)



Figure 2: The label proposals into which a dataset can be classified based on similar features.

determining a similarity among the label proposals in the set of label proposals;
based at least on the similarity among the label proposals in the set, (Col. 12, lines 1-4- thus similarities between a dataset and a previously classified dataset are used to classify the dataset based on the labels shown in Fig. 1 above) selecting a classification; (Col. 13, lines 5-12- thus “dataset connector system classify dataset based on the data schema… or edges”; classifying a dataset means a classification is selected for that dataset)

based on the classification, rendering a graphical user interface that requests input in identifying a label proposal that identifies the semantic meaning or determining that no input is required; (Col. 9, lines 56-58- thus “Clustered datasets (Classified data) may include graphical data”; this means the clustered datasets can be represented using graphical data)
(Regarding the graphical user interface, the secondary reference also addresses this limitation; please see Fig. 3 and Section 0032, lines 10-14 in Procops.)

identifying one of the label proposals as identifying the semantic meaning; (identifying one of the labels set out in Fig. 1 above) and
the identifier of the field with the identified one of the label proposals that identifies the semantic meaning. (Col. 13, lines 9-12- thus, based on the similarity in the schema of the datasets, the datasets may be classified as one of the labels, where the labels are either fully composed of actual data, fully composed of synthetic data, or partially composed of synthetic data; see Fig. 1 above)
(The labels are understood to describe whether the datasets are actual data or fully or partially synthetic data.)
Walters does not disclose storing the classification of the datasets in a storage/database.
Procops discloses storing the classification of the datasets based on fields in a storage/database. (Section 0032, lines 5-8- thus the source datasets are stored in a database table based on fields with similar features)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the teaching of storing the datasets grouped together in a database table under similar fields. The motivation is that retrieving the data at a later time will be faster.

Claim 21, Walters discloses one or more non-transitory computer readable media storing instructions (Col. 3, lines 44-47- thus non-transitory computer readable storage media that store program instructions) for discovering a semantic meaning of a field included in one or more data sets, (Col. 13, lines 48-50- based on the schema (label or field), the data element (semantic meaning) is discovered/indicated) the instructions being executable by one or more processors (Col. 3, lines 44-47) configured to perform operations including:
identifying a field included in one or more data sets, (Col. 14, lines 13-22- thus the data schema (label or field) is identified in the dataset) with the field having an identifier; (Col. 9, lines 49-50- “indicator of whether data element is actual data or synthetic”) and for that field profiling by a data processing system, one or more data values of the field to generate a data profile; (Col. 14, lines 19-22- the data profiling module generates a data profile by identifying the data schema from the received dataset)

accessing a plurality of label proposal tests; based on applying at least the plurality of label proposal tests to the data profile, (Col. 14, lines 13-19- thus “data profile module may identify a data schema.. where the data schema may include at least one of a label, a field or an index”; thus, of a plurality of data schemas, at least one is applied) generating a set of label proposals; (Col. 13, lines 9-12- thus labels such as actual data, fully composed of synthetic data, or partially composed of synthetic data are generated as the set of label proposals)




Figure 3: The label proposals into which a dataset can be classified based on similar features.

determining a similarity among the label proposals in the set of label proposals; based at least on the similarity among the label proposals in the set, (Col. 12, lines 1-4- thus similarities between a dataset and a previously classified dataset are used to classify the dataset based on the labels shown in Fig. 1 above) selecting a classification; (Col. 13, lines 5-12- thus “dataset connector system classify dataset based on the data schema… or edges”; classifying a dataset means a classification is selected for that dataset)
based on the classification, rendering a graphical user interface that requests input in identifying a label proposal that identifies the semantic meaning or determining that no input is required; (Col. 9, lines 56-58- thus “Clustered datasets (Classified data) may include graphical data”; this means the clustered datasets can be represented using graphical data)
(Regarding the graphical user interface, the secondary reference also addresses this limitation; please see Fig. 3 and Section 0032, lines 10-14 in Procops.)
identifying one of the label proposals as identifying the semantic meaning; (identifying one of the labels set out in Fig. 1 above) and the identifier of the field with the identified one of the label proposals that identifies the semantic meaning. (Col. 13, lines 9-12- thus, based on the similarity in the schema of the datasets, the datasets may be classified as one of the labels, where the labels are either fully composed of actual data, fully composed of synthetic data, or partially composed of synthetic data; see Fig. 1 above)
(The labels are understood to describe whether the datasets are actual data or fully or partially synthetic data.)
Walters does not disclose storing the classification of the datasets in a storage/database.
Procops discloses storing the classification of the datasets based on fields in a storage/database. (Section 0032, lines 5-8- thus the source datasets are stored in a database table based on fields with similar features)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the teaching of storing the datasets grouped together in a database table under similar fields. The motivation is that retrieving the data at a later time will be faster.

Cited Art
 
Seigel et al. (US20170161503) discloses that, in a large computer system, maintaining information security is a difficult task because, in many cases, a security system may have difficulty distinguishing legitimate activities from unauthorized access of data. Currently, a risk associated with a user account may be determined by looking at the resources to which the user account has access, the groups to which the user account belongs, and the resources which the user account owns.
Redlich et al. (US20150199405) discloses a method of organizing and processing data in a distributed computing system. The computing system has a plurality of select content data stores for respective ones of a plurality of enterprise designated categorical filters, which include content-based filters, contextual filters, and taxonomic classification filters, all operatively coupled over a communications network.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/ Primary Examiner, Art Unit 2675
02/11/2022.