DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-10 remain pending and are ready for examination.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/22/2021 was filed. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to
http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-1.jsp.

Claims 1-10 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/548,503. Although the claims at issue are not identical, they are not patentably distinct from each other because the instant claims are computer program product and system claims corresponding to the copending method claims. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
In addition, claims 1-8 are non-provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 19-25, 27-34 and 36-38 of U.S. Patent No. 10,585,864 and claims 1-6 and 8-10 of U.S. Patent No. 10,585,865. This is a non-provisional nonstatutory double patenting rejection.
Although the conflicting claims are not identical, they are not patentably distinct from each other because all the claimed limitations recited in the instant application are found in U.S. Patent No. 10,585,865. Claim 1 of the instant application recites the limitation “comparing the data”, whereas claim 1 of U.S. Patent No. 10,585,865 is more specific in reciting the same limitation, i.e., “comparing, by a processor, the data”.



Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole 

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al., U.S. Patent No: 8266115 A1 (hereinafter “Park”) in view of Papotti et al., U.S. Pub No: US 20160154830 A1 (hereinafter “Papotti”).

Regarding claim 1, Park discloses a method for determining a data standardization score for an attribute of a dataset, the method comprising:
providing attribute metadata descriptive of the attribute (see at least col.8 lines [7-17] and fig. 2 step 202, wherein, when the process 200 begins (201), the client device receives metadata associated with two or more items of electronic content (202). The item of electronic content may be, for example, a movie or other video recording, an audio recording, a ring tone, an electronic book, a whitepaper, a periodical, a video game, a software program, wallpaper, a mobile device application, or some other type of electronic content. The metadata may include information such as a title, an author, an artist, an album, a release date, a publication date, a filename, and other information pertaining to the electronic content);
providing a data standardization score algorithm for finding potential duplicates in attribute values and calculating a data standardization score accordingly, the calculated data standardization score reflecting whether data quality of attribute values would increase if a standardization rule is applied to the attribute values (see at least col.8 lines [53-64] and fig. 2 step 202, wherein client device generates a score based on the comparison between the first and second metadata (206). In some implementations, the client device compares first and second ; 
comparing the data standardization score value to a predefined criterion to determine whether data standardization is to be applied on the attribute (see at least col. 6 lines [62-67] and fig.2 step 208, wherein the score may be compared to a predetermined threshold value (208). If the score does not satisfy (e.g., is lower than) the threshold value, the two items may contain different content, and thus the client device may display both items (209) and the process ends (211). If the score satisfies (e.g., is equal to or greater than) the threshold value, the two items of electronic content may be considered to be potentially duplicates (e.g., the two items are the same song or same video clip or same book). The client device may then display either the first or second item of electronic content (210));
applying data standardization on the attribute to transform data to a predefined format in response to determining data standardization is to be applied on the attribute (Park, col.10 lines 21-28, teaches that the potential duplicate item may be displayed in a format different from the other songs, for example, a different color, font, indentation, or other format. The song4 324 received from the server 304 
Park fails to explicitly disclose determining, based on the metadata for the attribute, whether an indication to carry or not to carry out standardization is available for at least part of attribute values of the dataset; 
in response to finding the indication to carry out standardization, setting a respective value for the data standardization score; 
in response to not finding the indication to carry out standardization, running the data standardization score algorithm on the at least part of attribute values of the dataset.
In the same field of endeavor, Papotti discloses determining, based on the metadata for the attribute, whether an indication to carry or not to carry out standardization is available for at least part of attribute values of the dataset (see at least paragraphs [0030, 0035-0036] and fig.2, wherein, based on metadata, an estimated effort is calculated, which is then used to determine performing or not performing data integration, which includes data cleansing/standardization);
in response to finding the indication to carry out standardization, setting a respective value for the data standardization score (see at least paragraphs [0030, 0035-0036] and fig.2, wherein the measured effort, represented as an amount of work in hours or days or in a monetary unit, can be the standardization score);
in response to not finding the indication to carry out standardization, running the data standardization score algorithm on the at least part of attribute values of the dataset (see at least paragraphs [0030, 0035-0036] and fig.2, wherein the measured effort, represented as an amount of work in hours or days or in a monetary unit, can be the standardization score).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the compare metadata (204) functionality of Park's invention to incorporate using metadata to determine whether or not to apply standardization to an attribute/content, since doing so would have achieved the desirable result of providing improved methods and systems for estimating data integration and cleansing effort (see Papotti paragraph [0005]).
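
For illustration only, the decision flow recited in the limitations of claim 1 discussed above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Park, Papotti, or the application: the metadata key "standardize", the preset scores 1.0 and 0.0, the 0.5 threshold, and the function names are all hypothetical, and the toy score algorithm (trim and lowercase, then count collapsed values) merely stands in for whatever algorithm is actually used.

```python
from typing import Callable, Sequence


def duplicate_fraction(values: Sequence[str]) -> float:
    """Toy score algorithm (assumption): fraction of distinct raw values
    that would collapse onto another value after trimming and lowercasing."""
    raw_distinct = set(values)
    if not raw_distinct:
        return 0.0
    normalized_distinct = {v.strip().lower() for v in raw_distinct}
    return (len(raw_distinct) - len(normalized_distinct)) / len(raw_distinct)


def data_standardization_score(
    metadata: dict,
    values: Sequence[str],
    score_algorithm: Callable[[Sequence[str]], float],
) -> float:
    """Use the metadata indication when present; otherwise run the algorithm."""
    indication = metadata.get("standardize")  # hypothetical key: True, False, or absent
    if indication is True:
        return 1.0  # hypothetical preset value: standardization is indicated
    if indication is False:
        return 0.0  # hypothetical preset value: standardization is not indicated
    return score_algorithm(values)  # no indication found: compute the score


def should_standardize(score: float, threshold: float = 0.5) -> bool:
    """Compare the score to a predefined criterion (here a simple threshold)."""
    return score >= threshold


cities = ["NY", "ny ", "Boston", "boston"]
score = data_standardization_score({}, cities, duplicate_fraction)
decision = should_standardize(score)
```

In this sketch the score algorithm runs only when the metadata carries no indication, which mirrors the ordering of the three limitations addressed with reference to Papotti.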

Regarding claim 2, the combination of Park and Papotti teaches all the features with respect to claim 1 as outlined above. The combination of Park and Papotti further discloses that the attribute values are distinct values of the attribute that are obtained by a deduplication algorithm (Park, see at least col. 1 lines [39-48]).

Regarding claim 3, the combination of Park and Papotti teaches all the features with respect to claim 1 as outlined above. The combination of Park and Papotti further discloses that the attribute values are all attribute values of the attribute in the dataset (Park, see at least col. 1 lines [39-48], wherein metadata that identify an item of electronic content more uniquely, such as a title, album name, or release date, may increase the score to a greater extent than matches of metadata that identify the item of electronic content less uniquely, such as a track number, genre, or the first letter of the artist's name. The metadata for all items of electronic content may be compared against .

Regarding claim 4, the combination of Park and Papotti teaches all the features with respect to claim 1 as outlined above. The combination of Park and Papotti further discloses providing a set of criterions (Park, see at least col. 5 lines [63-67]-col.6 lines [1-12]), wherein the determining, based on the metadata for the attribute, of whether the indication to carry or not to carry out standardization is available comprises: checking each of the criterions for the values of the attribute (Park, see at least col. 5 lines [63-67]-col.6 lines [1-12], wherein a score 139 may be generated using the comparisons of the metadata fields. The score may be a function of the correspondence between matching fields. For example, the score may be a weighted or unweighted aggregation of the number of fields that match between the two files).

Regarding claim 5, the combination of Park and Papotti teaches all the features with respect to claim 4 as outlined above. The combination of Park and Papotti further discloses that the set of criterions comprises one or more of the following:
the attribute values are resulting from a data standardization algorithm; 
the attribute values are resulting from an ETL process that is applied on source data that has been standardized (Papotti, see at least fig. 2, where ETL is shown);
the attribute is representing a primary or foreign key of the dataset; the attribute values have a predefined data class; the attribute has similar characteristics as another attribute of the dataset, wherein values of the other attribute are standardized; a number of different formats of the attribute is above a number of formats threshold; an average length of the attribute values is above a length threshold; an average number of words of the attribute is above a number of words threshold; and a fraction of distinct values is above a fraction threshold (Papotti, see at least paragraph [0070] and fig.3a).

Regarding claim 6, the combination of Park and Papotti teaches all the features with respect to claim 1 as outlined above. The combination of Park and Papotti further discloses that the data standardization score algorithm comprises an algorithm for calculating similarity between attribute values and calculating the score based on the similarities (Park, see at least col.5 lines [58-62], wherein algorithms may be used to determine a similarity between the two values, for example, accounting for typographical errors, where higher levels of similarity may affect a score more profoundly than lower levels of similarity).
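
By way of a non-limiting illustration, a similarity-based score of the kind addressed with respect to claim 6 may be sketched as follows. This is a sketch under stated assumptions only: neither Park nor the application specifies this measure, and the use of Python's standard difflib.SequenceMatcher, the 0.8 similarity cutoff, and the function name are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Sequence


def similarity_score(values: Sequence[str], min_ratio: float = 0.8) -> float:
    """Fraction of distinct value pairs whose similarity ratio reaches the
    cutoff (assumption: near-identical spellings are potential duplicates)."""
    distinct = sorted(set(values))
    pairs = list(combinations(distinct, 2))
    if not pairs:
        return 0.0  # fewer than two distinct values: nothing to compare
    similar = sum(
        1
        for a, b in pairs
        # Case-insensitive comparison tolerant of small typographical errors.
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio
    )
    return similar / len(pairs)


streets = ["Main Street", "Main Stret", "Oak Avenue"]
street_score = similarity_score(streets)
```

Here only the pair "Main Street" / "Main Stret" reaches the cutoff, so one of the three distinct pairs counts as similar.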


Allowable Subject Matter

Claims 7-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Response to Arguments
	Applicant’s arguments regarding the 35 U.S.C. 103 rejection have been considered but are not persuasive.

(1) Applicant argues that Park and Papotti do not teach or suggest “providing a data standardization score algorithm for finding potential duplicates in attribute values and calculating a data standardization score accordingly, the calculated data standardization score reflecting whether data quality of attribute values would increase if a standardization rule is applied to the attribute values” as recited in claim 1.

(1) Examiner respectfully disagrees.
- Park, in col.9 lines 4-9, teaches that the weighting may reflect a confidence that matching values indicate potentially duplicate content. Furthermore, col.9 lines 10-19 discloses a score used to determine whether two items of electronic content may be considered to be potentially duplicates. The score may be compared to a predetermined threshold value (208). If the score does not satisfy (e.g., is lower than) the threshold value, the two items may contain different content, and thus the client device may display both items (209) and the process ends (211). If the score satisfies (e.g., is equal to or greater than) the threshold value, the two items of electronic content may be considered to be potentially duplicates (e.g., the two items are the same song or same video clip or same book). The client device may then display either the first or second item of electronic content.


(2) Examiner respectfully disagrees.
- Papotti, paragraphs [0030, 0035-0036] and fig.2, teaches that, based on metadata, an estimated effort (corresponds to the indication) is calculated, which is then used to determine performing (carry) or not performing (not carry) data integration, which includes data cleansing/standardization.

(3) Applicant argues that Park and Papotti do not teach or suggest “setting a respective value for the data standardization score” as recited in claim 1.
(3) Examiner respectfully disagrees.
- Papotti, paragraphs [0030, 0035-0036] and fig.2, teaches the measured effort, represented as an amount of work in hours or days or in a monetary unit, which corresponds to the value for the data standardization score. Based on metadata, an estimated effort (corresponds to the indication) is calculated, which is then used to determine performing (carry) or not performing (not carry) data integration, which includes data cleansing/standardization.

(4) Applicant argues that Park and Papotti do not teach or suggest “running the data standardization score algorithm on the at least part of attribute values of the dataset” as recited in claim 1.

- Papotti, paragraphs [0030, 0035-0036] and fig.2, teaches the measured effort, represented as an amount of work in hours or days or in a monetary unit, which corresponds to the value for the data standardization score. Based on metadata, an estimated effort (corresponds to the indication) is calculated, which is then used to determine performing (carry) or not performing (not carry) data integration, which includes data cleansing/standardization.

(5) Applicant argues that Park and Papotti do not teach or suggest “applying data standardization on the attribute to transform data to a predefined format in response to determining data standardization is to be applied on the attribute” as recited in claim 1.
(5) Examiner respectfully disagrees.
- Park, col.10 lines 21-28, teaches that the potential duplicate item may be displayed in a format different from the other songs, for example, a different color, font, indentation, or other format. The song4 324 received from the server 304 may, for example, have a different song title that, when normalized, is determined to be the same as the song title of the song4 314 received from the additional client device 308. The normalization corresponds to the standardization being applied.

(6) Applicant argues that Park and Papotti do not teach or suggest “wherein the set of criterions comprises one or more of the following: the attribute values are resulting from a data standardization algorithm; the attribute values are resulting from an ETL process that is applied on source data that has been standardized; the attribute 
(6) Examiner respectfully disagrees.
- First, the claim, as currently presented, requires ONLY one or more of the set of criterions.
- Second, Papotti, paragraph [0070] for example, discloses that schemas define a set of constraints, such as primary keys (e.g., id in records), foreign keys (record in tracks, represented with dashed arrows), and not nullable values (title in tracks), which corresponds to the criterion "the attribute is representing a primary or foreign key of the dataset", as claimed.

(7) Applicant argues that the Examiner fails to provide a rational underpinning for modifying Park with Papotti to include the missing claim limitations of claims 1 and 11.
(7) Examiner respectfully disagrees.
- It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the systems of Park and Papotti, since doing so would have achieved the desirable result of providing improved methods 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHER N ALGIBHAH whose telephone number is (571)272-0718.  The examiner can normally be reached on Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner, can be reached at (571) 270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/MAHER N ALGIBHAH/           Examiner, Art Unit 2165   

/ALEKSANDR KERZHNER/           Supervisory Patent Examiner, Art Unit 2165