DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 15 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “computing, from each token using a processor and a memory, new data comprising a corresponding vector”. The limitation is indefinite as to what is being computed and what the result of the computation is. Is the new data the result of computing/processing each token? Is the new data the result of computing/processing each token and the corresponding vector? Is the new data, with the corresponding vector, the result of computing/processing each token? Further clarification is required.
Claims 15 and 20 recite limitations similar to the one indicated above and therefore also require further clarification.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 4, 5, 7, 9, 11, 13-15, 18, 19 and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Faruquie et al. (US 20110270808 A1).
Regarding claim 1, Faruquie discloses:
a method comprising: creating new data comprising a first token, from a first attribute-value pair in a record, the record comprising a set of rows, a row in the set of rows comprising a set of values wherein a value in the set of values corresponds to an attribute, the value and the attribute comprising the first attribute-value pair, and the first token comprising a text string corresponding to the first attribute-value pair, at least by (paragraph [0032] describes a standardized address extracted from the addresses in each cluster; the values in the standardized address teach creating new data from attribute-value pairs in a record. The cluster includes multiple similar addresses, which are interpreted as records comprising a set of rows (see Ref. 202, 202a-b of Fig. 2), and each address has multiple attribute-value pairs, such as house number, area, road, city, state and PIN, with corresponding values (see para. 0025). Furthermore, the standardized address includes text/tokens corresponding to each of the attribute-
computing, from each token using a processor and a memory, new data comprising a corresponding vector, at least by (paragraph [0034] describes calculating a set of signatures for the address; the set of signatures is equivalent to vectors)
selecting, from the record, a target row, wherein a target attribute-value pair in the target row includes a value requiring correction, at least by (paragraph [0032] which discloses identifying one or more missing components of an address within the cluster of addresses)
determining, using a similarity measure, a set of most similar rows to the target row, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute, at least by (paragraph [0034] describes comparing the address with other addresses and clustering addresses that share a threshold level of similarity, which would include the target attribute)
determining, from values corresponding to the target attribute in the set of most similar rows, a replacement value,
and replacing, with the replacement value, the value requiring correction in the target row, at least by (paragraph [0034]: the synonyms are used to standardize addresses, which means replacing terms in the address with a synonym chosen as the standard term; it further discloses that the synonyms are used to fill in missing terms and the like)
As per claim 4, claim 1 is incorporated and Faruquie further discloses:
wherein the set of most similar rows to the target row comprises a predetermined number of rows having the highest similarity measure, at least by (paragraph [0034] which discloses clustering addresses/rows based on a predetermined threshold level of similarity)
As per claim 5, claim 1 is incorporated and Faruquie further discloses:
wherein the vector comprises a set of real numbers and the set of real numbers comprises a predetermined number of elements, at least by (paragraph [0034] where the set of signatures is interpreted as vectors, and where the signatures are further described as k signatures, k being interpreted as the predetermined number/length of elements in the set of real numbers)
As per claim 7, claim 1 is incorporated and Faruquie further discloses:
wherein determining a set of most similar rows to the target row comprises: computing a first degree of similarity between a target primary key vector corresponding to a target primary key token in the target row and a second primary key vector corresponding to a second primary key token in a second row in the record, the target and the second primary key tokens having an attribute acting as a primary key; adding, to a set of similarities corresponding to rows in the record, the first degree of similarity; and selecting, from the set of similarities, the set of most similar rows, each row in the set of most similar rows having a corresponding degree of similarity within a tolerance of a specified degree of similarity, at least by (paragraph [0034] “Signatures capture sequence of n length character windows. It can then be determined if addresses share a threshold level of similarity 420. Two addresses that share at least some threshold number of n length character windows can be grouped together in a cluster 430. All the addresses that have at least some k signatures in common can belong to the same cluster”)
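For illustration only, the signature comparison quoted above from paragraph [0034] can be sketched as follows. This is a minimal sketch and not Faruquie's implementation: the function names char_ngrams and same_cluster, the lowercasing step, and the default values n=3 and k=4 are assumptions introduced here.

```python
def char_ngrams(text, n=3):
    """Return the set of n-length character windows ("signatures") in text.

    Lowercasing is an assumption made for this sketch.
    """
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def same_cluster(addr_a, addr_b, n=3, k=4):
    """Return True when two addresses share at least k signatures.

    n (window length) and k (threshold) are hypothetical defaults.
    """
    shared = char_ngrams(addr_a, n) & char_ngrams(addr_b, n)
    return len(shared) >= k
```

Under these assumptions, same_cluster("12 Main Street", "12 Main St") groups the two addresses together, while two addresses sharing no character windows are kept apart.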
As per claim 9, claim 1 is incorporated and Faruquie further discloses:
wherein determining a set of most similar rows to the target row comprises: computing a first degree of similarity between a target vector corresponding to a target token in the target row and a second vector corresponding to a second token in a second row in the record, the target and the second tokens having the target attribute; adding, to a set of similarities corresponding to rows in the record, the first degree of similarity; and selecting, from the set of similarities, the set of most similar rows, each row in the set of most similar rows having a corresponding degree of similarity within a tolerance of a specified degree of similarity, at least by (paragraph [0034] “Signatures capture sequence of n length character windows. It can then be determined if addresses share a threshold level of similarity 420. Two addresses that share at least some threshold number of n length character windows can be grouped together in a cluster 430. All the addresses that have at least some k signatures in common can belong to the same cluster”)
As per claim 11, claim 1 is incorporated and Faruquie further discloses:
wherein determining a set of most similar rows to the target row comprises: computing a first degree of similarity between a target vector corresponding to the target row and a second vector corresponding to a second row in the record; adding, to a set of similarities corresponding to rows in the record, the first degree of similarity; and selecting, from the set of similarities, the set of most similar rows, each row in the set of nearest rows having a corresponding degree of similarity within a tolerance of a specified degree of similarity, at least by (paragraph [0034] “Signatures capture sequence of n length character windows. It can then be determined if addresses share a threshold level of similarity 420. Two addresses that share at least some threshold number of n length character windows can be grouped together in a cluster 430. All the addresses that have at least some k signatures in common can belong to the same cluster”)
As per claim 13, claim 1 is incorporated and Faruquie further discloses:
wherein determining a replacement value comprises selecting, as the replacement value, the most common non-null value in a set of values corresponding to the target attribute in the set of most similar rows, at least by (paragraph [0034] further discloses that, from the cluster of similar addresses/rows, synonyms are identified for terms in the addresses and are used to standardize the addresses; therefore the identified standardized synonyms are replacement values that are the most common non-null value)
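For illustration only, the claimed selection of the most common non-null value can be sketched as follows. This is a minimal sketch of the claim limitation, not a description of Faruquie's synonym handling; the function name most_common_non_null and the use of None to represent a null value are assumptions introduced here.

```python
from collections import Counter


def most_common_non_null(values):
    """Return the most frequent value that is not None, or None if all are null.

    Ties are resolved in favor of the value encountered first.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return Counter(non_null).most_common(1)[0][0]
```

For example, given the target-attribute values ["Rd", None, "Road", "Road", None, None] from a set of similar rows, the sketch selects "Road" even though null is the most frequent entry overall.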
As per claim 14, claim 1 is incorporated and Faruquie further discloses:
wherein determining a replacement value comprises selecting, as the replacement value, the centroid value in a set of values corresponding to the target attribute in the set of most similar rows, wherein the values in the set of values are numerical, at least by (paragraph [0034] further discloses that, from the cluster of similar addresses/rows, synonyms are identified for terms in the addresses and are used to standardize the addresses; therefore the identified standardized synonyms are replacement values that are the most common non-null value, which is also the centroid value)
Claim 15 recites claim limitations equivalent to those of claim 1 above, except that it sets forth the claimed invention as a computer usable program product comprising one or more computer-readable storage devices. Claim 20 recites claim limitations equivalent to those of claim 1 above, except that it sets forth the claimed invention as a system. As such, they are rejected for the same reasons as applied hereinabove.
As per claim 18, claim 15 is incorporated and Faruquie further discloses:
wherein the computer usable code is stored in a computer readable storage device in a data processing system, and wherein the computer usable code is transferred over a network from a remote data processing system, at least by (paragraph [0058] which describes its software stored on computer storage and accessible/downloadable over a network/stored on remote servers.)
As per claim 19, claim 15 is incorporated and Faruquie further discloses:
wherein the computer usable code is stored in a computer readable storage device in a server data processing system, and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, at least by (paragraph [0058] which describes its software stored on computer storage and accessible/downloadable over a network/stored on remote servers.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 3, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Faruquie in view of Lester et al. (US 20120117036 A1).
As per claim 2, claim 1 is incorporated and Faruquie fails to disclose:
wherein the record comprises a first row, wherein the first row has a degree of correctness that is less than a preset threshold degree of correctness.
However, Lester teaches the above limitations at least by (paragraph [0089] which discloses confidence scores associating degree of precision or correctness for incomplete or inaccurate contacts). 
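Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed to incorporate the teaching of Lester into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have been motivated to use such a modification for the purpose of providing a measure of confidence that a record may be correct or incorrect, which improves the system’s ability to suggest actions related to correcting errors based on the confidence measure, as taught by Lester in para. 0089.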
As per claim 3, claim 1 is incorporated and Faruquie fails to disclose:
wherein the record comprises a first row, wherein the first row has a degree of completeness that is less than a preset threshold degree of completeness.
However, Lester teaches the above limitations at least by (paragraph [0089] which discloses confidence scores associating degree of precision or correctness for incomplete or inaccurate contacts). 
Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed to incorporate the teaching of Lester into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have been motivated to use such a modification for the purpose of providing a measure of confidence that a record may be correct or incorrect, which improves the system’s ability to suggest actions related to correcting errors based on the confidence measure, as taught by Lester in para. 0089.
Claims 16 and 17 recite claim limitations equivalent to those of claims 2 and 3 above, except that they set forth the claimed invention as a computer usable program product comprising one or more computer-readable storage devices; as such, they are rejected for the same reasons as applied hereinabove.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Faruquie in view of Sadkin et al. (US 20140222793 A1).
As per claim 6, claim 5 is incorporated and Faruquie fails to disclose:
wherein each vector comprises a set of real numbers, each real number being between -1 and 1.
However, Sadkin teaches the above limitations at least by (Fig. 6 and the corresponding paragraphs, which disclose confidence scores as vectors with real numbers being between -1 and 1).
Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed/made to incorporate the teaching of Sadkin into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have used real numbers between -1 and 1 as merely an alternative design choice in representing the vector.
Claims 8, 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Faruquie in view of Kazi et al. (US 20120226692 A1).
As per claim 8, claim 7 is incorporated and Faruquie fails to disclose:
wherein the degree of similarity between a target primary key vector corresponding to a target primary key token in the target row and a second primary key vector corresponding to a second primary key token in a second row in the record comprises a cosine similarity, the cosine similarity comprising a cosine of an angle between two vectors.
However, Kazi teaches the above limitations at least by (paragraph [0080] “present invention may use several methods for computing the spatial proximity of the vectors, such as the Cosine, Dice and Jacquard similarity models.”).
Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed/made to incorporate the teaching of Kazi into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have used a cosine similarity measure as merely an alternative design choice in determining similarity, as Kazi describes other methods which would also provide measures of similarity (see para. 0080).
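For illustration only, the cosine similarity recited in the claim, i.e. the cosine of the angle between two vectors, can be computed as the dot product divided by the product of the vector norms. The sketch below is not taken from Kazi or Faruquie; the function name cosine_similarity and the error handling for mismatched lengths and zero vectors are assumptions introduced here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

The result lies between -1 and 1: parallel vectors yield a value of 1, orthogonal vectors yield 0, and opposite vectors yield -1.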
As per claim 10, claim 9 is incorporated and Faruquie fails to disclose:
wherein degree of similarity between a target vector corresponding to a target token in the target row and a second vector corresponding to a second token in a second row in the record comprises a cosine similarity, the cosine similarity comprising a cosine of an angle between two vectors.
However, Kazi et al. (US 20120226692 A1) teaches the above limitations at least by (paragraph [0080] “present invention may use several methods for computing the spatial proximity of the vectors, such as the Cosine, Dice and Jacquard similarity models.”). 
Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed/made to incorporate the teaching of Kazi into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have used a cosine similarity measure as merely an alternative design choice in determining similarity, as Kazi describes other methods which would also provide measures of similarity (see para. 0080).
As per claim 12, claim 7 is incorporated and Faruquie fails to disclose:
wherein the degree of similarity between a target vector corresponding to the target row and a second vector corresponding to a second row in the record comprises a cosine similarity, the cosine similarity comprising a cosine of an angle between two vectors.
However, Kazi et al. (US 20120226692 A1) teaches the above limitations at least by (paragraph [0080] “present invention may use several methods for computing the spatial proximity of the vectors, such as the Cosine, Dice and Jacquard similarity models.”). 
Therefore it would have been obvious to one of ordinary skill in the art at the time the invention was filed/made to incorporate the teaching of Kazi into the teaching of Faruquie, as they relate to correcting incorrect/incomplete records, and one of ordinary skill in the art would have used a cosine similarity measure as merely an alternative design choice in determining similarity, as Kazi describes other methods which would also provide measures of similarity (see para. 0080).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS TRUONG whose telephone number is (571)270-3157.  The examiner can normally be reached on Monday - Friday 7:00 am - 3:30 pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jail, can be reached at 571-270-0474.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DENNIS TRUONG/Primary Examiner, Art Unit 2152