DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In response to Applicant’s claims filed on December 11, 2020, claims 1-27 are now pending for examination in the application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-5, 10, 13-19, and 24-27 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sloan (US Patent No. 9953043) in view of Kabra et al. (US Pub. No. 20180137151).

With respect to claim 1, Sloan teaches a computer implemented method for cataloging database metadata using a signature matching process, comprising:

receiving an input name to be matched to a key in a seed table (“an initial code signature,” See Column 1 Lines 58-67 and “rules from the artifact seeded by the signature to obtain a list of terminology to code mappings,” See Column 1 Lines 58-67); 

generating a first fingerprint by decomposing the received input name into a first set of n-grams (“generate the Exemplar code signature (i.e., fingerprint,” See Column 9 Lines 5-10);



identifying a matching key by matching any combination of the first fingerprint, the second fingerprint, and the third fingerprint against keys in the seed table (“matching the instance code signature to one or more exemplar code signatures,” See Column 2 Lines 5-9); and 
cataloging the metadata with the matching key as a tag (“indexing of the article's metadata and then its content using all-encompassing clinical taxonomies, for example SNOMED, MeSH, or ICD-10,” See Column 5 Lines 1-5).
Sloan does not disclose generating, based on the received input name, a second fingerprint using a predetermined pronunciation schema, wherein the second fingerprint is a phonetic fingerprint.
However, Kabra et al. teaches generating, based on the received input name, a second fingerprint using a predetermined pronunciation schema, wherein the second fingerprint is a phonetic fingerprint (“Converting the data to phonetic (like Metaphone) during preprocessing may reduce drastically the number of bigrams generated and thus make the clustering much faster,” See Paragraph 34);

generating a third fingerprint by decomposing the second fingerprint into a second set of n-grams (“According to one embodiment, the bigram is a sequence of two or more adjacent elements or characters of the attribute value. The present method may be applied using N-grams as described herein with the 2-grams, wherein the N-gram is a sequence of N adjacent characters of an attribute value,” See Paragraph 56).
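Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Sloan (fingerprinting) with Kabra et al. (data standardization).  This would have determined a data standardization score for an attribute of a dataset.  See Kabra et al. Paragraphs 2-5.  In addition, both references teach features that are directed to analogous art and they are directed to the same field of endeavor: fingerprinting.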
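
For illustration only, the sequence of fingerprints recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not represent the implementation of the Applicant, Sloan, or Kabra et al. The function names, the sample seed table, the 0.5 threshold, and the Jaccard comparison are hypothetical, and phonetic_key is a toy pronunciation schema standing in for an algorithm such as Metaphone.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical seed table: keys are known column names, values are tags.
SEED_TABLE: Dict[str, str] = {
    "customer_name": "Customer Name",
    "zip_code": "Postal Code",
}


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of n-character substrings of the lower-cased text."""
    s = text.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def phonetic_key(name: str) -> str:
    """Toy pronunciation schema (assumption, NOT Metaphone).

    Keeps letters only, keeps the first letter, drops later vowels,
    and collapses immediately repeated consonants.
    """
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    for c in letters[1:]:
        if c in "AEIOU":
            continue
        if c != out[-1]:
            out.append(c)
    return "".join(out)


def fingerprints(name: str) -> Tuple[Set[str], str, Set[str]]:
    """Return (first, second, third) fingerprints for a name."""
    first = ngrams(name)       # n-grams of the raw input name
    second = phonetic_key(name)  # phonetic fingerprint
    third = ngrams(second)     # n-grams of the phonetic fingerprint
    return first, second, third


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_key(name: str, seed_table: Dict[str, str],
              threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching seed key, or None if below the threshold."""
    first, second, third = fingerprints(name)
    best_key, best_score = None, 0.0
    for key in seed_table:
        k_first, k_second, k_third = fingerprints(key)
        if second and second == k_second:
            score = 1.0  # identical phonetic fingerprints
        else:
            score = max(jaccard(first, k_first), jaccard(third, k_third))
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None
```

In this sketch a misspelled column name such as "custmer_name" would be cataloged with the tag stored under the seed key that match_key returns for it.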

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 2, Kabra et al. teaches the computer implemented method of claim 1, wherein the tag is description of the input name  (“The metadata 161 of the attribute 167 may for example comprise or stores values of parameters or variables that describe the attribute 167. For example, the metadata 161 may comprise a Boolean that indicates whether this column or attribute 167 can be null; the type of the attribute 167; Boolean indicating whether this column or attribute 167 is a primary key. The metadata 161 may further contain the data class of the attribute 167, references to external terms or tags linked to the attribute 167, information indicating where the values of the attribute 167 come from (e.g., data lineage information), indication if a standardization or cleansing already occurred on the values of the attributes 167 before being stored in the dataset 127 and statistical information on the values of the attribute 167 and their characteristics,” See Paragraph 81).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 3, Kabra et al. teaches the computer implemented method of claim 1, wherein the seed table includes pairs of keys and values related to input names and their corresponding tags (“the attribute representing a primary (PK) or foreign key (FK) of the dataset, the attribute values have a predefined data class, the attribute does not have similar characteristics as another attribute of the dataset, wherein values of the other attribute are standardized, the number of different formats of the attribute is lower than a number of formats threshold, the average length of the values of the attribute is lower than a length threshold, the average number of words of the attribute is lower than a number of words threshold, the fraction of the distinct values of the attribute is lower than fraction threshold. This embodiment may provide as much conditions as possible that may eradicate the need for the column to be considered for data standardization and the need for data standardization can return a very low number,” See Paragraph 21).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 3.  With respect to claim 4, Kabra et al. teaches the computer implemented method of claim 3, wherein the input names are column names of one or more data sources, and wherein values in the seed tables are tags of previously matched column names (“The column names could be found in standardization dictionaries (e.g. that is part of the metadata 161) and the column with those names could be ignored. 0.0 can be returned for these. Thus, if column 167 is already standardized step 417 may be performed; otherwise inquiry 405 may be performed,” See Paragraph 121).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 5, Kabra et al. teaches the computer implemented method of claim 1, wherein an n-gram includes any one of: 
consecutive letters, consecutive symbols, and combination of consecutive letters and symbols (“According to one embodiment, the bigram is a sequence of two or more adjacent elements or characters of the attribute value. The present method may be applied using N-grams as described herein with the 2-grams, wherein the N-gram is a sequence of N adjacent characters of an attribute value,” See Paragraph 56).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 10, Sloan teaches the computer implemented method of claim 1, further comprising: 

performing a dictionary search process to find an exact matching key of the input name from the seed table (“it is important to note that each term used herein refers to that which the Ordinary Artisan would understand such term to mean based on the contextual use of such term herein. To the extent that the meaning of a term used herein--as understood by the Ordinary Artisan based on the contextual use of such term--differs in any way from any particular dictionary definition of such term, it is intended that the meaning of the term as understood by the Ordinary Artisan should prevail,” See Column 4 Lines 18-26); and

performing a similarity search process between the input name and keys in the seed table to identify the matching key of the input name from the seed table (“rules from the artifact seeded by the signature to obtain a list of terminology to code mappings,” See Column 1 Lines 58-67).
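
For illustration only, the two-stage search recited in claim 10 can be sketched in Python as an exact dictionary lookup followed by a similarity search. This is a minimal sketch under stated assumptions and is not taken from Sloan or from the application. The function name find_key, the sample seed table, and the 0.8 cutoff are hypothetical, and the standard-library difflib module is used here only as a stand-in for whatever similarity measure is actually intended.

```python
import difflib
from typing import Dict, Optional

# Hypothetical seed table: keys are known column names, values are tags.
SEED_TABLE: Dict[str, str] = {
    "customer_name": "Customer Name",
    "zip_code": "Postal Code",
}


def find_key(name: str, seed_table: Dict[str, str],
             cutoff: float = 0.8) -> Optional[str]:
    """Return a seed key for name, trying an exact match first."""
    # Stage 1: dictionary search. A Python dict lookup is hash-based,
    # and it succeeds only when the key strings are identical.
    if name in seed_table:
        return name
    # Stage 2: similarity search over all keys (assumed measure:
    # difflib's sequence-matching ratio, with a hypothetical cutoff).
    close = difflib.get_close_matches(name, list(seed_table), n=1, cutoff=cutoff)
    return close[0] if close else None
```

Running the cheaper exact lookup first means the similarity search is needed only for names that are not already keys in the seed table.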



The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 13, Sloan teaches the computer implemented method of claim 1, further comprising: 

generating at least one fourth fingerprint based on any combination of the first fingerprint, the second fingerprint, and the third fingerprint (“generate the Exemplar code signature (i.e., fingerprint,” See Column 9 Lines 5-10);

identifying a matching key by matching any combination of the first fingerprint, the second fingerprint, the third fingerprint, and the fourth fingerprint against keys in a seed table (“matching the instance code signature to one or more exemplar code signatures,” See Column 2 Lines 5-9); and

cataloging the metadata with the matching key as the tag (“indexing of the article's metadata and then its content using all-encompassing clinical taxonomies, for example SNOMED, MeSH, or ICD-10,” See Column 5 Lines 1-5).

With respect to claim 14, it is rejected on grounds corresponding to above rejected claim 1, because claim 14 is substantially equivalent to claim 1.

With respect to claim 15, Sloan teaches a system for cataloging database metadata using a signature matching process, comprising:

a processing circuitry (“a matching processor 350,” See Column 4 Lines 49-55); and

a memory (“memory,” See Column 4 Lines 27-32), the memory containing instructions that, when executed by the processing circuitry, configure the system to:

receive an input name to be matched to a key in a seed table (“an initial code signature,” See Column 1 Lines 58-67 and “rules from the artifact seeded by the signature to obtain a list of terminology to code mappings,” See Column 1 Lines 58-67); 

generate a first fingerprint by decomposing the received input name into a first set of n-grams (“generate the Exemplar code signature (i.e., fingerprint,” See Column 9 Lines 5-10);

identify a matching key by matching any combination of the first fingerprint, the second fingerprint, and the third fingerprint against keys in the seed table (“matching the instance code signature to one or more exemplar code signatures,” See Column 2 Lines 5-9); and 
catalog the metadata with the matching key as a tag (“indexing of the article's metadata and then its content using all-encompassing clinical taxonomies, for example SNOMED, MeSH, or ICD-10,” See Column 5 Lines 1-5).
Sloan does not disclose generating, based on the received input name, a second fingerprint using a predetermined pronunciation schema, wherein the second fingerprint is a phonetic fingerprint.
However, Kabra et al. teaches generate, based on the received input name, a second fingerprint using a predetermined pronunciation schema, wherein the second fingerprint is a phonetic fingerprint (“Converting the data to phonetic (like Metaphone) during preprocessing may reduce drastically the number of bigrams generated and thus make the clustering much faster,” See Paragraph 34);

generate a third fingerprint by decomposing the second fingerprint into a second set of n-grams (“According to one embodiment, the bigram is a sequence of two or more adjacent elements or characters of the attribute value. The present method may be applied using N-grams as described herein with the 2-grams, wherein the N-gram is a sequence of N adjacent characters of an attribute value,” See Paragraph 56).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Sloan (fingerprinting) with Kabra et al. (data standardization).  This would have determined a data standardization score for an attribute of a dataset.  See Kabra et al. Paragraphs 2-5.  In addition, both references teach features that are directed to analogous art and they are directed to the same field of endeavor: fingerprinting.

With respect to claim 16, it is rejected on grounds corresponding to above rejected claim 2, because claim 16 is substantially equivalent to claim 2.

With respect to claim 17, it is rejected on grounds corresponding to above rejected claim 3, because claim 17 is substantially equivalent to claim 3.

With respect to claim 18, it is rejected on grounds corresponding to above rejected claim 4, because claim 18 is substantially equivalent to claim 4.

With respect to claim 19, it is rejected on grounds corresponding to above rejected claim 5, because claim 19 is substantially equivalent to claim 5.

With respect to claim 24, it is rejected on grounds corresponding to above rejected claim 10, because claim 24 is substantially equivalent to claim 10.

With respect to claim 25, it is rejected on grounds corresponding to above rejected claim 11, because claim 25 is substantially equivalent to claim 11.

With respect to claim 26, it is rejected on grounds corresponding to above rejected claim 12, because claim 26 is substantially equivalent to claim 12.

With respect to claim 27, it is rejected on grounds corresponding to above rejected claim 13, because claim 27 is substantially equivalent to claim 13.



Claim(s) 6-9 and 20-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sloan (US Patent No. 9953043) and Kabra et al. (US Pub. No. 20180137151) in further view of Newell et al. (US Pub. No. 20090240684).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 1.  With respect to claim 6, Sloan reference as modified by Kabra et al. does not disclose truncating the input names to minimize variations of n-grams with different order in the first set of n-grams.
However, Newell et al. teaches the computer implemented method of claim 1, further comprising:

truncating the input names to minimize variations of n-grams with different order in the first set of n-grams (See Paragraph 18 “Noise might also be deliberately introduced by parties wishing to encode additional data in an image (sometimes called stenography) or to circumvent a fingerprint function for the purpose of avoiding an adverse categorization of an image”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Sloan (fingerprinting) and Kabra et al. (data standardization) with Newell et al. (content categorization).  This would have improved querying.  See Newell et al. Paragraph 3.  In addition, the references teach features that are directed to analogous art and they are directed to the same field of endeavor: fingerprinting.  The close relation among the references strongly suggests an expectation of success.


The Sloan reference as modified by Kabra et al. and Newell et al. teaches all the limitations of claim 6.  With respect to claim 7, Kabra et al.  teaches the computer implemented method of claim 6, wherein an order of an n-gram is a number of the any one of:
consecutive symbols, and combination of consecutive letters and symbols (See Paragraph 15 “n-gram production using individual word tokens, or k-mers production using text character chunks of any length”).


The Sloan reference as modified by Kabra et al. and Newell et al. teaches all the limitations of claim 6.  With respect to claim 8, Newell et al. teaches the computer implemented method of claim 6, wherein truncating the input names is performed using a stenography-based process (See Paragraph 18 “Noise might also be deliberately introduced by parties wishing to encode additional data in an image (sometimes called stenography) or to circumvent a fingerprint function for the purpose of avoiding an adverse categorization of an image”).

The Sloan reference as modified by Kabra et al. and Newell et al. teaches all the limitations of claim 6.  With respect to claim 9, Kabra et al. teaches the computer implemented method of claim 6, further comprising: 
decomposing the first set of n-grams and the second set of n-grams from input names having different ordered n-grams (“Converting the data to phonetic (like Metaphone) during preprocessing may reduce drastically the number of bigrams generated and thus make the clustering much faster,” See Paragraph 34).

With respect to claim 20, it is rejected on grounds corresponding to above rejected claim 6, because claim 20 is substantially equivalent to claim 6.

With respect to claim 21, it is rejected on grounds corresponding to above rejected claim 7, because claim 21 is substantially equivalent to claim 7.

With respect to claim 22, it is rejected on grounds corresponding to above rejected claim 8, because claim 22 is substantially equivalent to claim 8.

With respect to claim 23, it is rejected on grounds corresponding to above rejected claim 9, because claim 23 is substantially equivalent to claim 9.

Claim(s) 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sloan (US Patent No. 9953043) and Kabra et al. (US Pub. No. 20180137151) in further view of Lambright (US Pub. No. 20090240684).

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 10.  With respect to claim 11, Sloan reference as modified by Kabra et al. does not disclose a hash.
However, Lambright teaches the computer implemented method of claim 10, wherein the dictionary search process is any of: 

a string-based match and a hash-based match (“when using minhash techniques, each fingerprint may be represented as an n-byte "shingle." For a 128-bit fingerprint, the shingle may be 16 bytes in size. Once a shingle is generated for a fingerprint, the shingle may then be hashed. The hashed single may be smaller in size then the shingle. For example, the hashed shingle may be k number of bits (e.g., k=4 bits). Then, a minhash algorithm may be used to generate a list of random permutations of the shingle hashes (e.g., eight permutations). The random permutations of the shingle hashes may represent some or all of the possible combinations of shingle hashes for the number of bits. For example, when k=4 bits, the set of random permutations may include up to 64 different permutations,” See Paragraph 32).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Sloan (fingerprinting) and Kabra et al. (data standardization) with Lambright (locality sensitive hashing).  This would have facilitated fingerprint matching.  See Lambright Paragraph(s) 10-15.  In addition, the references teach features that are directed to analogous art and they are directed to the same field of endeavor: fingerprinting.  The close relation among the references strongly suggests an expectation of success.

The Sloan reference as modified by Kabra et al. teaches all the limitations of claim 10.  With respect to claim 12, Sloan reference as modified by Kabra et al. does not disclose a minhash.
However, Lambright teaches the computer implemented method of claim 10, wherein the similarity search process is a MinHash LSH Forest search process (“when using minhash techniques, each fingerprint may be represented as an n-byte "shingle." For a 128-bit fingerprint, the shingle may be 16 bytes in size. Once a shingle is generated for a fingerprint, the shingle may then be hashed. The hashed single may be smaller in size then the shingle. For example, the hashed shingle may be k number of bits (e.g., k=4 bits). Then, a minhash algorithm may be used to generate a list of random permutations of the shingle hashes (e.g., eight permutations). The random permutations of the shingle hashes may represent some or all of the possible combinations of shingle hashes for the number of bits. For example, when k=4 bits, the set of random permutations may include up to 64 different permutations,” See Paragraph 32).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Sloan (fingerprinting) and Kabra et al. (data standardization) with Lambright (locality sensitive hashing).  This would have facilitated fingerprint matching.  See Lambright Paragraph(s) 10-15.  In addition, the references teach features that are directed to analogous art and they are directed to the same field of endeavor: fingerprinting.  The close relation among the references strongly suggests an expectation of success.
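
For illustration only, the general MinHash technique referred to in the discussion of claim 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it computes plain MinHash signatures over character bigrams and estimates Jaccard similarity from them, and it does not build an LSH Forest index or reproduce the method of Lambright. The function names, the bigram shingling, the choice of 64 hash functions, and the use of salted BLAKE2b hashes in place of random permutations are all hypothetical.

```python
import hashlib
from typing import List, Set


def shingles(text: str, n: int = 2) -> Set[str]:
    """Return the set of n-character shingles of the lower-cased text."""
    s = text.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def minhash_signature(items: Set[str], num_perm: int = 64) -> List[int]:
    """Return a MinHash signature: one minimum hash value per salted hash."""
    if not items:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    signature = []
    for seed in range(num_perm):
        # Each distinct salt acts as a different hash function, which
        # approximates an independent random permutation of the shingles.
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for item in items))
    return signature


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two names whose shingle sets overlap heavily would be expected to agree in most signature positions, while names with no shingles in common would agree in essentially none.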

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS E ALLEN whose telephone number is (571)270-3562.  The examiner can normally be reached on Monday through Thursday 830-630.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on (571) 272-3978.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/N.E.A/Examiner, Art Unit 2154
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154