DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 8-9, 11-15, 18-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Elkington et al. (US 2016/0357790 A1), hereinafter “Elkington”.

As per claim 1, Elkington teaches a method for updating a local database comprising:
“training a selector machine learning model to accept as input a list comprising a plurality of labels, wherein each label is associated with a set of features scores” at [0041], [0079]-[0113], and Figs. 2-3;
(Elkington teaches the steps of training the machine learning (ML) model using training data)
“output for each respective label of the plurality of labels an identifier of one score of the set of features scores associated with the respective label” at [0118]-[0131], [0143]-[0149] and Figs. 2-3;
(Elkington teaches that the ML model accepts as input a plurality of data records comprising a plurality of fields and outputs a plurality of resolved records comprising a plurality of fields and associated confidence scores. These resolved records form a recommended merging solution)
“receiving a plurality of database records from a plurality of remote databases” at [0042], [0150] and Figs. 4-5;
(Elkington teaches that, once the ML model is trained, input including a plurality of records is received. Fig. 4 of Elkington shows an example set of records 401A, 401B, and 401C from a plurality of remote sources (i.e., “referral”, “Trade Show”, “Web Form”))
“determining that a single local database record should be created in the local database based on the plurality of database records, each database record comprising data items associated with each of the plurality of labels” at [0150]-[0156] and Figs. 4-6;
(Elkington teaches the steps of determining that the set of records are duplicate records that represent the same entity and should be merged/resolved to generate a single resolved record. Each record comprises data items (e.g., “Johnson”, “John”, “123-456-7890”, “Vice President”) associated with each of the labels (e.g., “L.Name”, “F.Name”, “Phone”, “Title”))
“generating an aggregated list comprising the plurality of labels by: for each respective label of the plurality of labels computing a respective set of feature scores based on data items associated with the respective label of the plurality of database records” at [0150]-[0156] and Figs. 4-6;
(Elkington teaches generating a set of feature vectors 501A, 501B, 501C from the records 401A, 401B, 401C, wherein each feature vector comprises a plurality of labels (e.g., “Last Name”, “First Name”, “Phone”, “Title”), data associated with the labels (e.g., “Johnson”, “John”, “123-456-7890”, “Vice President”) and respective feature scores (e.g., completeness score, source quality score, voting score))
“inputting the aggregated list into the selector machine learning model to generate for each respective label of the plurality of labels an identifier of one of the plurality of database records” at [0150]-[0156] and Figs. 4-6;
(Elkington teaches that the feature vectors are inputted into the machine learning model to generate resolved records 603A, 603B, 603C, wherein each of the resolved records includes a plurality of labels and an associated confidence score)
“generating a new record for the local database by selecting, for each respective data item of the new record, a data item from a database record of the plurality of database records that is identified by the generated identifier and corresponding to a label of the plurality of the labels that is associated with the respective data item; storing the new record in the local database” at [0143], [0150]-[0156] and Figs. 4-6;
(Elkington teaches that the ML model 112 generates resolved records with confidence scores. These resolved records form a recommended merging solution. A user can select one of these generated records as the merged record, or the system itself can make the selection. The merged record is a new record including data items selected from the plurality of inputted data records)


	As per claim 2, Elkington teaches the method of claim 1, wherein “each respective set of feature scores is generated based on comparing respective database record to predetermined rules associated with constructing database records for the local database” at [0045]-[0078], [0150]-[0156] and Figs. 4-6.

	As per claim 3, Elkington teaches the method of claim 1, wherein “each respective set of feature scores is generated based on comparing the plurality of records to each other” at [0045]-[0078], [0150]-[0156] and Figs. 4-6.

	As per claim 4, Elkington teaches the method of claim 1, wherein “the data items include metadata for a media asset, the metadata comprising a plurality of metadata items, each label is associated with respective metadata items of the plurality of metadata items” at [0031], [0048]-[0078].

	As per claim 5, Elkington teaches the method of claim 4, wherein “at least one label corresponds to a movie or show title of the media asset, an episode title of the media asset, a description of the media asset, a genre of the media asset, a duration of the media asset, or a release date of the media asset” at [0031], [0048]-[0078].

	As per claim 8, Elkington teaches the method of claim 1, wherein “the new record stored in the local database includes a first data item associated with a first label and a second data item associated with a second label, the first data item being selected from a different database record of the plurality received database records than the second data item” at [0045]-[0078], [0150]-[0156] and Figs. 4-6.

As per claim 9, Elkington teaches the method of claim 1, wherein “the selector machine learning model is trained using a plurality of training sets, each training set including a list comprising a plurality of labels, wherein each respective label is associated with a respective set of feature scores, each training set associated with a previously confirmed decision in which a feature score for each respective label is selected” at [0079]-[0117].
Claims 11-15 and 18-19 recite limitations similar to those of claims 1-5 and 8-9 and are therefore rejected for the same reasons.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Elkington as applied to claims 1-5, 8-9, 11-15, and 18-19 above, and in view of Jagota et al. (US 2021/0342353 A1), hereinafter “Jagota”.

As per claim 6, Elkington teaches the method of claim 1 discussed above. Elkington does not teach “wherein the determining that the single local database record should be created in the local database based on the plurality of database records comprises: training a match machine learning model to accept as input a list of features scores calculated based on a first database record and a second database record and output a probability of the first database record matching the second database record; generating a plurality of lists of features scores by: for a first received database record of the plurality of database records received from the plurality of remote databases, generating respective lists of features scores based on the first received database record and each respective other received database record of the plurality of database records; calculating respective probabilities that the first received database record matches each respective other database record of the plurality of database records by inputting in the match machine learning model the respective lists of features scores; and determining that the single local database record should be created in the local database based on the calculated respective probabilities” as claimed. 
However, Jagota teaches a method for adaptive field-level matching of database records using a match machine learning model including the steps of “training a match machine learning model to accept as input a list of features scores calculated based on a first database record and a second database record and output a probability of the first database record matching the second database record; generating a plurality of lists of features scores by: for a first received database record of the plurality of database records received from the plurality of remote databases, generating respective lists of features scores based on the first received database record and each respective other received database record of the plurality of database records; calculating respective probabilities that the first received database record matches each respective other database record of the plurality of database records by inputting in the match machine learning model the respective lists of features scores; and determining that the single local database record should be created in the local database based on the calculated respective probabilities” at [0059]-[0065] and Fig. 1. 
Thus, it would have been obvious to one of ordinary skill in the art to combine Jagota with Elkington’s teaching because “a machine learning approach provides better incremental enhancements than human-designed algorithms provide and is therefore more accurate than human-designed algorithm. Furthermore, the structure of the machine-learning classifier’s software is simpler because record field-specific features customizations are isolated into the feature extractions of a quasi-generic machine-learning classifier, which is unlike record field-level matching algorithm that use different logic for matching each different type of record field. The machine-learning classifier's engineering manual can provide an easily understood list of record field-specific features, which enables the convenient testing of record field-specific matchers, which is typically unavailable for record field-level matching algorithms that use different logic for matching each different type of record field. A machine-learning classifier can learn the features required for calculating field match scores from a relatively small training set of data scenarios. A machine-learning classifier can also shorten the cycle from identifying quality issues (such as specific false positive or false negative scenarios) to correcting the underlying causes. Since a machine-learning classifier for individual record fields can train on training data to determine whether or not two record fields' values match, a system administrator is no longer needed to configure different algorithms, such as edit distance, which are typically required for matching record field values”, as suggested by Jagota at [0020].

As per claim 7, Elkington teaches the method of claim 1 discussed above. Elkington does not teach “training a match machine learning model to accept as input a list of features scores calculated based on a first database record and a second database record and output a probability of the first database record matching the second database record; identifying, in the local database, a plurality of candidate database records that potentially match a first received database record of the plurality of database records; calculating a plurality of match probabilities by: for each respective candidate database record, calculating, using the match machine learning model, a respective probability of the respective candidate database record matching the first received database record; and determining, based on a largest of the plurality of match probabilities that the new record for the local database should be created” as claimed. However, Jagota teaches a method for adaptive field-level matching of database records using a match machine learning model including the steps of “training a match machine learning model to accept as input a list of features scores calculated based on a first database record and a second database record and output a probability of the first database record matching the second database record; identifying, in the local database, a plurality of candidate database records that potentially match a first received database record of the plurality of database records; calculating a plurality of match probabilities by: for each respective candidate database record, calculating, using the match machine learning model, a respective probability of the respective candidate database record matching the first received database record; and determining, based on a largest of the plurality of match probabilities that the new record for the local database should be created” at [0059]-[0065] and Fig. 1. 
Thus, it would have been obvious to one of ordinary skill in the art to combine Jagota with Elkington’s teaching because “a machine learning approach provides better incremental enhancements than human-designed algorithms provide and is therefore more accurate than human-designed algorithm. Furthermore, the structure of the machine-learning classifier’s software is simpler because record field-specific features customizations are isolated into the feature extractions of a quasi-generic machine-learning classifier, which is unlike record field-level matching algorithm that use different logic for matching each different type of record field. The machine-learning classifier's engineering manual can provide an easily understood list of record field-specific features, which enables the convenient testing of record field-specific matchers, which is typically unavailable for record field-level matching algorithms that use different logic for matching each different type of record field. A machine-learning classifier can learn the features required for calculating field match scores from a relatively small training set of data scenarios. A machine-learning classifier can also shorten the cycle from identifying quality issues (such as specific false positive or false negative scenarios) to correcting the underlying causes. Since a machine-learning classifier for individual record fields can train on training data to determine whether or not two record fields' values match, a system administrator is no longer needed to configure different algorithms, such as edit distance, which are typically required for matching record field values”, as suggested by Jagota at [0020].

Claims 10, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Elkington as applied to claims 1-5, 8-9, 11-15, 18-19 above, and in view of Sim et al. (US 11,216,701 B1), hereinafter “Sim”. 

As per claims 10, 20, Elkington teaches the method and system of claims 1, 11 discussed above. Elkington does not teach “training a word embedding machine learning model to generate respective semantic word vectors representing each word in a corpus of words; generating, by the trained word embedding machine learning model, respective semantic word vectors for each word in respective sets of words from the received plurality of database records; calculating, for each respective set of words, a first semantic word vector based on the respective semantic word vectors associated with the respective set of words; calculating respective similarity scores for each received database record, as between the first semantic word vectors, wherein the respective similarity scores correspond to a respective feature score in the respective set of features scores” as claimed. However, Sim teaches a method for generating record embeddings from structured records using a word embedding machine learning model including the steps of “training a word embedding machine learning model to generate respective semantic word vectors representing each word in a corpus of words; generating, by the trained word embedding machine learning model, respective semantic word vectors for each word in respective sets of words from the received plurality of database records; calculating, for each respective set of words, a first semantic word vector based on the respective semantic word vectors associated with the respective set of words; calculating respective similarity scores for each received database record, as between the first semantic word vectors, wherein the respective similarity scores correspond to a respective feature score in the respective set of features scores” at Col. 2 line 15 to Col. 6 line 65. 
Thus, it would have been obvious to one of ordinary skill in the art to combine Sim with Elkington’s teaching in order to improve the record matching process because “compared to the record attribute values from which the record embedding is derived, the vector is more suitable for use in matching, regression, and classification tasks, amongst other tasks. For example, two record embeddings may be mathematically compared to measure the likeness of the records they represented”, as suggested by Sim at Col. 2 lines 25-30.
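For illustration only, the similarity score recited in claims 10 and 20 can be sketched as follows. The sketch assumes that the vector for a set of words is the mean of its word vectors and that similarity is measured by cosine similarity; neither choice is stated in the claims or attributed to Sim. The two-dimensional embeddings and the function names are hypothetical and stand in for the output of a trained word embedding model.

```python
import math

# Hypothetical toy embeddings; a trained word embedding model would supply these.
TOY_EMBEDDINGS = {
    "vice": [1.0, 0.0],
    "president": [0.0, 1.0],
    "sales": [1.0, 1.0],
    "manager": [0.0, 1.0],
}

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def record_vector(words):
    """Assumed aggregation: one vector per set of words, the mean of its word vectors."""
    return mean_vector([TOY_EMBEDDINGS[w] for w in words])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Similarity between two hypothetical "Title" values from different records.
score = cosine_similarity(
    record_vector(["vice", "president"]),
    record_vector(["sales", "manager"]),
)
print(score)
```

Under these assumptions, the resulting score could serve as one feature score in the set of feature scores for a label.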





Response to Arguments
Applicant's arguments filed 6/22/2022 have been fully considered but they are not persuasive. The examiner respectfully traverses Applicant’s arguments.
Regarding independent claims 1 and 11, Applicant argued that Elkington fails to teach or suggest “for each respective label of the plurality of labels, computing a respective set of feature scores based on data items associated with the respective labels of the plurality of database records; inputting the aggregated list into the selector machine learning model to generate for each respective label of the plurality of labels an identifier of one of the plurality of database records”. On the contrary, Elkington teaches at [0150]-[0156] and Figs. 5-6 the steps of generating a set of feature vectors 501A, 501B, 501C from the records 401A, 401B, 401C, wherein each feature vector comprises a set of feature scores for each of the plurality of labels. For example, as seen in Elkington’s Fig. 5, the field/label “F.Name” includes a set of feature/voting scores {1, 1, 1}, the field/label “Phone” includes a set of feature/voting scores {2/3, 2/3, 1/3}, and the field/label “Title” includes a set of feature/voting scores {2/3, 2/3, 1/3}, wherein the feature/voting scores are based on the data items (e.g., “John”, “123-456-7890”, “123-652-9959”, “Vice president”, “Sale Manager”) associated with the respective field/label (e.g., “F.Name”, “Phone”, “Title”) of the plurality of database records 401A, 401B, 401C. Elkington’s set of feature/voting scores is similar to Applicant’s set of feature scores as shown in Applicant’s Fig. 2B. For example, in Applicant’s Fig. 2B the set of feature scores for the label “Show Title” of the database records {Amazon, Hulu, Netflix} is {0.8, 0.9, 1.0}, while Elkington’s Fig. 5 shows that the set of feature scores for the label “Phone” of the database records {401A, 401B, 401C} is {2/3, 2/3, 1/3}.
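For illustration only, the per-label voting scores discussed above can be reproduced with a short sketch. It assumes that a voting score is the fraction of records sharing a given value for a field, and that records 401A and 401B share the same phone number while 401C differs; both are assumptions made for this example and are not quoted from Elkington. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def voting_scores(values):
    """For each record's value of one field, return the fraction of
    records that share that value (assumed definition of a voting score)."""
    counts = Counter(values)
    total = len(values)
    return [Fraction(counts[v], total) for v in values]

# Field values in the order 401A, 401B, 401C (assignment to records is assumed).
first_name = ["John", "John", "John"]
phone = ["123-456-7890", "123-456-7890", "123-652-9959"]

print(voting_scores(first_name))  # one score per record for "F.Name"
print(voting_scores(phone))       # one score per record for "Phone"
```

Under these assumptions the label “F.Name” yields {1, 1, 1} and the label “Phone” yields {2/3, 2/3, 1/3}, that is, one set of scores per label grouped across the three records.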

[Image: media_image1.png]

 
Applicant further argued that “Elkington contain scores for multiple data fields corresponding to respective input records 401A, 401B, 401C, which is not the same as computing a set of feature score for each respective label, e.g., grouping together feature scores from across database records for a particular labels, let alone inputting sets of such features scores for each respective label into a machine learning model”. On the contrary, as discussed above, Elkington teaches that each field/label includes a set of feature/voting scores. Elkington also teaches at [0155] and Fig. 6 that a particular machine learning model, MLP 601, is provided for each field of the records and receives the feature/voting scores for a particular field/label such as F.Name, Phone, or Title. Elkington also teaches at [0127]-[0129] that each of the MLPs predicts the ranking for each of the fields/labels, wherein the predicted rankings output for the fields are used to generate the new records, as discussed further below.

[Image: media_image2.png]


Applicant further argued that “rather than generating a new record based on selecting generated identifiers for each respective label of the plurality of labels, as required by applicant’s claims, Elkington merely evaluate candidate records 603A-603C as a whole, i.e., by outputting records 603A-603C and associated confidence scores, where each candidate record includes a combination of various data field from the input records, and selecting the candidate record with the highest score or receiving manual selection of the candidate records”. On the contrary, Elkington teaches at [0155]-[0156] that the composite classifier 602 generates resolved record 603A, wherein the resolved record 603A uses the phone number and title from records 401A and 401B. Elkington therefore teaches that a new resolved record 603A is generated based on selecting data items from the records identified by the generated identifiers (e.g., 401A, 401B) for each respective label (e.g., “Phone”, “Title”), because the records 401A and 401B include the highest feature/voting scores (e.g., “2/3”) for the labels “Phone” and “Title”. This step is similar to step 254 of Applicant’s Fig. 2B, which generates a new record using the show title from the Netflix database record, which has the highest score of 1.0 for the label “Show Title”, and the episode title from the Amazon database record, which has the highest score of 1.0 for the label “Episode Title”.
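For illustration only, the per-label selection compared above can be sketched as picking, for each label, the record with the highest feature score and copying that record's data item into the new record. This is a plain maximum over scores, not a trained selector model, and it is not presented as the implementation of either Applicant or Elkington. The “Show Title” scores and the 1.0 score for Amazon's “Episode Title” come from the discussion of Applicant’s Fig. 2B; the remaining scores, the data items, and the function names are hypothetical.

```python
def select_per_label(scores):
    """For each label, return the identifier of the record with the highest score."""
    return {label: max(by_record, key=by_record.get)
            for label, by_record in scores.items()}

def build_record(records, selected):
    """Assemble a new record by taking each label's data item from the selected record."""
    return {label: records[record_id][label] for label, record_id in selected.items()}

# Feature scores per label, grouped across database records.
scores = {
    "Show Title": {"Amazon": 0.8, "Hulu": 0.9, "Netflix": 1.0},
    "Episode Title": {"Amazon": 1.0, "Hulu": 0.7, "Netflix": 0.6},  # Hulu/Netflix values hypothetical
}

# Hypothetical data items for each source record.
records = {
    "Amazon": {"Show Title": "show (amazon)", "Episode Title": "episode (amazon)"},
    "Hulu": {"Show Title": "show (hulu)", "Episode Title": "episode (hulu)"},
    "Netflix": {"Show Title": "show (netflix)", "Episode Title": "episode (netflix)"},
}

selected = select_per_label(scores)
new_record = build_record(records, selected)
print(selected)
print(new_record)
```

The result draws the show title from one source record and the episode title from another, which is the per-label behavior at issue in the argument above.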

	In light of the foregoing, the rejection under 35 U.S.C. 102 is hereby sustained.




Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHANH B PHAM whose telephone number is (571)272-4116. The examiner can normally be reached Monday - Friday, 8am to 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached at (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KHANH B PHAM/ Primary Examiner, Art Unit 2166
July 21, 2022