Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on November 17, 2021, has been entered.

Remarks
	This Office Action is in response to applicant’s amendment filed on November 17, 2021, under which claims 1-20 are pending and under consideration.

Response to Arguments
	Applicant’s amendments have overcome the previous § 112 rejections. Therefore, the previous § 112 rejections have been withdrawn.
	Applicant’s amendments have overcome the previous § 103 rejections. However, upon further consideration, new grounds of rejection have been set forth below. Applicant’s arguments are moot under the new grounds of rejection.
Applicant argues that the previously cited references fail to teach the newly recited limitations of “at least one of the set of machine mining models being built as a domain fingerprint using tokenized data values classified as a certain tag, the domain fingerprint allowing to estimate a likelihood that an unknown token value belongs to the certain tag…each tag candidate being determined using at least one of a standard classifier, a previously classified token, and the domain fingerprint” recited in claim 1. This argument is moot because new reference Nelke has been applied to account for the domain fingerprint feature, as set forth in the rejections below. Nelke pertains to computing fingerprints that characterize the class of data contained in the corresponding data set (see Nelke, paragraph 27). 
The Examiner also notes that the phrase “each tag candidate being determined using at least one of a standard classifier, a previously classified token, and the domain fingerprint” is an alternate expression due to the term “at least one of.” Therefore, this phrase does not require “the domain fingerprint” to be actually used in the “determining” operation. 
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-2, 8-10, 11-12, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Christen et al., “Automated Probabilistic Address Standardisation and Verification,” Proceedings of the 4th Australasian Data Mining Conference – AusDM05, December 5-6, 2005, Sydney, Australia, (“Christen”) in view of Wroczynski et al. (US 2014/0136188 A1) (“Wroczynski”), Kim, “Greedy ensemble learning of structured predictors for sequence tagging,” Neurocomputing 150 (2015) 449–457 (“Kim”), and Nelke et al. (US 2012/0066214 A1) (“Nelke”).
As to claim 1, Christen teaches a computer-implemented method for generating data standardization rules, the method comprising:
receiving a training data set containing tokenized data values and associated tags, [§ 4, paragraphs 1-2: “segmented records in…an address database.” The database described in Christen (§ 4) is the “G-NAF” (Geocoded National Address File). See also § 4.2 (page 62): “The required input data for the training are…the G-NAF database containing cleaned and segmented address records.” As shown in Table 2, the G-NAF database includes various fields (e.g., “locality_name” and “street_name”, which are labels for the address data). The data values in the database are tokenized and tagged, since “segmented” refers to tokenized data values having associated tags, as described in § 3.3 (“segmentation”): “having a list of elements (words, numbers and separators) and one or more corresponding tag lists, the task is to assign these elements to the appropriate output fields.” See also § 4, paragraph 2: “We extracted 26 address attributes (or output fields) as listed in Table 2.”] 
storing the tokenized data values and associated tags in a lookup dictionary; [§ 4.1: “The look-up tables are generated by extracting all the discrete (string) values for locality name, street name and building name into tables and then combining those tables with manually generated tables containing typographical variations (like common misspellings of suburb names).” Since the look-up tables are generated for use later on, the tables are understood to be stored in memory.]
based on the training data set, building a set of machine mining models using a learning algorithms for identifying tags and tag patterns; [§ 4, paragraph 1: “automated HMM training”; § 4.2 (page 62): “The required input data for the training are…the G-NAF database containing cleaned and segmented address records.” Note that “HMM” refers to a “Hidden Markov model,” which is a learning algorithm as described in § 3.3, paragraphs 2-3. The limitation of “set of machine mining models” is met by the combination of the HMM and the search algorithm used to search the look-up tables (see page 58, middle paragraph: “the look-up tables are searched using a greedy matching algorithm”), which is based on the training data set as described above. It is noted that a “machine mining model” does not necessarily require a machine learning model, which is discussed separately in paragraph 18 of the specification.]
receiving a data set comprising one or more data values, each of the one or more data values including an electronic text composed of a linear sequence of symbols; [§ 3, paragraph 2 (page 57): “the raw input address records are stored as text files or database tables, and are made of one or more text strings.” For example, an input address may be “42 meyer Rd COOMA 2371,” as illustrated at the bottom of page 59. The raw input address records are electronic text, because they are “stored as text files or database tables, and are made of one or more text strings.” They are also composed of a linear sequence of symbols, since each character in the address is a symbol.] 
tokenizing each data value in the data set to obtain a set of tokens; [§ 3.2, paragraph 1: “After an address string has been cleaned, it is split at white-space boundaries into a list of words, numbers, punctuation marks and other possible characters.” In the example on the bottom of page 59, the tokens are “[`42', `meyer', `road', `cooma', `2371']”.] 
determining for each token in the set of tokens one or more tag candidates using the lookup dictionary and at least part of the set of machine mining models, [§ 3.2, paragraph 1: “Each of the list elements is assigned one or more tags. These tags are based on look-up tables generated using the values in the national address database, as well as more general features.” § 5, item 1: “During the tagging step of standardisation, each element in the address is assigned one or more tags depending if it can be found in one or more look-up tables. Once all tables have been checked, the element will also be given a feature tag.” In the example at the bottom of page 59, the tags are “[`N2', `SN/L5', `ST/L4', `LN/SN/L5', `PC/N4' ]” where “LN” and “SN”, for example, are tags for locality name and street name (§ 3.3, last paragraph). The tags generated in this stage are “candidate tags” because a specific combination of tags will later be selected from the candidate combinations. As described in § 3.3, last paragraph (page 60), the candidate tags in the above example comprise “24 tag sequences,” i.e., 1 × 2 × 2 × 3 × 2 = 24 combinations of candidate tags. With respect to the limitation of “using at least part of the set of machine mining models,” as noted above, Christen teaches that the determination of the initial tags uses a rule-based algorithm (a “greedy search algorithm” as described in § 3.2, paragraph 3) that involves comparison of the tokens with data in the look-up table and a features table. Therefore, Christen meets this limitation.] each tag candidate being determined using at least one of a standard classifier, a previously classified token, and the domain fingerprint; [The limitation of “using...a previously classified token” is taught because Christen uses a look-up table that was constructed from previously classified tokens. 
See § 4.1: “discrete (string) values for locality name, street name and building name” (that is, the string values were classified under locality name, street name, or building name fields); see also § 5, item 1. The term “previously classified” does not require a particular methodology for the previous classification. The alternative of “a standard classifier” is also taught because the assignment of tags (§ 3.2, paragraph 1, quoted above) constitutes classification. Since the term “standard” does not require any specific manner of being standard, it is met by Christen, whose look-up dictionary is standard at least in the sense that it follows the data fields of a national address database. It is noted that the phrase “at least one of” denotes an alternate expression that is met if any one of the listed alternatives is met.] 
applying at least part of the set of machine mining models to determine confidence values for each unique combination of tag candidates associated with each token in the set of tokens [§ 3.3, paragraph 4 (page 59): “Once a HMM is trained, sequences of tags (one tag per input element) as generated in the tagging step can be given as input to the Viterbi algorithm, which returns the most likely path (i.e. state sequence) of the given tag sequence through the HMM, plus the corresponding probability.” Note that the “probability” in this context corresponds to a confidence value, and multiple sequences of tags are evaluated as to their probability to find the one with the highest probability, in accordance with the use of the Viterbi algorithm in conjunction with the hidden Markov model (HMM). A higher probability indicates a higher confidence that the sequence is correct. See also § 3.3, paragraph 3: “Given an observation sequence, one is often interested in the most likely path through a given HMM that generated this sequence. This path can be effectively be calculated for a given observation sequence using the Viterbi algorithm.”] for indicating an accuracy of the association between each unique combination of tag candidates and each token in the set of tokens; [The “probability” (confidence value) corresponding to the tag sequence discussed above satisfies the instant limitation of “for indicating an accuracy of association…” since a higher probability indicates that the association is more likely to be correct (accurate), and a lower probability indicates that the association is less likely to be correct (accurate). The present claim language does not require a particular way of measuring accuracy that distinguishes over the probabilities in Christen, nor does it require the confidence value to “indicate” said accuracy in any particular manner.]
determining, for each token, unique combinations of tag candidates having a highest confidence value, [§ 3.3, paragraph 5 (page 59): “The path with the highest probability is then taken and the corresponding state sequence will be used to assign the elements of the input list to the appropriate output fields.” In the example above, “the tag sequence with the highest probability that is returned is [`N2', `SN', `ST', `LN', `PC']” (§ 3.3, last paragraph). With respect to the limitation of plural “unique combinations,”] wherein the determined unique combinations of tag candidates are used as a standardization rule. [In general, “The aim of the standardisation process is to assign each element of a raw user input address to one of these 26 output fields.” § 3.3, last paragraph teaches that “the values of the input address will be assigned to the output fields” that were determined as having the highest probability. Examples of transformation processes include “transform the raw input address records into a well defined and consistent form…” or into “a well defined format…” (§ 2, paragraph 1). In the example shown in § 3.3, the tag candidates N2 and SN are used as a standardization rule for tokens “42” and “meyer,” and the tag candidates ST and LN are used as a standardization rule for tokens “road” and “cooma.”] 
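For illustration only, the portion of Christen relied on above (cleaning and tokenizing, look-up-table and feature tagging, and Viterbi scoring of each candidate tag sequence) may be sketched as follows. The look-up tables, the correction rule, the feature-tag scheme, the HMM state names, and all probabilities are hypothetical assumptions; only the example address, its tokens, its tag candidates, and the count of 24 tag sequences are taken from Christen as quoted above. The sketch uses exact single-token table membership in place of Christen's greedy matching algorithm and is not Christen's implementation.

```python
from itertools import product

# Hypothetical look-up tables (tag -> known values), standing in for tables
# extracted from G-NAF fields such as locality_name and street_name.
LOOKUP = {
    "LN": {"cooma"},           # locality names
    "SN": {"meyer", "cooma"},  # street names
    "ST": {"road"},            # street types
    "PC": {"2371"},            # postcodes
}
CORRECTIONS = {"rd": "road"}   # hypothetical cleaning rule


def clean_and_tokenize(raw):
    """Lower-case, split at white space, and expand known abbreviations."""
    return [CORRECTIONS.get(w, w) for w in raw.lower().split()]


def feature_tag(token):
    """Assumed feature tags: 'N' or 'L' plus length for all-digit or all-letter tokens."""
    if token.isdigit():
        return f"N{len(token)}"
    if token.isalpha():
        return f"L{len(token)}"
    return "X"  # hypothetical catch-all


def tag_candidates(token):
    """One tag per look-up table containing the token, then its feature tag."""
    return [tag for tag, values in LOOKUP.items() if token in values] + [feature_tag(token)]


# Hypothetical HMM: hidden states are output fields, observations are tags.
STATES = ["street_number", "street_name", "street_type", "locality_name", "postcode"]
START = {"street_number": 1.0}
TRANS = {
    "street_number": {"street_name": 1.0},
    "street_name": {"street_type": 0.9, "locality_name": 0.1},
    "street_type": {"locality_name": 1.0},
    "locality_name": {"postcode": 1.0},
    "postcode": {"postcode": 1.0},
}
EMIT = {
    "street_number": {"N2": 0.9, "N4": 0.1},
    "street_name": {"SN": 0.8, "L5": 0.2},
    "street_type": {"ST": 0.9, "L4": 0.1},
    "locality_name": {"LN": 0.8, "SN": 0.1, "L5": 0.1},
    "postcode": {"PC": 0.8, "N4": 0.2},
}
FLOOR = 1e-6  # probability for (state, tag) pairs absent from EMIT


def viterbi(tags):
    """Return (probability, state path) of the most likely path for one tag sequence."""
    def emit(state, tag):
        return EMIT[state].get(tag, FLOOR)

    best = {s: (START.get(s, 0.0) * emit(s, tags[0]), [s]) for s in STATES}
    for tag in tags[1:]:
        step = {}
        for s in STATES:
            # Best predecessor for state s, then emit the current tag from s.
            prob, path = max((best[p][0] * TRANS[p].get(s, 0.0), best[p][1]) for p in STATES)
            step[s] = (prob * emit(s, tag), path + [s])
        best = step
    return max(best.values())


def standardise(raw):
    """Score every unique combination of tag candidates; keep the most probable."""
    candidates = [tag_candidates(t) for t in clean_and_tokenize(raw)]
    scored = []
    for seq in product(*candidates):
        prob, path = viterbi(list(seq))
        scored.append((prob, list(seq), path))
    return candidates, max(scored)


candidates, (prob, tags, path) = standardise("42 meyer Rd COOMA 2371")
print(["/".join(c) for c in candidates])  # ['N2', 'SN/L5', 'ST/L4', 'LN/SN/L5', 'PC/N4']
print(len(list(product(*candidates))))    # 24
print(tags, round(prob, 6))               # ['N2', 'SN', 'ST', 'LN', 'PC'] 0.373248
```

Under these assumed probabilities the highest-probability sequence coincides with the sequence Christen reports for this example; different table contents or probabilities could select a different sequence.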
Christen does not specifically teach: 
(1) 	“wherein each token in the tokenized data values is classified using a standard classifier for one or more domains and a lookup dictionary of previously processed tokens” [Christen teaches the use of the lookup dictionary to process and label input data, as noted above. However, Christen does not specifically teach the use of the lookup dictionary for building the lookup dictionary.]; 
(2)	building the set of models “using different learning algorithms for identifying tags and tag patterns”, and the confidence value being an “aggregated” confidence value that is obtained by “computing an aggregated confidence value for each unique combination of tag candidates using the determined confidence values”; and
(3)	“at least one of the set of machine mining models being built as a domain fingerprint using tokenized data values classified as a certain tag, the domain fingerprint allowing to estimate a likelihood that an unknown token value belongs to the certain tag.”
Wroczynski, in an analogous art, teaches item (1) listed above. Wroczynski generally relates to natural language processing ([0001]) using grammatical parsers such as a “tagger” ([0003]) and “machine learning techniques” ([0004]) to process input data, such as sequences of one or more words (see Wroczynski claim 5). Since Wroczynski pertains to the tagging of tokenized data using machine learning models, Wroczynski is in the same field of endeavor as the claimed invention and is also reasonably pertinent to the problem of sequence labeling. 
In particular, Wroczynski teaches “wherein each token in the tokenized data values is classified using a standard classifier for one or more domains and a lookup dictionary of previously processed tokens” [In general [0065] teaches: “the input to the word tagger element 203A is text that is tokenized into sentences and words by preceding elements tokenize 203A1 and sentence divider 203A2. A dictionary 502 is accessed to find words with possible tags. All possible tags are assigned to each word (504)…The word tagger output consists of text tokenized into sentences and words, each of which has assigned exactly one tag.” Thus, the use of a dictionary and the assignment of tags based on the dictionary constitutes the use of a “lookup dictionary” and a “classifier,” since the assignment of tags constitutes performing classification. With respect to the limitation of “for one or more domains,” [0049] teaches “natural language processing (NLP) system 200 [that] accepts text as input,” and teaches that “Embodiments of the NLP system can thus work properly on different kinds of domains at the same time” ([0047]). Therefore the assignment of tags is for some domain (subject matter) in natural language processing. With respect to “previously processed tokens,” [0075] teaches that the parser in general is trained on a known corpus, and that “known tokens” were previously processed during training ([0058]: “The term ‘known token’ means that token appeared in the training corpus at least once.”). Furthermore, the dictionary can be expanded by further processing of tokens: “The indicated word is searched for in the dictionary 502 (606). If the word is not in the dictionary, it is added to the dictionary at 608. Then it is determines (612) whether possible tags for the newly word have been previously defined. 
If no tags were previously defined, a new set of rules is created for a new set of tags is created at 614.” It is also noted that the disclosed process is “automated” ([0077]) and occurs when there is an “exception” ([0074]), i.e., when the previous use of the dictionary did not result in the tagging of the word.] 
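The dictionary-expansion flow quoted from Wroczynski (search at 606, add at 608, check for previously defined tags at 612, define new tags at 614) may be sketched as follows. The function name, the dict-of-sets representation, and the `define_tags` callback, which stands in for Wroczynski's creation of a new set of rules, are hypothetical.

```python
from typing import Callable, Dict, Set


def lookup_or_extend(dictionary: Dict[str, Set[str]], word: str,
                     define_tags: Callable[[str], Set[str]]) -> Set[str]:
    """Return the possible tags for word, extending the dictionary on a miss."""
    if word not in dictionary:                       # search (606) and add if absent (608)
        dictionary[word] = set()
    if not dictionary[word]:                         # tags previously defined? (612)
        dictionary[word] = set(define_tags(word))    # if not, define new tags (614)
    return dictionary[word]


dictionary = {"road": {"ST"}}
print(lookup_or_extend(dictionary, "road", lambda w: {"UNKNOWN"}))  # {'ST'}
print(lookup_or_extend(dictionary, "meyer", lambda w: {"SN"}))      # {'SN'}
print(sorted(dictionary))                                           # ['meyer', 'road']
```

A known token keeps its stored tags and the callback is never consulted; only a previously unseen token grows the dictionary, which is the sense in which later lookups draw on previously processed tokens.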
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen and Wroczynski by modifying the method of Christen to include the feature that “each token in the tokenized data values is classified using a standard classifier for one or more domains and a lookup dictionary of previously processed tokens.” The motivation for doing so would have been to improve the dictionary used by a word tagger, as suggested by Wroczynski, [0075] (teaching “improving the word tagger” by expanding the dictionary, as described in the parts quoted above).
Kim, in an analogous art, teaches “using different learning algorithms for identifying tags and tag patterns”, and the confidence value being an “aggregated” confidence value that is obtained by “computing an aggregated confidence value for each unique combination of tag candidates using the determined confidence values.” Kim generally relates to “the sequence tagging problem aimed for accurate prediction of the multiple output class labels that are correlated with one another in a complex manner” (§ 1, paragraph 1), using “several machine learning approaches” (§ 1, last paragraph). Since Kim pertains to the tagging of data sequences using machine learning models, Kim is in the same field of endeavor as the claimed invention and is also reasonably pertinent to the problem of sequence labeling.
In particular, Kim teaches a set of models built “using different learning algorithms for identifying tags and tag patterns” [§ 3, paragraph 2: “ensemble predictor…comprised [of] M base predictors…derived from M CRF models.” That is, each CRF (conditional random field) predictor is trained using “training data” (§ 3, paragraph 4), and is a different learning algorithm for identifying tags and tag patterns.] and an “aggregated confidence value” that is obtained by “computing an aggregated confidence value for each unique combination of tag candidates by calculating an average of the determined confidence values” [§ 3.1, which teaches the aggregate confidence value $F(Y \mid X) = \sum_{m=1}^{M} \alpha_m P(Y \mid X; \theta_m)$, where $Y$ is the output label sequence, $X$ is the input sequence, $m$ indexes the members of the ensemble, and $P(Y \mid X; \theta_m)$ is the conditional class probability of predictor $m$, as defined in § 3.1. See also § 3, paragraph 2. With respect to the limitation of “by calculating an average,” § 3.1, paragraph 1 (portion at top of page 452), teaches that $\sum_{m=1}^{M} \alpha_m = 1$, i.e., that the weights $\alpha_m$ are normalized to sum to 1. Therefore, $\sum_{m=1}^{M} \alpha_m P(Y \mid X; \theta_m)$ is a normalized weighted average of the $P(Y \mid X; \theta_m)$ values (i.e., a type of “average” having the mathematical form $\bar{x} = \sum_{i=1}^{n} w_i x_i$ where $\sum_{i=1}^{n} w_i = 1$), which reads on the instant claim limitation of “an average.”]
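As a numerical check of the reading of Kim above, the sketch below computes F(Y|X) as the alpha-weighted sum of per-model probabilities for each candidate tag combination and keeps the combination with the highest aggregate. The weights and per-model probabilities are hypothetical values chosen only for illustration; with equal weights of 1/M the aggregate reduces to the ordinary arithmetic mean.

```python
from math import fsum, isclose


def aggregated_confidence(probs, weights):
    """F(Y|X) = sum_m alpha_m * P(Y|X; theta_m), with the alpha_m summing to 1."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per ensemble member")
    if not isclose(fsum(weights), 1.0):
        raise ValueError("ensemble weights must sum to 1")
    return fsum(a * p for a, p in zip(weights, probs))


def best_combination(per_model_probs, weights):
    """Aggregate each tag combination's per-model probabilities and return the top one."""
    scored = {combo: aggregated_confidence(p, weights) for combo, p in per_model_probs.items()}
    return max(scored.items(), key=lambda item: item[1])


weights = [0.5, 0.3, 0.2]                      # hypothetical alpha_m, M = 3
per_model_probs = {                            # hypothetical P(Y|X; theta_m) values
    ("N2", "SN", "ST", "LN", "PC"): [0.6, 0.5, 0.9],
    ("N2", "L5", "ST", "LN", "PC"): [0.3, 0.4, 0.05],
}
combo, conf = best_combination(per_model_probs, weights)
print(combo, round(conf, 2))                   # ('N2', 'SN', 'ST', 'LN', 'PC') 0.63
# Equal weights 1/M reduce the aggregate to the arithmetic mean.
print(round(aggregated_confidence([0.6, 0.5, 0.9], [1/3, 1/3, 1/3]), 4))  # 0.6667
```

Because the weights sum to 1, the aggregate always lies between the smallest and largest per-model probability, as any weighted average does.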
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen and Wroczynski with the teachings of Kim by further modifying the method of Christen (as modified by Wroczynski) so as to use “different learning algorithms for identifying tags and tag patterns” to build the set of models, and to modify the confidence value to be an “aggregated confidence value” that is obtained by “computing an aggregated confidence value for each unique combination of tag candidates by calculating an average of the determined confidence values.” The motivation for doing so would have been to use “an ensemble of predictor models to boost the overall prediction accuracy” (Kim, abstract).
Nelke, in an analogous art, teaches “at least one of the set of machine mining models being built as a domain fingerprint using tokenized data values classified as a certain tag, the domain fingerprint allowing to estimate a likelihood that an unknown token value belongs to the certain tag.”  Nelke generally pertains to data classification (see [0024]: “Embodiments of the present invention…facilitate the classification of data in any domain”) and the processing of postal addresses (see [0028]: “a data set containing data with a very high number of formats, for example an address field”, [0044], [0046]). Therefore, Nelke is in the same field of endeavor as the claimed invention, namely data classification for addresses. 
In particular, Nelke teaches at least one of the set of machine mining models being built as a domain fingerprint [[0027]: “compute automatically for each data set characteristics so called ‘fingerprints' characterizing the class of data contained in the corresponding data set, e.g. a column. Such a fingerprint is made up of several metrics capturing different aspects of the data.” The fingerprints are used for domain classification, where a domain can be an address. See [0007]: “classical domains, such as US addresses, person names, etc. specialized algorithms are delivered out of the box”; [0024]: “facilitate the classification of data in any domain without requiring the use of specialized algorithms”] using tokenized data values classified as a certain tag, [[0033]: “the user would have first to look at a few data values of the data set and set the data class for this data set manually.” Note that the “data values of the data set” ([0027]) correspond to, or are analogous to, the “tokenized data values” of the instant claim. See FIG. 3, which shows LASTNAME, ADDRESS, POSTAL CODE, and PHONE labels for respective values, and the respective fingerprints, as described in [0046].] the domain fingerprint allowing to estimate a likelihood that an unknown token value belongs to the certain tag [[0029]: “When comparing two fingerprints of two data sets, a low score indicates no domain matching between the two data sets. So the two data sets contain data that look different. A score close to 100% indicates a domain matching between the two data sets. So the two data sets are likely to contain data of same type, because the data presents similar characteristics.”]
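A minimal sketch of how a fingerprint built from values already classified under a tag could score an unknown token follows. Nelke, as quoted, states only that a fingerprint is made up of several metrics and that comparing fingerprints yields a score close to 100% when the domains match; the four metrics, the distance, the helper names, and the sample values are hypothetical choices, with the tag labels merely echoing the LASTNAME and POSTAL CODE labels of Nelke FIG. 3.

```python
def fingerprint(values):
    """Hypothetical fingerprint: a few normalized metrics, each in [0, 1]."""
    text = "".join(values)
    n = max(len(text), 1)
    mean_len = sum(len(v) for v in values) / max(len(values), 1)
    return (
        sum(c.isdigit() for c in text) / n,   # share of digits
        sum(c.isalpha() for c in text) / n,   # share of letters
        sum(c.isspace() for c in text) / n,   # share of white space
        min(mean_len / 20.0, 1.0),            # mean value length, capped at 20 characters
    )


def match_score(fp_a, fp_b):
    """0-100 score; 100 means identical fingerprints (hypothetical distance)."""
    return 100.0 * (1.0 - sum(abs(a - b) for a, b in zip(fp_a, fp_b)) / len(fp_a))


def likely_tag(token, tag_fingerprints):
    """Score an unknown token against each tag's fingerprint, best match first."""
    fp = fingerprint([token])
    return sorted(((match_score(fp, tfp), tag) for tag, tfp in tag_fingerprints.items()),
                  reverse=True)


tag_fingerprints = {
    "POSTAL CODE": fingerprint(["2371", "2600", "3000"]),   # hypothetical classified values
    "LASTNAME": fingerprint(["smith", "nguyen", "meyer"]),
}
print(likely_tag("2612", tag_fingerprints)[0])      # (100.0, 'POSTAL CODE')
print(likely_tag("smith", tag_fingerprints)[0][1])  # LASTNAME
```

The score here is only a similarity measure used as a rough indicator of how likely the unknown token is to belong to each tag; it is not a calibrated probability.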
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, and Kim with the teachings of Nelke by modifying the set of machine mining models such that “at least one of the set of machine mining models being built as a domain fingerprint using tokenized data values classified as a certain tag, the domain fingerprint allowing to estimate a likelihood that an unknown token value belongs to the certain tag.” The motivation would have been to use a model that can characterize the class of data contained in the corresponding data set (Nelke, [0027], parts quoted above) and facilitate the classification of data without requiring the use of specialized algorithms (Nelke, [0024]: “facilitate the classification of data in any domain without requiring the use of specialized algorithms”). 
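For illustration only, the fingerprint comparison described in Nelke [0029] can be sketched as follows. The fingerprint metric (relative frequency of character classes) and the percentage score (100 times one minus the total variation distance) are hypothetical assumptions, not the metrics disclosed in Nelke; the sketch shows only how a fingerprint built from values carrying a known tag can yield a score near 100% for an unknown value of the same type and a low score for a value of a different type.

```python
from collections import Counter

CLASSES = "DASO"  # digit, alphabetic, whitespace, other (hypothetical classes)

def char_class(c: str) -> str:
    # Map a single character to a coarse character class.
    if c.isdigit():
        return "D"
    if c.isalpha():
        return "A"
    if c.isspace():
        return "S"
    return "O"

def fingerprint(values: list[str]) -> dict[str, float]:
    # Hypothetical fingerprint metric: relative frequency of each character
    # class over all values of a data set (e.g., a column tagged POSTAL CODE).
    counts = Counter(char_class(c) for v in values for c in v)
    total = sum(counts.values()) or 1
    return {k: counts[k] / total for k in CLASSES}

def match_score(fp_a: dict[str, float], fp_b: dict[str, float]) -> float:
    # Percentage score: 100 for identical distributions, 0 for disjoint ones.
    distance = sum(abs(fp_a[k] - fp_b[k]) for k in CLASSES) / 2
    return 100.0 * (1.0 - distance)

postal = fingerprint(["12345", "90210", "10001"])   # values tagged POSTAL CODE
names = fingerprint(["Smith", "Jones"])             # values tagged LASTNAME
print(match_score(postal, fingerprint(["60614"])))  # 100.0
print(match_score(postal, names))                   # 0.0
```

A real fingerprint, as Nelke indicates, would combine several metrics; a single metric is used here only to keep the comparison easy to follow.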

As to claim 2, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, wherein the unique combinations are provided in association with respective highest aggregated confidence values. [In the context of the instant claim, the phrase “respective highest aggregated confidence value” suggests that there may be multiple confidence values that are, in a collective sense, the “highest” compared to another set of confidence values. This limitation is met by Christen (as modified by Kim), which teaches calculating the probabilities of different sequences, in which case there is a set with the “highest” confidence values. See, e.g., Christen, § 3.3, paragraph 4 (page 59): “Once a HMM is trained, sequences of tags (one tag per input element) as generated in the tagging step can be given as input to the Viterbi algorithm, which returns the most likely path (i.e. state sequence) of the given tag sequence through the HMM, plus the corresponding probability.”]
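The Viterbi step quoted from Christen, which returns the most likely state sequence together with its probability, can be illustrated with the following minimal sketch. The two states, the observation tags (“NU” for a numeric token, “WO” for a word token), and all probabilities are hypothetical toy values, not parameters of Christen’s trained HMM.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most likely state sequence."""
    # best[s] holds the probability and path of the best path ending in state s.
    best = {s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}
    for o in obs[1:]:
        step = {}
        for s in states:
            # Extend every previous best path into s and keep the most probable one.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s] * emit_p[s].get(o, 0.0), best[prev][1])
                for prev in states
            )
            step[s] = (prob, path + [s])
        best = step
    return max(best.values())

states = ("number", "street")
start_p = {"number": 0.8, "street": 0.2}
trans_p = {"number": {"number": 0.1, "street": 0.9},
           "street": {"number": 0.3, "street": 0.7}}
emit_p = {"number": {"NU": 0.9, "WO": 0.1},
          "street": {"NU": 0.1, "WO": 0.9}}
prob, path = viterbi(["NU", "WO"], states, start_p, trans_p, emit_p)
print(path, round(prob, 4))  # ['number', 'street'] 0.5832
```

Every candidate path receives a probability, so the returned path is the one with the highest value among the set of candidates.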

As to claim 8, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, wherein the training data set is obtained by at least one of: tokens and associated tags obtained by applying an ontology on the set of tokens; and tokens and associated user defined tags. [As noted in the rejection of claim 1, the training data includes tokens (“segmented records in…an address database” (§ 4, paragraphs 1-2); “input data for the training are…the G-NAF database containing cleaned and segmented address records” (§ 4.2)). The address database from which the look-up dictionary is built is considered to be an ontology because it contains a relationship between address terms and field names.]

As to claim 9, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, wherein the set of machine mining models comprises at least one of: a model for predicting a relative position of the one or more tag candidates of a given data value; a model for predicting an absolute position of the one or more tag candidates in a given data value; a model using as input a token associated to each tag candidate for predicting a confidence of an association token-tag given other associations token-tags for the same data value; and a model for predicting for each token a candidate tag. [The limitation of “a model for predicting for each token a candidate tag” in the list of alternatives is met by the look-up based labeler in Christen (§ 3.2, paragraph 1: “Each of the list elements is assigned one or more tags. These tags are based on look-up tables generated using the values in the national address database, as well as more general features”) or the hidden Markov model in Christen (§ 3.3, paragraph 5 (page 59): “The path with the highest probability is then taken and the corresponding state sequence will be used to assign the elements of the input list to the appropriate output fields.”).]

As to claim 10, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, wherein each model of the set of machine mining models is configured to provide a confidence value for each prediction performed by the model. [Kim, § 3.1, which teaches the aggregate confidence value $F(Y \mid X) = \sum_{m=1}^{M} \alpha_m P(Y \mid X; \theta_m)$, where $Y$ is the output label sequence, $X$ is the input sequence, $m$ corresponds to each member of the ensemble, and $P(Y \mid X; \theta_m)$, for each predictor $m$, is the conditional class probability, as defined in § 3.1. The conditional class probability is predicted across all models $m$, weighted by $\alpha_m$.]
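The weighted sum taught by Kim, § 3.1, F(Y|X) = Σ_m α_m P(Y|X; θ_m), can be sketched as follows to show that each ensemble member contributes its own confidence value P(Y|X; θ_m) for each candidate label sequence. The candidate sequences, member probabilities, and weights α_m are hypothetical values chosen for illustration.

```python
def aggregate_confidence(member_probs, weights):
    """Compute F(Y|X) = sum over m of alpha_m * P(Y|X; theta_m) per candidate Y."""
    if len(member_probs) != len(weights):
        raise ValueError("one weight alpha_m is needed per ensemble member")
    scores = {}
    for alpha_m, probs_m in zip(weights, member_probs):
        # probs_m maps each candidate label sequence Y to P(Y|X; theta_m).
        for y, p in probs_m.items():
            scores[y] = scores.get(y, 0.0) + alpha_m * p
    return scores

y1 = ("NUMBER", "STREET")
y2 = ("STREET", "STREET")
scores = aggregate_confidence(
    [{y1: 0.7, y2: 0.3}, {y1: 0.6, y2: 0.4}],  # two ensemble members
    [0.5, 0.5],                                 # alpha_1, alpha_2
)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # ('NUMBER', 'STREET') 0.65
```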

	As to claims 11-12 and 18-20, these claims are directed to a computer system for generating data standardization rules, wherein the system is capable of performing a method comprising the same or substantially the same operations as those recited in claims 1-2 and 8-10, respectively. Therefore, the rejections made to claims 1-2 and 8-10 are applied to claims 11-12 and 18-20, respectively.
	Furthermore, Christen teaches “one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories” because it is implied that the method of Christen is performed using a computer. See, e.g., abstract, which describes “In this paper we present an automated probabilistic approach based on a hidden Markov model (HMM), which uses national address guidelines and a comprehensive national address database to clean, standardise and verify raw input addresses.” It is implicitly disclosed that the computer used in Christen includes the above limitations, which are generic computer components.

2.	Claims 3, 7, 13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Christen in view of Wroczynski, Kim, and Nelke, and further in view of Tomanek et al., “Semi-Supervised Active Learning for Sequence Labeling,” Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 1039–1047, Suntec, Singapore, 2-7 August 2009 (“Tomanek”).
As to claim 3, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, but does not teach the further limitation that “the highest aggregated confidence value of a unique combination is higher than a predefined threshold.” 
Tomanek, in an analogous art, teaches the above limitation. Tomanek generally relates to “machine learning approaches” (see § 1) for “many sequence labeling tasks” (see abstract). Since Tomanek pertains to the tagging of data sequences using machine learning models, Tomanek is in the same field of endeavor as the claimed invention and is also reasonably pertinent to the problem of sequence labeling.
	In particular, Tomanek teaches “the highest aggregated confidence value of a unique combination is higher than a predefined threshold.” [§ 3.2: “If $C_\lambda(y_j^*)$ exceeds a certain confidence threshold $t$, $y_j^*$ is assumed to be the correct label for this token and assigned to it”].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Christen, Wroczynski, Kim, and Nelke with the teachings of Tomanek such that “the highest aggregated confidence value of a unique combination is higher than a predefined threshold,” in order to utilize a criterion for determining whether the labels assigned to tokens are correct.
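The threshold criterion quoted from Tomanek, § 3.2, can be sketched as follows: a predicted label is accepted when its confidence exceeds t, and the token is otherwise routed to manual annotation. The tokens, labels, confidence values, and the value t = 0.9 are hypothetical.

```python
def apply_threshold(predictions, t):
    """Split (token, best_label, confidence) triples by the threshold t."""
    accepted, needs_annotation = [], []
    for token, label, conf in predictions:
        if conf > t:
            # C(y*) exceeds t: y* is assumed correct and assigned to the token.
            accepted.append((token, label))
        else:
            # Otherwise the token is routed to a human annotator.
            needs_annotation.append(token)
    return accepted, needs_annotation

preds = [("12", "NUMBER", 0.99), ("Main", "STREET", 0.95), ("Apt", "UNIT", 0.40)]
print(apply_threshold(preds, t=0.9))
# ([('12', 'NUMBER'), ('Main', 'STREET')], ['Apt'])
```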

As to claim 7, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, further comprising: updating the lookup dictionary [Wroczynski, “If the word is not in the dictionary, it is added to the dictionary at 608. Then it is determines (612) whether possible tags for the newly word have been previously defined. If no tags were previously defined, a new set of rules is created for a new set of tags is created at 614.”], but does not teach the remaining limitations.
Tomanek, in an analogous art, teaches the remaining limitations. Tomanek generally relates to “machine learning approaches” (see § 1) for “many sequence labeling tasks” (see abstract). Since Tomanek pertains to the tagging of data sequences using machine learning models, Tomanek is in the same field of endeavor as the claimed invention and is also reasonably pertinent to the problem of sequence labeling.
In particular, Tomanek teaches updating “using tokens and associated tag candidates for each unique combination” [Algorithm 1 (page 1041), line 5: “move newly labeled examples from P to L.” Note that as described in Algorithm 1, “L” is a set of labeled examples, and P is a set of originally unlabeled examples that are labeled by the process described in algorithm 1, i.e., a labeling process (see title: “sequence labeling”) that returns associated tag candidates and tokens.] and updating the set of machine mining models using the updated lookup dictionary. [Algorithm 1: “loop until stopping criterion is met… 1. learn model M from L”] 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, Kim, and Nelke with the teachings of Tomanek such that the updating of the lookup dictionary uses tokens and associated tag candidates for each unique combination, and the method further comprises updating the set of machine mining models using the updated lookup dictionary. The motivation for doing so would have been to improve the initial model (see Tomanek, page 1042, paragraph bridging the two columns, discussing “improvement over the initial model” as a result of iterated learning that is used in active learning techniques. See also page 1045, left column, top paragraph, which teaches that the classifier improves as further rounds are performed.). 
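The iterative update relied upon for claim 7 can be illustrated with a simplified self-training loop that follows the two steps of Tomanek’s Algorithm 1 quoted above (“learn model M from L” and “move newly labeled examples from P to L”), with the lookup dictionary playing the role of L. This sketch omits Tomanek’s human-annotation step, and the token-shape feature and majority-tag learner are hypothetical stand-ins for a trained sequence model.

```python
from collections import Counter, defaultdict

def shape(token: str) -> str:
    # Hypothetical feature: coarse shape of a token.
    if token.isdigit():
        return "digits"
    if token.isalpha():
        return "alpha"
    return "other"

def learn(lookup: dict[str, str]) -> dict[str, str]:
    # Stand-in for "learn model M from L": majority tag per token shape.
    by_shape: dict[str, Counter] = defaultdict(Counter)
    for token, tag in lookup.items():
        by_shape[shape(token)][tag] += 1
    return {s: c.most_common(1)[0][0] for s, c in by_shape.items()}

def self_train(lookup: dict[str, str], pool: list[str], max_rounds: int = 10):
    lookup, pool = dict(lookup), list(pool)  # L (labeled) and P (unlabeled)
    for _ in range(max_rounds):
        model = learn(lookup)
        newly = {t: model[shape(t)] for t in pool if shape(t) in model}
        if not newly:  # stopping criterion: no new labels produced
            break
        lookup.update(newly)                        # move newly labeled from P to L
        pool = [t for t in pool if t not in newly]
    return lookup, pool, learn(lookup)  # model retrained on the updated lookup

lookup, pool, model = self_train({"12": "NUMBER", "Main": "STREET"}, ["34", "Elm", "#5"])
print(lookup)  # {'12': 'NUMBER', 'Main': 'STREET', '34': 'NUMBER', 'Elm': 'STREET'}
print(pool)    # ['#5']
```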

As to claim 13, this claim recites further limitations that are the same or substantially the same as those recited in claim 3. Therefore, the rejection made to claim 3 is applied to claim 13.

As to claim 17, this claim recites further limitations that are the same or substantially the same as those recited in claim 7. Therefore, the rejection made to claim 7 is applied to claim 17.

3.	Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Christen in view of Wroczynski, Kim, and Nelke, and further in view of Brown et al. (US 2018/0181843 A1).
As to claim 4, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, but does not teach the method further comprising the additional limitations recited in the instant claim.  
Brown, in an analogous art, teaches the additional limitations. Brown generally relates to classification using neural networks (see [0003]) and techniques in labeled data bootstrapping (see title). Therefore, Brown is in the same field of endeavor as the claimed invention, i.e., machine learning techniques.
In particular, Brown teaches prompting a user for a confidence value of a given unique combination; [[0025]: “FIGS. 3 and 4 illustrate possible interfaces 120 associated with the bootstrap trainer 114, according to an embodiment of the present disclosure. In reference to FIG. 3, the user is shown the contents of the top three rows of the interface 120 corresponding to the input image, the assigned label, and the assigned probability.” [0027]: “…the user may choose to select a confidence button 130 and specify how confident he/she is in choosing a corrected label.” Note that the “input image” in Brown is data that is to be labeled, and is therefore analogous to the “given unique combination” of the instant claim.] in response to receiving the confidence value from the user, comparing the received confidence value with a predefined threshold; [[0027]: “…the user may choose to select a confidence button 130 and specify how confident he/she is in choosing a corrected label. Corrected labels in which a user specifies a high level of confidence are given greater weight by the classifier trainer 110. In contrast, corrected labels in which a user specifies a low level of confidence are given less weight by the classifier trainer 110.” That is, either the “high level of confidence” or the “low level of confidence” corresponds to the limitation of “predefined threshold.”] and computing the aggregated confidence value using the received confidence value. [Brown, [0027], quoted above, teaches that the received confidence value is used by the classifier trainer to train the classifier. Since the classifier is analogous to the set of models in the combination of Christen, Wroczynski, Kim, and Nelke, and this set of models computes an aggregated confidence value based on its training, the use of the received confidence value for training results in “computing the aggregated confidence value using the received confidence value” to the extent required by the claim. The claim does not require a specific manner of use for the received confidence value in computing the aggregated confidence value.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, Kim, and Nelke with the teachings of Brown by modifying the combination of Christen, Wroczynski, Kim, and Nelke to further include “prompting a user for a confidence value of a given unique combination; in response to receiving the confidence value from the user, comparing the received confidence value with a predefined threshold; and computing the aggregated confidence value using the received confidence value.” The motivation for doing so would have been to enable a user to review computed tags (labels) to improve automatic classification, as suggested by Brown ([0022]: “a user interface that allows a human user to review”; [0019]: “to make a meaningful improvement in accuracy over each iteration.”).

	As to claim 14, this claim recites further limitations that are the same or substantially the same as those recited in claim 4. Therefore, the rejection made to claim 4 is applied to claim 14.

4.	Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Christen in view of Wroczynski, Kim, and Nelke, and further in view of Bellegarda (US 2014/0324435 A1) and Nelson et al. (US 2019/0147038 A1) (“Nelson”).
As to claim 5, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, but does not teach the method further comprising the additional limitations recited in the instant claim.
Bellegarda, in an analogous art, teaches “querying a knowledge base for tags of a data value; and receiving the tags in a given order.” Bellegarda generally relates to tagging “word of a text sequence” (see abstract) using “trained” models ([0003]). Therefore, Bellegarda is in the same field of endeavor as the claimed invention. 
In particular, Bellegarda teaches querying a knowledge base for tags of a data value and receiving the tags in a given order [[0040]: “a word of text sequence 301 is input to rule-based POS tagger 304 and statistical tagger 305 independently and/or concurrently.” Here, the rule-based POS tagger 304 corresponds to a “knowledge base.” Note that Bellegarda concerns a “text sequence.” Therefore, a plurality of tags are received from the rule-based tagger. See also [0028]: “Once the words have been tagged with one of the tags generated by statistical tagger 106 and rule-based tagger 107.”] and the determination of “a difference between the unique combination and the received tags.” [[0040]: “At block 306, the rule-based POS tag and the statistical POS tag are compared.” [0032]: “Another situation is referred to as a tag disagreement situation in which…the statistical system returned a different tag (even after a tag conversion). In this situation, according to one embodiment, a confidence score of the rule associated with the tag generated by the rule-based system is utilized to evaluate whether the rule-based tag can be selected as the final tag applied to the input context.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, Kim, and Nelke with the teachings of Bellegarda by performing the operation of querying a knowledge base for tags of a data value and receiving the tags in a given order, and determining “a difference between the unique combination and the received tags.” The motivation for doing so would have been to utilize an additional tagger in a technique that combines the benefits of different types of taggers, as suggested by Bellegarda (see [0020]: “combines rule-based POS tagging and statistical POS tagging techniques. Complementing a rule-based system with a statistical tagger solves many of the problems described above.”).
The combination of Christen, Wroczynski, Kim, Nelke, and Bellegarda does not specifically teach that the “aggregated confidence value of each unique combination of the data value is revised based on” the difference between the unique combination and the received tags.
Nelson, in an analogous art, teaches the remaining limitations. Nelson generally relates to lexical analysis, e.g., “lexical items associated with the data block, such as tokens, may be tagged with semantic tags and/or syntactic tags” ([0024]; see also [0132]). Therefore, Nelson is in the same field of endeavor as the claimed invention.
In particular, Nelson teaches that the “aggregated confidence value of each unique combination of the data value is revised” based on external factors [[0066]-[0067]: “the tagger 204 may modify confidence score of the lexical items, such as tokens… the confidence score may be calculated using a formula…” [0068]: “The rules may also include external factors, such as [information] from an external database.” See also [0069] (“the confidence factor is determined based on how well the external semantic information matches internal contextual information of the interpretation and other alternative interpretations.”) See also [0078] (“determine the confidence scores based on…various other external sources”).  (Note that the above content is disclosed in [0044] and [0052] of the priority document of Nelson).] As Nelson teaches that revising a confidence measure based on external factors is a known technique in the art, one of ordinary skill in the art would have understood that the difference between the unique combination and the received tags can be a basis for modifying the confidence measure. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, Kim, Nelke, and Bellegarda with the teachings of Nelson such that the “aggregated confidence value of each unique combination of the data value is revised based on” the difference between the unique combination and the received tags. The motivation for doing so would have been to utilize a suitable additional source to determine a confidence score, as suggested by Nelson (see sections quoted above).
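The reasoning that the difference between the unique combination and the tags received from a knowledge base can serve as a basis for revising the aggregated confidence value may be illustrated as follows. The proportional penalty and its default value of 0.5 are a hypothetical revision rule, not a formula disclosed in Bellegarda or Nelson; the sketch shows only one way a token-by-token tag disagreement could lower the confidence value.

```python
def revise_confidence(combination, kb_tags, confidence, penalty=0.5):
    """Lower confidence in proportion to positional tag disagreements."""
    if len(combination) != len(kb_tags):
        raise ValueError("tag sequences must align token by token")
    if not combination:
        return confidence
    mismatches = sum(a != b for a, b in zip(combination, kb_tags))
    # penalty is a hypothetical tuning parameter in [0, 1].
    return confidence * (1.0 - penalty * mismatches / len(combination))

combo = ("NUMBER", "STREET", "TYPE")   # unique combination from the models
kb = ("NUMBER", "STREET", "UNIT")      # tags received from the knowledge base
print(round(revise_confidence(combo, kb, 0.9), 2))  # 0.75
```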

As to claim 15, this claim recites further limitations that are the same or substantially the same as those recited in claim 5. Therefore, the rejection made to claim 5 is applied to claim 15.

5.	Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Christen in view of Wroczynski, Kim, and Nelke, and further in view of Tomanek and Brown.
As to claim 6, the combination of Christen, Wroczynski, Kim, and Nelke teaches the method of claim 1, but does not teach the method further comprising the additional limitations recited in the claim. 
Tomanek, in an analogous art, teaches “in response to determining that the highest aggregated confidence value of a unique combination is less than a predefined threshold, prompting a user…for the unique combination” and “wherein the unique combination is provided as a standardization rule” based on the user input and the predefined threshold. Tomanek generally relates to “machine learning approaches” (see § 1) for “many sequence labeling tasks” (see abstract). Since Tomanek pertains to the tagging of data sequences using machine learning models, Tomanek is in the same field of endeavor as the claimed invention and is also reasonably pertinent to the problem of sequence labeling.
In particular, Tomanek teaches in response to determining that the highest aggregated confidence value of a unique combination is less than a predefined threshold, prompting a user for (an input for) the unique combination; [§ 3.2 (page 1041, right column, paragraph 2): “If $C_\lambda(y_j^*)$ exceeds a certain confidence threshold $t$, $y_j^*$ is assumed to be the correct label for this token and assigned to it. Otherwise, manual annotation of this token is required.” Furthermore, as stated in Algorithm 1 (page 1041, left column), manual annotation is obtained by the operation of “query human annotator for labels of all B examples”]. Tomanek further teaches that the operation of “wherein the unique combination is provided as a standardization rule” is based on the user input and the “predefined threshold.” [Algorithm 1 indicates that the label from the annotator is accepted as correct (i.e., equivalent to a 100% confidence level). Accepting the user-input labeling as correct is analogous to providing the unique combination as a standardization rule.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Christen, Wroczynski, Kim, and Nelke with the teachings of Tomanek such that in response to determining that the highest aggregated confidence value of a unique combination is less than a predefined threshold, prompting a user for an input for the unique combination, wherein the unique combination is provided as a standardization rule based on the user input and the predefined threshold. The motivation would have been to handle highly uncertain combinations that cannot be easily labeled by an automated process, as suggested by Tomanek (see abstract: “highly uncertain subsequences are presented to human annotators”).
Brown, in an analogous art, teaches or suggests the limitation that the input from the user is a “confidence value” of the unique combination, and the limitation of “in response to receiving the confidence value from the user, comparing the received confidence value with the predefined threshold,” and the condition of “in case the received confidence value is higher than the predefined threshold” for the operation of “wherein the unique combination is provided as a standardization rule.” Brown generally relates to classification using neural networks (see [0003]) and techniques in labeled data bootstrapping (see title). Therefore, Brown is in the same field of endeavor as the claimed invention, i.e., machine learning techniques.
Brown teaches that the input from the user is a “confidence value” [[0025]: “FIGS. 3 and 4 illustrate possible interfaces 120 associated with the bootstrap trainer 114, according to an embodiment of the present disclosure. In reference to FIG. 3, the user is shown the contents of the top three rows of the interface 120 corresponding to the input image, the assigned label, and the assigned probability.” [0027]: “…the user may choose to select a confidence button 130 and specify how confident he/she is in choosing a corrected label.” Note that the “input image” in Brown is data that is to be labeled, and is therefore analogous to the “given unique combination” of the instant claim.], and “in response to receiving the confidence value from the user, comparing the received confidence value with the predefined threshold.” [[0027]: “…the user may choose to select a confidence button 130 and specify how confident he/she is in choosing a corrected label. Corrected labels in which a user specifies a high level of confidence are given greater weight by the classifier trainer 110. In contrast, corrected labels in which a user specifies a low level of confidence are given less weight by the classifier trainer 110.” That is, the “high level of confidence” corresponds to the limitation of “predefined threshold.”]
Furthermore, the teachings of Brown, in conjunction with the teachings of existing references, suggest the condition of “in case the received confidence value is higher than the predefined threshold” for providing the unique combination as a standardization rule. [The condition of “in case the received confidence value is higher than the predefined threshold” would have been obvious over the teachings of Brown, because Brown teaches the general condition that a high level of confidence is given greater weight and identifies the level of confidence given by the user as a result-effective variable. Furthermore, since Tomanek teaches that the “predefined threshold” is the threshold for which a label is accepted to be correct, one of ordinary skill would have understood that a user-annotated confidence level that is above the predefined threshold would be an optimal or workable range. “Where the general conditions of a claim are disclosed in the prior art, it is not inventive to discover the optimum or workable ranges by routine experimentation.” MPEP § 2144.05(II)(A) (citing In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955)). Therefore, in the combination of references set forth below, the condition of “in case the received confidence value is higher than the predefined threshold” would have been obvious as a discovery of an optimum or workable range by routine experimentation.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Christen, Wroczynski, Kim, Nelke, and Tomanek with the teachings of Brown by modifying the user input taught in Tomanek to be a confidence value and to perform the further operation of “in response to receiving the confidence value from the user, comparing the received confidence value with the predefined threshold.” The motivation for doing so would have been to enable a user to review computed tags (labels) to improve automatic classification, as suggested by Brown ([0022]: “a user interface that allows a human user to review”; [0019]: “to make a meaningful improvement in accuracy over each iteration.”). The limitation of wherein the unique combination is provided as a standardization rule “in case the received confidence value is higher than the predefined threshold” would have been obvious as a discovery of an optimum or workable range by routine experimentation for the reasons set forth above and the principle that “the presence of a known result-effective variable would be one…motivation for a person of ordinary skill in the art to experiment to reach another workable product or process” (MPEP § 2144.05(II)(B)).
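The decision sequence addressed in the rejection of claim 6 can be sketched as follows: when the highest aggregated confidence value is below the threshold, the user is prompted for a confidence value, which is compared with the same threshold before the combination is provided as a standardization rule. The function name, the callback standing in for the user prompt, and the automatic acceptance of values at or above the threshold are hypothetical assumptions made for illustration.

```python
def standardization_rule(combination, aggregated_conf, t, ask_user):
    """Return the combination as a rule, or None if it is not confirmed."""
    if aggregated_conf >= t:
        # Assumed: not a low-confidence case, so no user review is requested.
        return combination
    # Highest aggregated confidence is below t: prompt the user for a confidence value.
    user_conf = ask_user(combination)
    # Compare the received confidence value with the same predefined threshold.
    return combination if user_conf > t else None

combo = ("NUMBER", "STREET")
print(standardization_rule(combo, 0.5, 0.9, lambda c: 0.95))  # ('NUMBER', 'STREET')
print(standardization_rule(combo, 0.5, 0.9, lambda c: 0.30))  # None
```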

As to claim 16, this claim recites further limitations that are the same or substantially the same as those recited in claim 6. Therefore, the rejection made to claim 6 is applied to claim 16.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references evidence the state of the art.
US 2009/0006282 A1 (Roth) teaches an aggregate confidence score ([0014]: sum of confidence levels).
US 2013/0212073 A1 (Cochrane) teaches utilizing data fingerprints.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124