Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION


Reasons for Allowance

Claims 1–20 are allowed over the prior art.

The following is an examiner’s statement of reasons for allowance:
The prior art made of record fails to teach the underlined limitations within the independent claims:

Regarding Claim 1,
A method for jointly training a classification model and a confidence model, the method comprising: receiving, at data processing hardware, a training data set comprising a plurality of training data subsets, each training data subset associated with a different respective class and having a plurality of corresponding training examples that belong to the respective class; from two or more training data subsets in the training data set: selecting, by the data processing hardware, a support set of training examples, the support set of training examples comprising K number of training examples sampled from each of the two or more training data subsets; and selecting, by the data processing hardware, a query set of training examples, the query set of training examples comprising training examples sampled from each of the two or more training data subsets that are not included in the support set of training examples; for each respective class associated with the two or more training data subsets, determining, by the data processing hardware, using the classification model, a centroid value by averaging K number of support encodings associated with the K number of training examples in the support set of training examples that belong to the respective class; for each training example in the query set of training examples: generating, by the data processing hardware, using the classification model, a query encoding; determining, by the data processing hardware, a class distance measure representing a respective distance between the query encoding and the centroid value determined for each respective class; determining, by the data processing hardware, a ground-truth distance between the query encoding and a ground-truth label associated with the corresponding training example in the query set of training examples; and updating, by the data processing hardware, parameters of the classification model based on the class distance measure and the ground-truth distance; and for each training example in the query set of training examples identified as being misclassified: generating, by the data processing hardware, using the confidence model, a standard deviation value for the query encoding generated by the classification model for the corresponding misclassified training example; sampling, by the data processing hardware, using the standard deviation value and the query encoding, a new query encoding for the corresponding misclassified training example; and updating, by the data processing hardware, parameters of the confidence model based on the new query encoding.
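
For illustration only, the centroid and class distance steps recited in Claim 1 may be sketched as follows. This sketch is not taken from the application: the two-dimensional encodings, the class labels "cat" and "dog", the helper names, and the choice of Euclidean distance are all assumptions, since the claim does not fix a particular distance measure or encoding size.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(encodings: Sequence[Vector]) -> Vector:
    """Average K support encodings element-wise to obtain a class centroid."""
    k = len(encodings)
    return [sum(dim) / k for dim in zip(*encodings)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed choice of distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_distances(query: Vector, centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Distance from one query encoding to the centroid of every class."""
    return {label: euclidean(query, c) for label, c in centroids.items()}


# Hypothetical support encodings: K = 2 examples for each of two classes.
support = {
    "cat": [[0.0, 0.0], [2.0, 0.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
centroids = {label: centroid(encs) for label, encs in support.items()}

# Hypothetical query encoding for one training example in the query set.
query = [1.0, 0.0]
distances = class_distances(query, centroids)
predicted = min(distances, key=distances.get)  # nearest centroid
```

In this toy setup the query encoding coincides with the "cat" centroid, so the nearest-centroid prediction is "cat". In the claimed method the encodings would be produced by the classification model rather than written out by hand.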

Regarding Claim 11,
A system for jointly training a classification model and a confidence model, the system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a training data set comprising a plurality of training data subsets, each training data subset associated with a different respective class and having a plurality of corresponding training examples that belong to the respective class; from two or more training data subsets in the training data set: selecting a support set of training examples, the support set of training examples comprising K number of training examples sampled from each of the two or more training data subsets; and selecting a query set of training examples, the query set of training examples comprising training examples sampled from each of the two or more training data subsets that are not included in the support set of training examples; for each respective class associated with the two or more training data subsets, determining, using the classification model, a centroid value by averaging K number of support encodings associated with the K number of training examples in the support set of training examples that belong to the respective class; for each training example in the query set of training examples: generating, using the classification model, a query encoding; determining a class distance measure representing a respective distance between the query encoding and the centroid value determined for each respective class; determining a ground-truth distance between the query encoding and a ground-truth label associated with the corresponding training example in the query set of training examples; and updating parameters of the classification model based on the class distance measure and the ground-truth distance; and for each training example in the query set of training examples identified as being misclassified: generating, using the confidence model, a standard deviation value for the query encoding generated by the classification model for the corresponding misclassified training example; sampling, using the standard deviation value and the query encoding, a new query encoding for the corresponding misclassified training example; and updating parameters of the confidence model based on the new query encoding.


Regarding Claim 1: 
The following prior art, DUBOVSKY et al. (USPUB 20200074247), teaches a method for jointly training a classification model and a confidence model (classification model and confidence of recognized object taught within Paragraphs [0008-0009] and [0019]), the method comprising: receiving, at data processing hardware (data processing apparatus taught within Paragraphs [0054-0055]), a training data set comprising a plurality of training data subsets (Paragraphs [0006], [0029], and [0034]), each training data subset associated with a different respective class and having a plurality of corresponding training examples that belong to the respective class (Paragraphs [0024] and [0034]); from two or more training data subsets in the training data set (Paragraphs [0035] and [0047]): selecting, by the data processing hardware, a support set of training examples (Paragraphs [0054-0055]),
Within analogous art, Mikhail Bilenko (NPL DOC: "Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases," 22nd February 2002, Technical Report AI-02-296, Artificial Intelligence Lab, University of Texas at Austin, February 2002, Pages 2-16) teaches the support set of training examples comprising K number of training examples sampled from each of the two or more training data subsets (Page 7 – “If only a relatively small number of training examples is available, probabilities of some edit operations may be underestimated, and distances assigned to strings will vary significantly with minor character variations. There are two steps that need to be taken to address these issues. First, the probability distribution over the set of edit operations, E, is smoothed by bounding each edit operation probability by some minimum value _....” AND Page 8 – “…Matched pairs of duplicate records …can be used to construct a training set of such vectors by assigning them a positive class label. Pairs of records that are not labeled as duplicates form the complementary set of negative examples. A binary classifier can then be trained using these vectors to discriminate between pairs of records…”);
Within analogous art, Yunhui Long (NPL DOC: "Understanding Membership Inferences on Well-Generalized Learning Models," 13th February 2018, arXiv:1802.04889, https://arxiv.org/abs/1802.04889, Pages 1-13) teaches selecting, by the data processing hardware, a query set of training examples (Page 4 - Col. 4 - “By querying the model M, the adversary gathers evidence in favor of either Hin or Hout, eventually deciding in the favor of the more likely hypothesis. To illustrate this approach, we use a toy dataset with 1,181 records (as shown in Figure 1a) to train a neural network model with two fully connected layers for binary classification. Suppose we want to infer the membership of a record r by querying a record q. Let M(q) be the models output to q. Over the record space from which the training records are sampled,…”), the query set of training examples comprising training examples sampled from each of the two or more training data subsets that are not included in the support set of training examples (Page 8 – Col. 1 - “…to the training dataset almost always increase the models’ output probability on the class label yr for the query q. In practice, we use q in the MIA on the target record r only if I(r;q) is greater than a threshold q (e.g. 0.95). Algorithm 1 summarizes the entire algorithm for query selection….” AND Col. 2 - “…query selection and optimization algorithms on all random records because the predictions of the models on most records are highly correlated: the models giving high output probabilities on some record are also likely to give high output probabilities on correlated records. To improve the efficiency of query selection…”); for each respective class associated with the two or more training data subsets (Page 16 - Col. 1 - “…we trained one attack classifier per class for each dataset. The attack classifiers are neural networks with one hidden layer of 64 units….”), determining, by the data processing hardware, using the classification model (Page 1 - Col. 2 – “…trains an attack model that utilizes the target model’s classification result for a given input to determine whether the input is present in the target model’s training set….” AND Page 2 - Col. 2 - “… we run a hypothesis test to decide whether its classification by the target model is in line with this distribution. This approach successfully identifies training records of well-generalized models….”), as recited in claim 1, but does not teach or render obvious the following limitations: "a centroid value by averaging K number of support encodings associated with the K number of training examples in the support set of training examples that belong to the respective class; for each training example in the query set of training examples: generating, by the data processing hardware, using the classification model, a query encoding; determining, by the data processing hardware, a class distance measure representing a respective distance between the query encoding and the centroid value determined for each respective class; determining, by the data processing hardware, a ground-truth distance between the query encoding and a ground-truth label associated with the corresponding training example in the query set of training examples; and updating, by the data processing hardware, parameters of the classification model based on the class distance measure and the ground-truth distance; and for each training example in the query set of training examples identified as being misclassified: generating, by the data processing hardware, using the confidence model, a standard deviation value for the query encoding generated by the classification model for the corresponding misclassified training example; sampling, by the data processing hardware, using the standard deviation value and the query encoding, a new query encoding for the corresponding misclassified training example; and updating, by the data processing hardware, parameters of the confidence model based on the new query encoding."
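
For illustration only, the sampling step recited in the limitations quoted above (using a standard deviation value and a query encoding to obtain a new query encoding) may be sketched as follows. The claim does not name a distribution; this sketch assumes an isotropic Gaussian centered on the query encoding, and the function name, the example encoding, and the value 0.1 are hypothetical. In the claimed method the standard deviation value would be generated by the confidence model.

```python
import random
from typing import List


def sample_new_encoding(query_encoding: List[float], std_dev: float,
                        rng: random.Random) -> List[float]:
    """Draw a new encoding from a Gaussian centered on the query encoding.

    Assumption: one shared standard deviation is applied to every dimension.
    """
    return [rng.gauss(mu, std_dev) for mu in query_encoding]


rng = random.Random(0)             # seeded so the sketch is reproducible
query_encoding = [1.0, -2.0, 0.5]  # hypothetical encoding of a misclassified example
std_dev = 0.1                      # hypothetical output of the confidence model
new_encoding = sample_new_encoding(query_encoding, std_dev, rng)
```

A larger standard deviation spreads the sampled encodings further from the original query encoding, while a standard deviation of zero returns the query encoding unchanged.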


Regarding Claim 11: 
The following prior art, DUBOVSKY et al. (USPUB 20200074247), teaches a system for jointly training a classification model and a confidence model (classification model and confidence of recognized object taught within Paragraphs [0008-0009] and [0019]), the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware (data processing apparatus taught within Paragraphs [0054-0055]) to perform operations comprising: receiving a training data set comprising a plurality of training data subsets (Paragraphs [0006], [0029], and [0034]), each training data subset associated with a different respective class and having a plurality of corresponding training examples that belong to the respective class (Paragraphs [0024] and [0034]); from two or more training data subsets in the training data set (Paragraphs [0035] and [0047]): selecting a support set of training examples (Paragraphs [0054-0055]),
Within analogous art, Mikhail Bilenko (NPL DOC: "Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases," 22nd February 2002, Technical Report AI-02-296, Artificial Intelligence Lab, University of Texas at Austin, February 2002, Pages 2-16) teaches the support set of training examples comprising K number of training examples sampled from each of the two or more training data subsets (Page 7 – “If only a relatively small number of training examples is available, probabilities of some edit operations may be underestimated, and distances assigned to strings will vary significantly with minor character variations. There are two steps that need to be taken to address these issues. First, the probability distribution over the set of edit operations, E, is smoothed by bounding each edit operation probability by some minimum value _....” AND Page 8 – “…Matched pairs of duplicate records …can be used to construct a training set of such vectors by assigning them a positive class label. Pairs of records that are not labeled as duplicates form the complementary set of negative examples. A binary classifier can then be trained using these vectors to discriminate between pairs of records…”);
Within analogous art, Yunhui Long (NPL DOC: "Understanding Membership Inferences on Well-Generalized Learning Models," 13th February 2018, arXiv:1802.04889, https://arxiv.org/abs/1802.04889, Pages 1-13) teaches selecting a query set of training examples (Page 4 - Col. 4 - “By querying the model M, the adversary gathers evidence in favor of either Hin or Hout, eventually deciding in the favor of the more likely hypothesis. To illustrate this approach, we use a toy dataset with 1,181 records (as shown in Figure 1a) to train a neural network model with two fully connected layers for binary classification. Suppose we want to infer the membership of a record r by querying a record q. Let M(q) be the models output to q. Over the record space from which the training records are sampled,…”), the query set of training examples comprising training examples sampled from each of the two or more training data subsets that are not included in the support set of training examples (Page 8 – Col. 1 - “…to the training dataset almost always increase the models’ output probability on the class label yr for the query q. In practice, we use q in the MIA on the target record r only if I(r;q) is greater than a threshold q (e.g. 0.95). Algorithm 1 summarizes the entire algorithm for query selection….” AND Col. 2 - “…query selection and optimization algorithms on all random records because the predictions of the models on most records are highly correlated: the models giving high output probabilities on some record are also likely to give high output probabilities on correlated records. To improve the efficiency of query selection…”); for each respective class associated with the two or more training data subsets (Page 16 - Col. 1 - “…we trained one attack classifier per class for each dataset. The attack classifiers are neural networks with one hidden layer of 64 units….”), determining, using the classification model (Page 1 - Col. 2 – “…trains an attack model that utilizes the target model’s classification result for a given input to determine whether the input is present in the target model’s training set….” AND Page 2 - Col. 2 - “… we run a hypothesis test to decide whether its classification by the target model is in line with this distribution. This approach successfully identifies training records of well-generalized models….”), as recited in claim 11, but does not teach or render obvious the following limitations: "a centroid value by averaging K number of support encodings associated with the K number of training examples in the support set of training examples that belong to the respective class; for each training example in the query set of training examples: generating, using the classification model, a query encoding; determining a class distance measure representing a respective distance between the query encoding and the centroid value determined for each respective class; determining a ground-truth distance between the query encoding and a ground-truth label associated with the corresponding training example in the query set of training examples; and updating parameters of the classification model based on the class distance measure and the ground-truth distance; and for each training example in the query set of training examples identified as being misclassified: generating, using the confidence model, a standard deviation value for the query encoding generated by the classification model for the corresponding misclassified training example; sampling, using the standard deviation value and the query encoding, a new query encoding for the corresponding misclassified training example; and updating parameters of the confidence model based on the new query encoding."


3.	The examiner found no suggestion or motivation to combine the teachings of the prior art made of record in a manner that would arrive at the limitations discussed above.

4.	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Conclusion



5. 	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892, Notice of References Cited, for a listing of analogous art.
6. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S ISMAIL, whose telephone number is (571) 272-9799 and whose fax number is (571) 273-9799. The examiner can normally be reached M-F, 9:00am-6:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David C. Payne, can be reached at (571) 272-3024. The fax phone number for the organization where this application or proceeding is assigned is 571-273-3024.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OMAR S ISMAIL/
Primary Examiner, Art Unit 2637