Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claims 1 – 20 are pending in this Office action.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

	Claims 1 – 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claims 1 and 11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite correcting labels in untrusted data based on a small sample of trusted data in a training database.
The limitations of obtaining a dataset, selecting a portion of the dataset, determining a corresponding label, generating an updated remaining dataset, and performing an operation, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “a system comprising a computing device,” nothing in the claim elements precludes the steps from practically being performed in the mind. For example, but for the “a system comprising a computing device” language, “selecting portions of dataset, determining a corresponding label, generating updated remaining dataset and performing operation” in the context of this claim encompasses a user manually performing all of the above steps. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental processes” grouping of abstract ideas. Accordingly, claims 1 and 11 recite an abstract idea.
This judicial exception is not integrated into a practical application. In particular, claims 2-10 and 12-19 only recite additional steps that further carry out the performance of the limitations in the mind. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Claims 2-10 and 12-19 are directed to an abstract idea.
Claims 1-19 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a computing device performing the obtaining, determining, and performing steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claims 1-19 are not patent eligible.
Claim 20 recites one additional element: using a processor to perform the obtaining, determining, and performing steps. The processor in all steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 20 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of training a machine learning model based on the updated training data amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 20 is not patent eligible.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-7, 9-11, 13-17 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Mukherjee et al., United States Patent Application Publication No. 20210357747.
Claim 1:
	Mukherjee discloses:
a computing device configured to: obtain a dataset from a database (see [0059]-[0067] → Mukherjee discloses this limitation in that the system obtains sets of data from a database); 
select a first portion of the dataset including trusted data of the dataset such that a remaining dataset exists (see [0059]-[0067] → Mukherjee discloses this limitation in that the system obtains a small set of cleanly-labeled datasets from the obtained set of data); 
for each data sample of the remaining dataset, determine whether a corresponding observed label is a true label for the data sample based at least in part on the first portion of the dataset (see [0059]-[0067] → Mukherjee discloses this limitation in that the system determines the labels of the remaining weak dataset by leveraging both the clean and weak labels of the two datasets); 
generate an updated remaining dataset based on the determination, for each data sample of the remaining dataset, whether the observed label is a true label for the data sample (see [0059]-[0067] → Mukherjee discloses this limitation in that the system generates inferred labels based on the determination); and 
perform at least one operation based at least in part on the updated remaining dataset (see [0059]-[0067] → Mukherjee discloses this limitation in that the system utilizes the inferred labels directly for training or performs corrections based on the determination). 
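
For illustration only, the sequence of steps recited in Claim 1 may be sketched as follows. The sketch is not taken from Mukherjee or from the present application. The sample layout (`id`, `feature`, `label`), the function names, and the majority-vote rule used to decide whether an observed label is a true label are all assumptions made for this example.

```python
from collections import Counter

def split_trusted(dataset, trusted_ids):
    """Split samples into a trusted first portion and the remaining dataset."""
    trusted = [s for s in dataset if s["id"] in trusted_ids]
    remaining = [s for s in dataset if s["id"] not in trusted_ids]
    return trusted, remaining

def majority_label_by_feature(trusted):
    """Assumed rule: map each feature value to its most common trusted label."""
    votes = {}
    for s in trusted:
        votes.setdefault(s["feature"], Counter())[s["label"]] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}

def update_remaining(trusted, remaining):
    """Decide per sample whether the observed label is true; correct it if not."""
    reference = majority_label_by_feature(trusted)
    updated = []
    for s in remaining:
        # With no trusted evidence for this feature, keep the observed label.
        expected = reference.get(s["feature"], s["label"])
        updated.append({**s, "label": expected,
                        "observed_was_true": expected == s["label"]})
    return updated

dataset = [
    {"id": 1, "feature": "invoice", "label": "billing"},
    {"id": 2, "feature": "refund", "label": "billing"},
    {"id": 3, "feature": "invoice", "label": "support"},  # disagrees with trusted data
    {"id": 4, "feature": "refund", "label": "billing"},
    {"id": 5, "feature": "login", "label": "support"},    # feature unseen in trusted data
]
trusted, remaining = split_trusted(dataset, trusted_ids={1, 2})
updated = update_remaining(trusted, remaining)
```

In this sketch the "at least one operation" would be any downstream use of `updated`, for example passing it to a training routine.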

Claim 3:
	Mukherjee discloses:
wherein the computing device is further configured to: for each data sample of the remaining dataset, determine a probability of the corresponding observed label being a true label; and determine, for each data sample of the remaining dataset, whether the corresponding observed label is a true label based on the probability (see Mukherjee [0034] → Mukherjee discloses this limitation in that the output of the system classification layer corresponds to a probability distribution over the enumerated intents). 

Claim 4:
	Mukherjee discloses:
wherein determining whether the corresponding observed label is the true label is based on a confidence of the computing device to have correctly determined the probability (see Mukherjee [0110]-[0113] → Mukherjee discloses this limitation in that the system produces output based on a probability distribution of confidence values for each intent). 
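
For illustration only, the probability-based and confidence-based determinations recited in Claims 3 and 4 may be sketched as follows. The sketch is not taken from Mukherjee or from the present application. The frequency estimate, the use of the supporting sample count as a stand-in for confidence, and the `threshold` and `min_support` parameters are assumptions made for this example.

```python
def label_probability(trusted, feature, observed_label):
    """Return (probability, support).

    probability: share of trusted samples with this feature that carry the
    observed label. support: number of trusted samples the share rests on.
    """
    matching = [s for s in trusted if s["feature"] == feature]
    if not matching:
        return 0.0, 0
    hits = sum(1 for s in matching if s["label"] == observed_label)
    return hits / len(matching), len(matching)

def is_true_label(trusted, sample, threshold=0.5, min_support=2):
    """Return True/False, or None when support is too low to decide confidently."""
    p, support = label_probability(trusted, sample["feature"], sample["label"])
    if support < min_support:
        return None
    return p >= threshold

trusted = [
    {"feature": "invoice", "label": "billing"},
    {"feature": "invoice", "label": "billing"},
    {"feature": "invoice", "label": "support"},
    {"feature": "refund", "label": "billing"},
]
likely_true = is_true_label(trusted, {"feature": "invoice", "label": "billing"})
likely_false = is_true_label(trusted, {"feature": "invoice", "label": "support"})
undecided = is_true_label(trusted, {"feature": "refund", "label": "billing"})
```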

Claim 5:
	Mukherjee discloses:
wherein the probability is determined based on one or more of a first probability of the observed label being the true label for the corresponding data sample, a second probability of the true label for the corresponding data sample being a trusted label from the first portion of the dataset given the observed label of the data sample, and a probability distribution of the features of the data sample having the observed label and the corresponding trusted label (see Mukherjee [0110]-[0113] → Mukherjee discloses this limitation in that the system determines probabilities based on a probability distribution for each intent among a plurality of probabilities for each corresponding data sample set). 

Claim 6:
	Mukherjee discloses:
wherein the computing device is further configured to: determine a portion of the remaining dataset with a set of first data samples where corresponding observed labels are not true labels; and update the portion of the remaining dataset by replacing the corresponding labels with a trusted label from the first portion of the dataset (see Mukherjee [0065] → Mukherjee discloses this limitation in that the system corrects the weak labels from inferred labels formed by leveraging the clean and weak label sets). 

Claim 7:
	Mukherjee discloses:
wherein a size of the first portion of the dataset is based at least in part on minimizing a weighted sum of sampling variances of features associated with the dataset (see Mukherjee [0085]-[0088] → Mukherjee discloses this limitation in that the system utilizes an instance-weighted training method to determine the dataset). 

Claim 9:
	Mukherjee discloses:
wherein for each data sample of the remaining dataset, determining whether a corresponding observed label is a true label is based at least in part on assuming that the true labels of the data samples are related to the corresponding observed labels (see Mukherjee [0052] → Mukherjee discloses this limitation in that the system utilizes contents related to RI intent and ignores trivial attachments that are not related). 

Claim 10:
	Mukherjee discloses:
wherein the updated remaining dataset is generated as the remaining dataset based on the determination, for each data sample of the remaining dataset, that the corresponding observed label is a true label for the data sample (see Mukherjee [0065] → Mukherjee discloses this limitation in that the system generates inferred labels based on the determination for each data sample of the clean dataset). 

Claim 11 is essentially the same as Claim 1 except that it sets forth the claimed invention as a method claim, and it is rejected for the same reasons as applied hereinabove for Claim 1. 

Claims 13, 14, 15, 16, 17 and 19 perform the same functions as Claims 3, 4, 5, 6, 7 and 9 respectively.  Thus Mukherjee discloses/teaches every element of Claims 13, 14, 15, 16, 17 and 19 as indicated in the above rejection for Claims 3, 4, 5, 6, 7 and 9 respectively. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee et al., in view of Ghulati, U.S. Patent Application Publication No. 20200202073.
Claim 2:
Mukherjee discloses every element of Claim 1. 
Mukherjee does not explicitly disclose:
wherein the first portion of the dataset is selected using stratified sampling of the dataset.
However, Ghulati discloses:
wherein the first portion of the dataset is selected using stratified sampling of the dataset (see Ghulati [0098] → Ghulati teaches this limitation in that the system implements stratified sampling to build and train the test datasets). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Mukherjee with the teachings of Ghulati for the purpose of improving existing unscalable, costly, and labor-intensive methods for verifying content, see Ghulati [0006]-[0008]. 

Claim 12 performs the same functions as Claim 2.  Thus Mukherjee as modified discloses/teaches every element of Claim 12 as indicated in the above rejection for Claim 2.
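
For illustration only, selecting a first portion of a dataset by stratified sampling, as recited in Claims 2 and 12, may be sketched as follows. The sketch is not taken from Ghulati, Mukherjee, or the present application. Stratifying on the observed label, the proportional allocation, and the names `stratified_sample`, `key`, `fraction`, and `seed` are assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(dataset, key, fraction, seed=0):
    """Draw the same fraction from every stratum, with at least one sample each."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for s in dataset:
        strata[s[key]].append(s)
    selected = []
    for group in strata.values():
        k = max(1, round(fraction * len(group)))
        selected.extend(rng.sample(group, k))
    return selected

dataset = (
    [{"id": i, "label": "a"} for i in range(8)]
    + [{"id": i, "label": "b"} for i in range(8, 12)]
)
first_portion = stratified_sample(dataset, key="label", fraction=0.25)
selected_ids = {s["id"] for s in first_portion}
remaining = [s for s in dataset if s["id"] not in selected_ids]
```

Proportional allocation keeps the label mix of the first portion close to that of the full dataset, which is the usual reason to prefer stratified over simple random sampling when some labels are rare.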

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee et al., in view of Neumann, U.S. Patent Application Publication No. 20200380421.
Claim 8:
Mukherjee discloses every element of Claim 1. 
Mukherjee does not explicitly disclose:
wherein a size of the first portion of the dataset is based at least in part on a linear cost function.
However, Neumann discloses:
wherein a size of the first portion of the dataset is based at least in part on a linear cost function (see Neumann [0162][0163] → Neumann teaches this limitation in that the system performs supervised machine learning processes using linear cost function algorithms). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Mukherjee with the teachings of Neumann for the purpose of accurately generating an instruction set for a user and/or updating information and training sets utilized by other models, see Neumann [0163]. 

Claim 18 performs the same functions as Claim 8.  Thus Mukherjee as modified discloses/teaches every element of Claim 18 as indicated in the above rejection for Claim 8.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee et al., in view of Breugelmans et al., U.S. Patent Application Publication No. 20210124996.
Claim 20:
Mukherjee discloses:
obtain training data (see [0059]-[0067] → Mukherjee discloses this limitation in that the system obtains sets of training data from a database); 
dividing the training data into a trusted dataset and an untrusted dataset (see [0059]-[0067] → Mukherjee discloses this limitation in that the system divides the training data into cleanly-labeled datasets and weak-labeled datasets); 
updating labels in the untrusted dataset to generate updated untrusted dataset based at least in part on trusted labels in the trusted dataset (see [0059]-[0067] → Mukherjee discloses this limitation in that the system determines the labels of the weak dataset by leveraging both the clean and weak labels of the two datasets); 
generating updated training data by combining the trusted dataset and updated untrusted dataset (see [0059]-[0067] → Mukherjee discloses this limitation in that the system generates inferred labels based on the determination); and 
training a machine learning model based on the updated training data (see [0059]-[0067] → Mukherjee discloses this limitation in that the system utilizes the inferred label sets for training neural networks and other machine learning models). 
Mukherjee does not explicitly disclose:
injecting noise in the untrusted database.
However, Breugelmans discloses:
injecting noise in the untrusted database (see [0035] → Breugelmans teaches this limitation in that the system introduces noise into an image database to train the machine learning model to learn to identify the noise as corresponding to the class label). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Mukherjee with the teachings of Breugelmans for the purpose of efficient training of the machine learning model to address problems for encoding and decoding image classification with labels, see Breugelmans [0003]-[0009]. 
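
For illustration only, injecting noise into an untrusted dataset may be sketched as follows. The sketch is not taken from Breugelmans, Mukherjee, or the present application. Breugelmans is cited above for noise added to image data, whereas this example assumes label noise (random label flips); the names `inject_label_noise`, `rate`, and `seed` are likewise assumptions.

```python
import random

def inject_label_noise(samples, labels, rate, seed=0):
    """Return a copy of samples in which each label is flipped with probability rate.

    labels must contain at least two distinct values so a different label exists.
    """
    rng = random.Random(seed)  # fixed seed so the noise is repeatable
    noisy = []
    for s in samples:
        if rng.random() < rate:
            alternatives = [l for l in labels if l != s["label"]]
            noisy.append({**s, "label": rng.choice(alternatives)})
        else:
            noisy.append(dict(s))
    return noisy

untrusted = [{"id": i, "label": "billing"} for i in range(5)]
labels = ["billing", "support", "sales"]
unchanged = inject_label_noise(untrusted, labels, rate=0.0)
all_flipped = inject_label_noise(untrusted, labels, rate=1.0)
```

The input list is left unmodified, so the original untrusted labels remain available for comparison with the noisy copy.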

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAN HUTTON whose telephone number is (571) 270-1223. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at (571) 272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/NAN HUTTON/Primary Examiner, Art Unit 2154