DETAILED ACTION
Response to Amendment
The amendment was received on 2/15/2022. Claims 1, 3, 4, 6-9, 15-18, and 20 are pending.
Claim Objections
Claims 1, 3, 4, 6-9, 15-18, and 20 are objected to because of the following informalities:  
Claim 1 is objected to for not keeping track of the claimed labels.
Claim 3 depends on canceled claim 2. Claim 3 is interpreted to depend on claim 1.
Claim 4 depends on canceled claim 2. Claim 4 is interpreted to depend on claim 1.
	Thus, claims 3, 4, 6-8, and 15-18 are objected to for depending on claim 1.
	Claim 9 is objected to on the same ground as claim 1, for losing track of the claimed labels.
	Claim 20 is objected to on the same ground as claim 1, for losing track of the claimed labels.

Appropriate correction is required.
The following amendment is suggested to overcome the label-tracking objection of claim 1 (and likewise of claims 9 and 20): 





1. (Suggested) A computer-implemented method for training a classifier, the method comprising:

obtaining a classifier for classifying data into one of a plurality of classes;

retrieving training data comprising a set of observations and a set of corresponding labels, each label representing an assigned class for a corresponding observation; and 

applying an agent trained by a reinforcement learning system to generate labelled data from unlabelled observations and train the classifier using the training data and the labelled data according to a policy determined by the reinforcement learning system, 

wherein the agent performs a series of actions based on a state of the classifier, each action being determined in accordance with the policy, wherein the state of the classifier represents a level of classification performance of the classifier and each action comprises: 

generating labelled data from unlabelled observations; 

training the classifier based on the labelled data and the training data; and 

determining an updated state of the classifier, the updated state representing an updated level of classification performance of the classifier following the training, 

wherein, for at least one action, generating labelled data from unlabelled observations comprises: 

dividing each of the observations from the training data into samples; 

determining a frequency of each sample within the training data; 

determining, for each sample and for each class, an inclusion probability for the sample within the class, wherein the inclusion probability represents the probability that the sample will occur within any instance of the corresponding class; 

selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample according to the class; 

identifying instances of each sample within the unlabelled observations; and  

generating labelled data from each identified instance by forming an observation comprising the identified instance and neighboring data that is located next to the identified instance within the unlabelled data and assigning the assigned label [[corresponding]] to [[the]] that sample for that instance to the newly formed observation.  




1. (Suggested with 112(a) support) A computer-implemented method for training a classifier, the method comprising:

obtaining a classifier for classifying data into one of a plurality of classes;

retrieving training data comprising a set of observations and a set of corresponding labels, each label representing an assigned class for a corresponding observation; and 

applying an agent trained by a reinforcement learning system to generate labelled data from unlabelled observations and train the classifier using the training data and the labelled data according to a policy determined by the reinforcement learning system, 

wherein the agent performs a series of actions based on a state of the classifier, each action being determined in accordance with the policy, wherein the state of the classifier represents a level of classification performance of the classifier and each action comprises: 

generating labelled data from unlabelled observations; 

training the classifier based on the labelled data and the training data; and 

determining an updated state of the classifier, the updated state representing an updated level of classification performance of the classifier following the training, 

wherein, for at least one action, generating labelled data from unlabelled observations comprises: 

dividing each of the observations from the training data into samples; 

determining a frequency of each sample within the training data; 

determining, for each sample and for each class, an inclusion probability for the sample within the class (“the inclusion probability for each word in the…class”: [0122]), wherein the inclusion probability represents the probability that the sample will occur within any instance of the corresponding class; 

selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample (“the selected word is labeled”: [0125], 3rd sentence) according to the class; 

identifying instances of each sample within the unlabelled observations; and  

generating labelled data from each identified instance by forming an observation comprising the identified instance and neighboring data that is located next to the identified instance within the unlabelled data and assigning the assigned label [[corresponding]] to [[the]] that sample (“This label is then assigned to each sample”: [0125], last sentence) for that instance to the newly formed observation.  


[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.

[0125] Each sample is assigned a label and added to the training data set 50. Each sample is labeled based on the class to which the corresponding selected word is most likely to belong. That is, when a selected word has an inclusion probability that exceeds the inclusion probability threshold, the selected word is labeled with the class to which the inclusion probability relates (the class for which the word has the highest inclusion probability). This label is then assigned to each sample that is generated from the unlabeled data based on that selected word.
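For illustration only, the labeling steps described in [0122] and [0125] can be sketched in Python. This is a minimal sketch under stated assumptions, not applicant's implementation: observations are whitespace-tokenized text, a "sample" is a single word, the inclusion probability of a word for a class is read as the fraction of that class's training observations that contain the word (each observation treated as equally likely), and the "neighboring data" is a fixed window of words on each side. The function names and the `window` parameter are hypothetical.

```python
from collections import Counter


def inclusion_probabilities(observations, labels):
    """Assumed reading of [0122]: for each class c and word w, the fraction of
    class-c training observations that contain w."""
    class_sizes = Counter(labels)
    # Frequency of each word (counted once per observation) within each class.
    containing = {c: Counter() for c in class_sizes}
    for observation, label in zip(observations, labels):
        containing[label].update(set(observation.split()))
    return {
        c: {word: n / class_sizes[c] for word, n in counts.items()}
        for c, counts in containing.items()
    }


def select_and_label(probabilities, threshold):
    """Keep words whose inclusion probability exceeds the threshold and label
    each with the class for which its inclusion probability is highest ([0125])."""
    best = {}
    for c, table in probabilities.items():
        for word, p in table.items():
            if p > threshold and (word not in best or p > best[word][1]):
                best[word] = (c, p)
    return {word: c for word, (c, _) in best.items()}


def label_unlabelled(unlabelled, word_labels, window=2):
    """Form a new observation around each occurrence of a selected word (the
    word plus up to `window` neighboring words on each side; `window` is a
    hypothetical parameter) and assign it that word's label."""
    new_data = []
    for text in unlabelled:
        tokens = text.split()
        for i, token in enumerate(tokens):
            if token in word_labels:
                start, end = max(0, i - window), i + window + 1
                new_data.append((" ".join(tokens[start:end]), word_labels[token]))
    return new_data
```

For example, with training observations "good movie" and "good plot" (class pos) and "bad movie" and "bad acting" (class neg) and a threshold of 0.6, only "good" (pos) and "bad" (neg) are selected, since "movie" has an inclusion probability of 0.5 in each class. Each occurrence of "good" or "bad" in the unlabelled text then yields a new labelled observation made of the word and its neighbors.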















Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
35 U.S.C. 112(f) is not invoked in claims 1, 3, 4, 6-9, 15-18, and 20.
Claim 3 is interpreted to depend on claim 1.
Claim 4 is interpreted to depend on claim 1.





Accordingly, the term “until” in method claims 3 and 18 is interpreted as a contingent limitation via MPEP 2111.04: 
II.    CONTINGENT LIMITATIONS
The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met. For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A or B is required by the broadest reasonable interpretation of the claim. If the claimed invention requires the first condition to occur, then the broadest reasonable interpretation of the claim requires step A. If the claimed invention requires both the first and second conditions to occur, then the broadest reasonable interpretation of the claim requires both steps A and B.
The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur. The system claim interpretation differs from a method claim interpretation because the claimed structure must be present in the system regardless of whether the condition is met and the function is actually performed. 
See Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016) for an analysis of contingent claim limitations in the context of both method claims and system claims. In Schulhauser, both method claims and system claims recited the same contingent step. When analyzing the claimed method as a whole, the PTAB determined that giving the claim its broadest reasonable interpretation, "[i]f the condition for performing a contingent step is not satisfied, the performance recited by the step need not be carried out in order for the claimed method to be performed" (quotation omitted). Schulhauser at 10. When analyzing the claimed system as a whole, the PTAB determined that "[t]he broadest reasonable interpretation of a system claim having structure that performs a function, which only needs to occur if a condition precedent is met, still requires structure for performing the function should the condition occur." Schulhauser at 14. Therefore "[t]he Examiner did not need to present evidence of the obviousness of the [ ] method steps of claim 1 that are not required to be performed under a broadest reasonable interpretation of the claim (e.g., instances in which the electrocardiac signal data is not within the threshold electrocardiac criteria such that the condition precedent for the determining step and the remaining steps of claim 1 has not been met);" however to render the claimed system obvious, the prior art must teach the structure that performs the function of the contingent step along with the other recited claim limitations. Schulhauser at 9, 14.
See also MPEP § 2143.03.
The following definitions are “taken” via MPEP 2111.01 III. "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

The claimed “representing” (as in “a set of corresponding labels, each label representing an assigned class” in claim 1) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com wherein “to be the equivalent of; correspond to” is “taken” as the meaning of the claimed “representing” via MPEP 2111.01 III:
represent
verb (used with object)
14	to be the equivalent of; correspond to:
The llama of the New World represents the camel of the Old World.

	This equivalency or correspondence is reflected in claim 1, lines 22-24: “corresponding…class” and “each…class”: 
	“determining, for each sample and for each class, an inclusion probability for the sample within the class, wherein the inclusion probability represents the probability that the sample will occur within any instance of the corresponding class”




The claimed “each” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 1) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “every one of two or more considered individually or one by one” is “taken” as the meaning of the claimed “each” via MPEP 2111.01 III:
each
adjective
1	every one of two or more considered individually or one by one:
each stone in a building; a hallway with a door at each end.








The claimed “the” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 1) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “(one of many of a class or type, as of a manufactured item, as opposed to an individual one)” is “taken” as the meaning of the claimed “the” via MPEP 2111.01 III:
the
definite article
10	(one of many of a class or type, as of a manufactured item, as opposed to an individual one):
Did you listen to the radio last night?









The claimed “the” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 1) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “(used, especially before a noun, with a specifying or particularizing effect, as opposed to the indefinite or generalizing force of the indefinite article a or an)” is “taken” as the meaning of the claimed “the” via MPEP 2111.01 III:
the
definite article
1	(used, especially before a noun, with a specifying or particularizing effect, as opposed to the indefinite or generalizing force of the indefinite article a or an):
the book you gave me; Come into the house.








The claimed “classifier” (as in “A machine learning classifier” in claim 9) is interpreted in light of applicant’s disclosure, emphasis added:
[0062] By utilizing the reinforcement learning methods described herein, a system can be trained to improve the classification performance of a classifier without requiring additional manually labeled data. This means that a classifier can be trained either to a higher standard using the same amount of manually labeled data as previously required, or can be trained to a similar performance using a smaller amount of input training data.

and definition thereof via Dictionary.com, wherein:
--a particular process of acting that classifies--
is “taken” as the meaning of the claimed “classifier” via MPEP 2111.01 III:
classifier
noun
1	a person or thing that classifies.

wherein “thing” is defined:
thing1
noun
6	an action, deed, event, or performance:
to do great things; His death was a horrible thing.

wherein “performance” is defined:
performance
noun
4	a particular action, deed, or proceeding.

wherein “action” is defined:
action
noun
1	the process or state of acting or of being active:
The machine is not in action now.





Response to Arguments
REMARKS
Applicant’s arguments, see remarks, page 8, filed 2/15/22, with respect to the claim objection have been fully considered and are persuasive. The objection to claims 9, 11-15, and 19-23 has been withdrawn. A new claim objection is set forth above. 
35 USC 102 Rejections and 103 Rejections
Applicant’s arguments, see remarks, page 8, filed 2/15/22, with respect to 35 USC 102 and 35 USC 103 have been fully considered and are persuasive.  
The 35 USC 102 rejection of claims 1-4, 8-13, and 19-23 has been withdrawn. 
The 35 USC 103 rejection of claims 5, 14, and 15 has been withdrawn. 
The 35 USC 103 rejection of claims 6, 7, and 16-18 has been withdrawn.
Thus, none of pending claims 1, 3, 4, 6-9, 15-18, and 20 is rejected.









Allowable Subject Matter
Claims 1, 3, 4, 6-9, 15-18, and 20 are allowed.
The following is an examiner’s statement of reasons for allowance:
The claims are allowed for the reasons given in applicant’s remarks, page 8. For example:
A.	Wu et al. (Reinforced Co-Training) teaches a probability distribution of an example as shown in fig. 3: “N-class probability distribution…”. 
B.	Johnson et al. (US 2017/0116544 A1) teaches “the probability that hyperplane h labels an instance as being +1”, [0074].
In contrast, claim 1, lines 23-24, requires “the probability that the sample will occur within any instance of the corresponding class”.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”






Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Mohamed et al. (US Patent 10,417,350) is pertinent as teaching that the occurrence of a word is associated with a probability of classification as an example of a class (corresponding to claim 1, lines 23-24: “the probability that the sample will occur within any instance of the corresponding class”):
“For example, with respect to class 212A, the occurrence of n-gram 250 (with n=2) comprising Word1 followed by Word2 is associated with a 0.9 (90%) probability of classification as an example of class 212A, the occurrence of Word3 followed by Word4 is associated with a 0.7 (70%) probability, the occurrence of Word5 is associated with a 0.65 (65%), while the occurrence of Word6 followed after two intervening words by Word? (the “?” symbols indicate that any words/tokens may be positioned in the indicated positions) is associated with a 0.62 (62%) probability.”

However, Mohamed does not teach claim 1, lines 25-27:

selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample according to the class.

Instead, Mohamed teaches a “token…related with…a probability above some…threshold…of the…class being selected as the class to which the…text…belongs”:
“The token combinations may be deemed “influential” in that the presence or occurrence of the token combinations in an input text collection may be correlated with a high probability (e.g., a probability above some selected threshold) of the corresponding class being selected as the class to which the input text collection belongs.” (col. 3, ll. 1-7).





Aggarwal (US Patent App. Pub. No. US 2014/0052674 A1) is pertinent as teaching a feature confidence based on the number of times a word occurs in message instances corresponding to a class (corresponding to claim 1, lines 23-24: “the probability that the sample will occur within any instance of the corresponding class”):
[0088] At block 530, feature confidences for each class are re-computed. The re-computed feature confidence may be defined as the ratio of the times the word occurs in a message instance corresponding to the class partition divided by its global occurrence across all class partitions.
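For contrast, the ratio described in Aggarwal [0088] can be sketched in Python. This is a minimal illustration under assumptions not stated in Aggarwal: messages are whitespace-tokenized, and the function name is hypothetical. Under this reading, the confidence divides a word's count within one class partition by that word's count across all partitions, i.e., it is a ratio over the word's occurrences rather than a probability over the instances of a class as recited in claim 1, lines 23-24.

```python
from collections import Counter


def feature_confidences(messages, labels):
    """Assumed reading of Aggarwal [0088]: for each class c and word w,
    confidence = (occurrences of w in class-c messages) / (occurrences of w in all messages)."""
    per_class = {}
    global_counts = Counter()
    for message, label in zip(messages, labels):
        tokens = message.split()
        per_class.setdefault(label, Counter()).update(tokens)
        global_counts.update(tokens)
    return {
        label: {word: n / global_counts[word] for word, n in counts.items()}
        for label, counts in per_class.items()
    }
```

For example, with messages "good movie" and "good plot" (class pos) and "bad movie" and "bad acting" (class neg), "movie" receives a confidence of 0.5 for each class because half of its global occurrences fall in each partition.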

However, Aggarwal does not teach claim 1, lines 25-27:

selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample according to the class.

	Instead, Aggarwal teaches reporting each class confidence that is determined to be above a threshold:
[0095] In another exemplary embodiment, at block 630, each class confidence that is determined to be above a predetermined threshold is reported. Accordingly, the test instance may be classified for multiple classes.



	







Koichi et al. (JP 2006-293767 A with machine translation) is pertinent as teaching calculating the probability that a sentence belongs to a category as a function of occurrence frequencies in category sentence examples, such that an unlabeled sentence can be classified into a category whose probability is equal to or higher than a threshold (corresponding to claim 1, lines 23-24: “the probability that the sample will occur within any instance of the corresponding class”):
BEST-MODE: 22nd text-block:
“The determination unit 28 acquires the belonging probability calculated for each category by the category belonging probability calculating unit 26 and determines whether to classify the unclassified sentence data into any category based on the value of the belonging probability. More specifically, the determination unit 28 classifies the unclassified sentence data into the category having the maximum attribution probability. Alternatively, uncategorized sentence data may be classified into all categories for which an attribution probability equal to or higher than a preset threshold is obtained. By doing so, one uncategorized sentence data can be classified into two or more categories by a series of operations. When there is no category having the attribution probability equal to or higher than the threshold, the determination unit 28 may determine that the uncategorized sentence data is a sentence not classified into any category, or the category having the maximum attribution probability. May be classified. The determination result of the uncategorized sentence data by the determination unit 28 is stored in the determination result storage unit 32 or output to an external device (not shown).”

BEST-MODE: 42nd text-block or 1st text-block above equation “[1]”:
“The appearance probability calculation unit 130 calculates the appearance probability a_nm of each component W_n extracted from the unclassified data for each category m. Here, the appearance probability a_nm is calculated by the following equation using the appearance frequencies X_nm and Y_nm in the sentence included in the above-described category sentence example or non-category sentence example.” 

	However, Koichi does not teach claim 1, lines 25-27:
“selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample according to the class”.

	Instead, Koichi teaches classifying an unlabeled sentence when the preset threshold is met.
This application is in condition for allowance except for the following formal matters: 
The above objection to claims 1, 3, 4, 6-9, 15-18, and 20 regarding tracking of the claimed labels and the claim dependencies.
Prosecution on the merits is closed in accordance with the practice under Ex parte Quayle, 25 USPQ 74, 453 O.G. 213, (Comm’r Pat. 1935).
A shortened statutory period for reply to this action is set to expire TWO (2) MONTHS from the mailing date of this letter. Extensions of time may be granted under  37 CFR 1.136 but in no case can any extension carry the date for reply to this Office action beyond the maximum period of SIX MONTHS set by statute (35 U.S.C. 133).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENNIS ROSARIO/Examiner, Art Unit 2667 

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667