DETAILED ACTION
Claim Objections
Claims 9, 11-15, and 19-23 are objected to because of the following informalities:
Regarding claim 9, independent claim 9 is objected to for truncating the claimed “method” to the words “claim 1”. The following is suggested:
9. A machine learning classifier trained according to a computer-implemented method for training a classifier, the method comprising: 
obtaining a classifier for classifying data into one of a plurality of classes;
retrieving training data comprising a set of observations and a set of corresponding labels, each label representing an assigned class for a corresponding observation; and 
applying an agent trained by a reinforcement learning system to generate labeled data from unlabeled observations and train the classifier using the training data and the labeled data according to a policy determined by the reinforcement learning system.  
	
Regarding claim 11, claim 11 is objected to for not clearly further limiting the “generates a policy” recited at line 13 of claim 10. The following is suggested:
11. The method of claim 10 wherein the agent performs a series of actions based on a state of the classifier, generating the policy further comprises storing each unique action and a value associated with each unique action, the state of the classifier represents a level of classification performance of the classifier, and each action comprises: 

generating labeled data from unlabeled observations; 

training the classifier based on the labeled data and the training data; and

determining an updated state of the classifier, the updated state representing an updated level of classification performance of the classifier following the training.

wherein “ing” of “generating” is defined via Dictionary.com, emphasis added:
-ing
a suffix of nouns formed from verbs, expressing the action of the verb or its result, product, material, etc. (the art of building; a new building; cotton wadding). It is also used to form nouns from words other than verbs (offing; shirting). Verbal nouns ending in -ing are often used attributively (the printing trade) and in forming compounds (drinking song). In some compounds (sewing machine), the first element might reasonably be regarded as the participial adjective, -ing2, the compound thus meaning “a machine that sews,” but it is commonly taken as a verbal noun, the compound being explained as “a machine for sewing.”

Thus, claims 12-15 are objected to for depending on claim 11.
Further regarding claim 14, claim 14 is missing a terminal period (“.”).
Thus, claim 15 is further objected to for depending on claim 14.

Regarding claims 19-23, each of independent claims 19, 20, 21, 22, and 23 is objected to for the same truncation as claim 9.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
Regarding claims 1-23, 35 USC 112(f) is not invoked in claims 1-23.

Accordingly, the “until” clauses of method claims 3 and 18 are interpreted as contingent limitations under MPEP 2111.04:
II.    CONTINGENT LIMITATIONS
The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met. For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A or B is required by the broadest reasonable interpretation of the claim. If the claimed invention requires the first condition to occur, then the broadest reasonable interpretation of the claim requires step A. If the claimed invention requires both the first and second conditions to occur, then the broadest reasonable interpretation of the claim requires both steps A and B.
The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur. The system claim interpretation differs from a method claim interpretation because the claimed structure must be present in the system regardless of whether the condition is met and the function is actually performed. 
See Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016) for an analysis of contingent claim limitations in the context of both method claims and system claims. In Schulhauser, both method claims and system claims recited the same contingent step. When analyzing the claimed method as a whole, the PTAB determined that giving the claim its broadest reasonable interpretation, "[i]f the condition for performing a contingent step is not satisfied, the performance recited by the step need not be carried out in order for the claimed method to be performed" (quotation omitted). Schulhauser at 10. When analyzing the claimed system as a whole, the PTAB determined that "[t]he broadest reasonable interpretation of a system claim having structure that performs a function, which only needs to occur if a condition precedent is met, still requires structure for performing the function should the condition occur." Schulhauser at 14. Therefore "[t]he Examiner did not need to present evidence of the obviousness of the [ ] method steps of claim 1 that are not required to be performed under a broadest reasonable interpretation of the claim (e.g., instances in which the electrocardiac signal data is not within the threshold electrocardiac criteria such that the condition precedent for the determining step and the remaining steps of claim 1 has not been met);" however to render the claimed system obvious, the prior art must teach the structure that performs the function of the contingent step along with the other recited claim limitations. Schulhauser at 9, 14.
See also MPEP § 2143.03.
Regarding claims 22 and 23, the claimed “computer readable storage medium” is interpreted in light of applicant’s disclosure:
“[0164] Implementations of the subject matter and the operations described in this specification can be realized in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be realized using one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).”

Accordingly the following definitions are “taken” via MPEP 2111.01 III. "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

The claimed “each” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 5) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “every one of two or more considered individually or one by one” is “taken” as the meaning of the claimed “each” via MPEP 2111.01 III:
each
adjective
1	every one of two or more considered individually or one by one:
each stone in a building; a hallway with a door at each end.
The claimed “the” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 5) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “(one of many of a class or type, as of a manufactured item, as opposed to an individual one)” is “taken” as the meaning of the claimed “the” via MPEP 2111.01 III:
the
definite article
10	(one of many of a class or type, as of a manufactured item, as opposed to an individual one):
Did you listen to the radio last night?

The claimed “the” (as in “determining, for each sample and for each class, an inclusion probability for the sample within the class” in claim 5) is interpreted in light of applicant’s disclosure, emphasis added:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

and definition thereof via Dictionary.com wherein “(used, especially before a noun, with a specifying or particularizing effect, as opposed to the indefinite or generalizing force of the indefinite article a or an)” is “taken” as the meaning of the claimed “the” via MPEP 2111.01 III:
the
definite article
1	(used, especially before a noun, with a specifying or particularizing effect, as opposed to the indefinite or generalizing force of the indefinite article a or an):
the book you gave me; Come into the house.
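For illustration only, the inclusion probability described in paragraph [0122] quoted above can be sketched in Python. The sketch assumes one possible reading of [0122]: every instance in a class is weighted equally, so that the sum over instances containing a word reduces to the fraction of instances of that class containing the word. The function name, the whitespace tokenization, and the toy documents are hypothetical and are not taken from applicant's disclosure.

```python
from collections import Counter, defaultdict

def inclusion_probabilities(documents, labels):
    """Map (word, label) to the fraction of documents with that label containing the word.

    Assumption: each instance in a class is weighted equally (1 / class size).
    """
    class_sizes = Counter(labels)
    containing = defaultdict(int)
    for text, label in zip(documents, labels):
        # Count each word at most once per document.
        for word in set(text.lower().split()):
            containing[(word, label)] += 1
    return {key: count / class_sizes[key[1]] for key, count in containing.items()}

# Hypothetical toy data: two "neg" documents and one "pos" document.
docs = ["refund my order", "order arrived late", "great product"]
labels = ["neg", "neg", "pos"]
probs = inclusion_probabilities(docs, labels)
```

Under this reading, “each” word is evaluated against “each” class one by one, consistent with the dictionary meaning taken above.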

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 22 and 23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because the claimed “computer readable storage medium” is not limited by the disclosure (for example, “computer storage medium is not a propagated signal” at [0164], cited above) and therefore encompasses propagating signals, as would be understood by one of ordinary skill in the art of transferring energy from one location to the next, especially by waves, for example water waves.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 8-13, and 19-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wu et al. (Reinforced Co-Training).
Regarding claim 1, Wu discloses a computer-implemented method for training a classifier, the method comprising: 
obtaining a classifier (resulting in figure 2: “Classifier C1”) for classifying data into one of a plurality of classes;
retrieving training (via description of fig. 1: “training methods”) data (via fig. 2: “unlabeled set”: “Labeled by Classifier 2”) comprising (or involving as a factor) a set of observations (via fig. 2: “Q-agent”: “agent observes”, section 3.3 Q-Learning Agent, 3rd Sentence) and a set of corresponding labels, each label representing an assigned class for a corresponding observation; and 
applying an agent (via said fig. 2: “Q-agent”) trained (via “guide the…learning of the Q-agent”, 1 Introduction, 3rd paragraph, 5th S) by a reinforcement learning system to generate labeled data from unlabeled observations and train the classifier using the training data and the labeled data according to a policy (via “exploits this policy to train the co-training classifiers”, Abstract, penultimate S) determined by the reinforcement learning system.
Regarding claim 2, Wu discloses the method of claim 1 wherein the agent performs a series of actions (via fig. 2: “action at” that is “sequential”, 3 Method, 2nd para, 1st S) based on a state (via fig. 2: “state St+1”) of the classifier, each action being determined (via “select an action”, 3.3 Q-Learning Agent, 3rd S) in accordance with the policy, wherein the state of the classifier represents a level of classification performance (via “performance-driven rewards”, 1 Introduction, 3rd para, 5th S, represented in fig. 2: “reward rt”) of the classifier and each action comprises:
generating labeled data (said via fig. 2: “unlabeled set”: “Labeled by Classifier 2”) from unlabeled observations (said via fig. 2: “Q-agent”: “agent observes”); 
training the classifier (via said “exploits this policy to train the co-training classifiers”) based on the labeled data and the training data; and
determining an updated (via “update two classifiers”, 3 Method, 1st para, bullet “3.”) state (said via fig. 2: “state St+1” detailed in equation “(3)”, 3.3.1 State Representation) of the classifier, the updated state representing an updated level of classification performance (via said fig. 2: “reward rt”) of the classifier following the training.

Regarding claim 3, Wu discloses the method of claim 2 wherein: 
the method comprises, prior (via before each iteration of fig. 2) to applying the agent: 
applying (via a previous iteration) the classifier to the observations from the training data to classify the observations; and 
determining (via a previous iteration) a state (said via fig. 2: “state St+1”) of the classifier, the state representing the classification performance of the classifier, and 
applying (via post said a previous iteration) the agent comprises iteratively (as indicated by the loop in fig. 2): 
determining (said via “select an action”) from the policy the action that corresponds to the state of the classifier; 
performing the action to generate the updated state; and 
setting the state to equal (via “similar” via equation “(5)”, 3.3.2 Q-Network, comprising a {set} further comprising “a number, group, or combination of things of similar nature, design, or function” via Dictionary.com) the updated state, (contingent limitation follows) until an end condition is reached (thus Wu need not present this contingent limitation).
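For illustration only, the iterative structure recited in claim 3 (determine the action for the current state from the policy, perform it, set the state to the updated state, repeat until an end condition) can be sketched in Python. This is a minimal sketch and is neither Wu's Q-agent nor applicant's implementation; the dictionary-based policy, the state names, the transition table, and the two end conditions are all hypothetical assumptions.

```python
def run_agent(policy, perform_action, initial_state, max_steps=10):
    """Look up the action for the current state, apply it, and repeat."""
    state = initial_state
    history = [state]
    for _ in range(max_steps):                 # assumed end condition 1: step budget
        action = policy.get(state)
        if action is None:                     # assumed end condition 2: no action stored
            break
        state = perform_action(state, action)  # the updated state becomes the state
        history.append(state)
    return history

# Hypothetical toy setup: states are coarse levels of classification performance.
policy = {"low": "label_more", "medium": "label_more"}
transitions = {("low", "label_more"): "medium", ("medium", "label_more"): "high"}
history = run_agent(policy, lambda s, a: transitions[(s, a)], "low")
```

In the sketch the loop stops once the state “high” has no stored action, which illustrates why the “until” clause only governs when the iteration ends.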

Regarding claim 4, Wu discloses the method of claim 2 wherein, for at least one action, generating labeled data from unlabeled observations comprises: 
applying the classifier to classify the unlabeled observations; and 
for each unlabeled observation that has been classified with a confidence score (via “scored” “confidence”, 1 Introduction, 2nd para 4th S) that exceeds a confidence threshold (or zero, “0”, 3.3.3 Reward Function, rt equation with inequalities “> 0”), assigning a label to the unlabeled observation according to the classification.
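For illustration only, the confidence-threshold labeling recited in claim 4 can be sketched in Python. This is a minimal sketch under stated assumptions: the classifier is modeled as a function returning a (label, confidence) pair, and the threshold value of 0.9, the function names, and the toy classifier are hypothetical and come from neither Wu nor applicant's disclosure.

```python
def pseudo_label(observations, classify, threshold=0.9):
    """Keep only observations whose classification confidence exceeds the threshold."""
    labeled = []
    for obs in observations:
        label, confidence = classify(obs)
        if confidence > threshold:        # strictly exceeds, as the claim recites
            labeled.append((obs, label))  # assign the label per the classification
    return labeled

def toy_classify(text):
    # Hypothetical stand-in for a trained classifier.
    return ("pos", 0.95) if "great" in text else ("neg", 0.6)

result = pseudo_label(["great value", "not sure"], toy_classify)
```

With a threshold of zero, as in the cited reward inequalities “> 0”, any observation classified with a positive confidence would be labeled.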
Regarding claim 8, Wu discloses the method of claim 1 further comprising training the agent (via “train the Q-agent”, 3 Method, 2nd para, last S) to determine a set of actions (via said “select an action”) to be performed by the agent for achieving a desired classification performance (via said fig. 2: “reward rt”) of the classification system.
Regarding claim 9, Wu discloses a machine learning classifier trained according to the method of claim 1 (via the rejection of claim 1).

Regarding claim 10, claim 10 is rejected the same as claims 1, 2, and 8. Thus, the argument presented for claims 1, 2, and 8 is equally applicable to claim 10. Accordingly, Wu discloses a computer implemented method for training an agent to improve the classification performance of a classification system, the method comprising: 
obtaining a classifier for classifying data into one of a plurality of classes;
retrieving training data comprising a set of observations and a set of corresponding labels, each label representing an assigned class for a corresponding observation (via rejection of claim 1); and 
training an agent to perform a series of actions (via rejection of claim 2), each action comprising:
generating labeled data (via fig. 2: “Labeled by Classifier 2”) from a first set (via fig. 1: “Data Space”) of observations (said via fig. 2: “Q-agent”: “agent observes”) taken from the training data and adding (via “addition”, 3 Method, 1st para, 3rd bullet) the labeled data to a cumulative training set (via “selected subset of unlabeled data”, ibid.) comprising the first set of observations and their corresponding labels; 
training the classifier (via the rejection of claim 1) using the cumulative training set; and 
determining a classification performance (via said “performance-driven rewards”) of the trained classifier; 

wherein the training the agent generates a policy (via “improve the policy”, 3 Method, 1st para, last S, i.e., “to make or become better in quality” via Dictionary.com) dictating a set of actions (via fig. 2: “action a_t”) to be performed by the agent (as indicated upon the output of fig. 2: “Q-agent”) for achieving a desired classification performance (via said “performance-driven rewards”) of the classification system.

Regarding claim 11, Wu discloses the method of claim 10 wherein the agent performs a series of actions based on a state of the classifier (via the rejection of claim 2), generating the policy comprises storing (via learning) each unique action (said via fig. 2: “action at”) and a value associated with each unique action, the state of the classifier represents a level of classification performance (via said reward) of the classifier, and each action comprises: 
generating labeled data from unlabeled observations (via said fig. 2: “Q-agent” that “observes”); 
training the classifier (via said “exploits this policy to train the co-training classifiers”) based on (via the loop of fig. 2) the labeled data and the training data; and
determining an updated (said via “update two classifiers”) state of the classifier, the updated state representing an updated level of classification performance (said “performance-driven rewards”) of the classifier following the training.
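For illustration only, “storing each unique action and a value associated with each unique action” can be sketched in Python as a table of action values updated by the standard tabular Q-learning rule. This is a deliberately simplified stand-in: Wu describes a Q-network rather than a table, and keying the stored values by (state, action) pairs, the learning rate, the discount factor, and all names below are hypothetical assumptions.

```python
from collections import defaultdict

def update_action_value(values, state, action, reward, next_state, actions,
                        alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update on a stored (state, action) value."""
    best_next = max(values[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    values[(state, action)] += alpha * (target - values[(state, action)])

# Hypothetical toy data: unseen entries default to a value of 0.0.
values = defaultdict(float)
actions = ["label_more", "stop"]
update_action_value(values, "low", "label_more", reward=1.0,
                    next_state="medium", actions=actions)
```

After the single update, the value stored for taking “label_more” in state “low” is 0.5, reflecting a performance-driven reward of 1.0 scaled by the assumed learning rate.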
Regarding claim 12, claim 12 is rejected the same as claim 4. Thus, the argument presented for claim 4 is equally applicable to claim 12. Accordingly, Wu discloses the method of claim 11 wherein, for at least one action, generating labeled data from the first set of observations taken from the training data comprises: 
applying the classifier to classify the first set of observations; 
for each of the first set of observations that has been classified with a confidence score that exceeds a confidence threshold, assigning a label to the unlabeled observation according to the classification.

Regarding claim 13, Wu discloses the method of claim 12 wherein training the system comprises assigning a value to (via the loop in Wu’s fig. 2) the at least one action and, based on (via the loop in Wu’s fig. 2) the value, determining (Markush limitation follows: A or B) one or more of (A) whether to perform this action (as indicated by fig. 2: “Q-agent” “making…the…action…at”, section 3 Method, 2nd paragraph, 1st sentence), or (B) a level (via “high-quality unlabeled examples”, 1 Introduction, 3rd para, 4th S) for the confidence threshold that improves classification performance.
Regarding claim 19, claim 19 is rejected the same as claim 10. Thus, the argument presented for claim 10 is equally applicable to claim 19. Accordingly, Wu discloses claim 19 of a reinforcement learning system trained according to the method of claim 10 (via rejection of claim 10).
Regarding claim 20, claim 20 is rejected the same as claim 1. Thus, the argument presented for claim 1 is equally applicable to claim 20. Accordingly, Wu discloses claim 20 of a computing system comprising a processor (or a neural network as shown in fig. 3: “Q-network” specifically comprising a computer via Dictionary.com) configured to implement the method of claim 1 (via the rejection of claim 1).
Regarding claim 21, claim 21 is rejected the same as claim 10. Thus, the argument presented for claim 10 is equally applicable to claim 21. Accordingly, Wu discloses claim 21 of a computing system comprising a processor (via said neural-net specifically comprising a processor) configured to implement the method of claim 10 (via the rejection of claim 10).
Regarding claim 22, claim 22 is rejected the same as claim 1. Thus, the argument presented for claim 1 is equally applicable to claim 22. Accordingly, Wu discloses claim 22 of a computer readable storage medium (via said neural-net comprising hardware or software via Dictionary.com) encoded with instructions that, when executed by one or more computers (of the hardware or software), cause the one or more computers to implement the method of claim 1 (via the rejection of claim 1).
Regarding claim 23, claim 23 is rejected the same as claim 10. Thus, the argument presented for claim 10 is equally applicable to claim 23. Accordingly, Wu discloses claim 23 of a computer readable storage medium (via said neural-net comprising hardware or software) encoded with instructions that, when executed by one or more computers (said of the hardware or software), cause the one or more computers to implement the method of claim 10 (via the rejection of claim 10).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Regarding inquiry 4, see Suggestions regarding claim 5.
Claims 5, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (Reinforced Co-Training) in view of Johnson et al. (US Patent App. Pub. No.: US 2017/0116544 A1).
	Regarding claim 5, Wu teaches claim 5 of the method of claim 2 wherein, for at least one action, generating labeled data from unlabeled observations comprises:

dividing (via a “time step”, 3 Method, 2nd para, 1st S, represented in fig. 2 as subscript “t” or “t+1” or “#1” thru  “#K”) each of the observations (said via fig. 2: “Q-agent”: “agent observes”) from the training data into (i.e., toward via feedback arrows in fig. 2) samples (fig. 1: squares represented in fig. 2: “unlabeled subsets {Ui}” and “Labeled by Classifier 2” for each division of time); 
determining the frequency of each sample within the training data; 
determining, for each sample and for each class, an inclusion probability (via “the two classifiers’ probability distribution”, section 3.3.1 State Representation, 2nd para, 1st S) for the sample within the class; 
selecting each sample (via “selects high confidence samples”, Introduction, 2nd para, 4th S) that has an inclusion probability (said via “the two classifiers’ probability distribution”) for a class that exceeds an inclusion probability threshold (or zero, “0”, 3.3.3 Reward Function, rt equation with inequalities “> 0”) and assigning (via said figure 2: “Classifier C1”) a label to that sample according to the class; 
identifying (said via fig. 2: “unlabeled set”: “Labeled by Classifier 2”) instances (or “high-quality unlabeled examples”, 1 Introduction, 3rd para, 3rd S) of each sample within the unlabeled observations; and

generating labeled data (said via fig. 2: “unlabeled set”: “Labeled by Classifier 2”) from (via the loop in fig. 2) each identified instance (said via fig. 2: “unlabeled set”: “Labeled by Classifier 2”) by forming (via each said “time step” or iteration) an observation (said via fig. 2: “Q-agent”: “agent observes”) comprising the identified instance and neighbouring (via “the boundary”, comprising a “neighborhood”, in the description of fig. 1) data (as shown in fig. 1: neighboring symbol shapes) that is located next to the (neighboring) identified instance within the unlabeled data and assigning (said via figure 2: “Classifier C1”) the label corresponding to the sample for that instance to the newly formed observation (wherein boundary is defined via Dictionary.com:
boundary
noun, plural bound·a·ries.
2	Also called frontier. Mathematics. the collection of all points of a given set having the property that every neighborhood of each point contains points in the set and in the complement of the set.).

	Thus, Wu does not teach, as indicated in bold above, the claimed:
A.	“determining the frequency of each sample within the training data”; and
B.	“located next”.

Accordingly, Johnson teaches claim 5 of:
dividing each of the observations from the training (via Johnson: fig. 2A: “NUMBER OF TRAINING DOCUMENTS”) data into samples (resulting in “sampled k instances” cited in [0063], 1st sentence, and represented in fig. 7:706: “CONSTRUCT FIRST BATCH OF k DOCUMENTS”); 
A.	determining the frequency (via “TF-IDF”, cited in [0061], or “Term Frequency-Inverse Document Frequency”, cited in [0100], and represented as “D” in fig. 7:702: “OBTAIN UNLABELD DOCUMENTS D”) of each sample within the training data; 
determining, for each sample and for each class (as indicated in fig. 1: a dot and an “x”), an inclusion probability (or a probability vector represented in fig. 7:712a: “-SELECT NEW BATCH OF UNLABELED INSTANCES BC [SEE FIG. 8 or 9]”: fig. 9:906: “NORMALIZE THE WEIGHT VECTOR TO A PROBABILITY VECTOR”) for the (i.e., (one of many of a class or type, as of a manufactured item, as opposed to an individual one)”) sample (being one or many of a class) within the class (as particular one as indicated in fig. 1: a dot or an “x”); 
selecting each sample (said via fig. 7:712a: “-SELECT NEW BATCH OF UNLABELED INSTANCES BC [SEE FIG. 8 or 9]”) that has an inclusion probability (or said probability vector) for a class that exceeds an inclusion probability threshold (via fig. 7:716: “STOPPING CRITERIA MET? [SEE FIG. 10 OR 11]”: “YES”) and assigning a label (via fig. 7:712b: “-OBTAIN LABELS FOR UNLABELLED INSTANCES BC”) to that sample according to the class; 

identifying (via indices via fig. 9:912: “OBTAIN SORTED INDICES I OF REMAINING UNLABELED SET OF AVAILABLE DOCUMENTS D HAVING COSINE ANGLE ≥ t WITH RESPECT TO THE CHOSEN DOCUMENT (I[1])” represented in fig. 7:712:712a “-SELECT NEW BATCH OF UNLABELED INSTANCES BC [SEE FIG. 8 OR 9]”) instances of each sample within the unlabeled observations (resulting in said fig. 7:712b: “-OBTAIN LABELS FOR UNLABELLED INSTANCES BC”); and
B.	generating labeled data from each identified instance (ultimately resulting in fig. 7:712b: “-OBTAIN LABELS FOR UNLABELLED INSTANCES BC”) by forming (via fig. 7:714: “CONSTRUCT UPDATED HYPERPLANE h(x)”) an observation (via learning via fig. 7:712: “PERFORM ITERATION OF ACTIVE LEARNING USING SVM”) comprising the identified (via said index) instance and neighbouring (via “margin”, as shown in fig. 1:area between dashed lines, comprising “neighbouring” represented in fig. 9:902: “OBTAIN CURRENT VERSION HYPERPLANE hc(x), D,k AND t”) data (that is also indexed and as shown in fig. 1:indexed dots and Xs) that is located next (via “sorted…order based on…distance”, represented in said fig. 9:912, to the center line in fig. 1) to the identified (said via index I [1]) instance (as shown in fig. 1:indexed dots and Xs) within the unlabeled data (to be labeled via fig. 6:614: an expert) and assigning (via classification as shown in fig. 1) the label corresponding to the sample (being one or many of a class via fig. 7:708: “OBTAIN LABELS FOR FIRST BATCH OF k DOCUMENTS, i.e., TRAINING DATA”) for that instance to the newly formed observation (creating a circled update of fig. 1 via fig. 7: 714: “CONSTRUCT UPDATED HYPERPLANE h(x)” via Johnson:


“[0062] In this work, the soft-margin SVM is used for the classification task.  SVM is a supervised classification technique which uses a set of labeled data instances and learns a maximum-margin separating hyperplane h(X)=w^T x+b=0 by solving a quadratic optimization problem.  w controls the orientation of the hyperplane, T represents a matrix transpose operation, and b is the bias which fixes the offset of the hyperplane in d dimensional space.  Separating hyperplanes of SVM are linear, but by using non-linear kernels, SVM can also learn a non-linear hyperplane, if necessary.  Hyperplane h(x) splits the original d-dimensional space into two half-spaces such that if a test instance x_i falls on the positive side of the hyperplane (i.e., h(x_i) ≥ 0), x_i is predicted as +1, and otherwise, it is predicted as -1.  In FIG. 1, there is a graph illustrating a separating hyperplane 102 obtained using a linear SVM where the solid line represents the boundary and the dashed lines represent the margin.”

wherein “margin” is defined:
BRITISH DICTIONARY DEFINITIONS FOR MARGIN
margin
archaic margent
noun
1	an edge or rim, and the area immediately adjacent to it; border

wherein “adjacent” is defined:
BRITISH DICTIONARY DEFINITIONS FOR ADJACENT
adjacent
adjective
1	being near or close, esp having a common boundary; adjoining; contiguous

wherein “adjoining” is defined:
BRITISH DICTIONARY DEFINITIONS FOR ADJOINING
adjoining
adjective
1	being in contact; connected or neighbouring; 

and












“[0068] In TABLE #2 (Technique 2), both of the batch selection methods (namely DS and BPS) are introduced and outlined (note: these batch selection methods relate to Technique 1's Line 3).  In addition, to the current hyperplane h_c and available dataset D, both of these batch selection methods also have a user-defined parameter t ∈ [0, 1], which denotes a cosine similarity threshold.  Lines 2-7 describe the DS method and Lines 9-17 describe the BPS method.  For DS, in Line 2, the documents are first sorted in increasing order based on their absolute distance from the prevailing hyperplane h_c to get the sorted indices of the available documents, D in I. In Line 5, the nearest one document is chosen deterministically and inserted it into the current batch set, B_c.  Then the indices of the documents are obtained that have cosine angle ≥ t with the currently selected document, I[1] (including I[1]).  All of the obtained indices are then removed from the I and Lines 5, 6, and 7 are repeated until B_c=k. For the probabilistic sampler (BPS), distance from the hyperplane is calculated over the unlabeled documents.  For some documents, the distance value can be 0 (falling over the hyperplane).  In those cases, a minimum absolute distance is set as the distance of those documents from the hyperplane.  This is done as in Line 9, where there is an inverse operation.  In Line 10, the weight vector is normalized to convert it into probability vector.  In Line 13, one document, c, is chose using the weight, w, calculated in Line 10.  Then, the same operations are performed as in Lines 5, 6 and 7.  Finally, the weight, w is re-normalized as some of the documents have been removed from index list, I in Line 16.”).
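
For illustration only, the hyperplane test of Johnson [0062] and the DS batch selection of Johnson [0068], as quoted above, can be sketched as follows. This is a minimal reading of the quoted text and not Johnson's implementation. The function names (hyperplane, predict, cosine, ds_batch) and the example vectors are hypothetical. Sorting by |h(x)| is assumed to stand in for the "absolute distance" from the hyperplane, since dividing by the norm of w does not change the order.

```python
import math

def hyperplane(w, b, x):
    # h(x) = w^T x + b, per the quoted [0062]
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    # +1 on the non-negative side of the hyperplane, otherwise -1
    return 1 if hyperplane(w, b, x) >= 0 else -1

def cosine(u, v):
    # cosine similarity of two document vectors (0.0 if either is all zeros)
    dot = sum(a * c for a, c in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(c * c for c in v))
    return dot / norm if norm else 0.0

def ds_batch(docs, w, b, k, t):
    # Sort document indices by increasing |h(x)| (nearest to the hyperplane first).
    order = sorted(range(len(docs)), key=lambda i: abs(hyperplane(w, b, docs[i])))
    batch = []
    while order and len(batch) < k:
        chosen = order[0]          # nearest remaining document, chosen deterministically
        batch.append(chosen)
        # Remove the chosen document and every document with cosine >= t to it.
        order = [i for i in order
                 if i != chosen and cosine(docs[i], docs[chosen]) < t]
    return batch

# Hypothetical two-dimensional document vectors and hyperplane parameters.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
w, b = [1.0, -1.0], 0.0
batch_of_two = ds_batch(docs, w, b, k=2, t=0.9)
batch_of_three = ds_batch(docs, w, b, k=3, t=0.9)
```

In this sketch the document at index 0 is never selected when k=3, because it is nearly parallel to the already selected document at index 1 and is therefore removed by the cosine threshold t.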

Thus, one of ordinary skill in the art of text classifiers, as taught by both references, can modify Wu’s said teaching of “scored” “confidence” for each said “time step” represented in fig. 2 as subscript “t” or “t+1” or “#1” through “#K” with Johnson’s teaching of fig. 10:1036: “OBTAIN KAPPA VALUE L” by:
a)	installing Johnson’s fig. 7 regarding a decision boundary classifier, similar to Wu’s discrimination frontier boundary of fig. 1, into Wu’s fig. 2: “Classifier C1”; and
b)	recognizing that the modification is predictable or looked forward to because the modification results in “efficient” “text classification” via Johnson, [0097]. 




Regarding claim 14, claim 14 is rejected the same as claim 5. Thus, argument presented in claim 5 is equally applicable to claim 14. Accordingly, Wu as combined with Johnson teaches the method of claim 11 wherein, for at least one action, generating labeled data from the first set of observations comprises: 
dividing each of the first set of observations into samples; 
determining the frequency of each sample; 
determining, for each sample and for each class, an inclusion probability for the sample within the class; 
selecting each sample that has an inclusion probability for a class that exceeds an inclusion probability threshold and assigning a label to that sample according to the class; 
identifying (via said indices) instances of each sample within the unlabeled observations; and
generating labeled data from each identified (via said indices) instance by forming an observation comprising the identified instance and neighbouring (indexed) data that is located within a range (or within said “margin” as shown in fig. 1 by the area between dashed lines) of the identified instance within the unlabeled data and assigning the (expert) label corresponding to the sample for that instance to the newly formed observation (or new or updated SVM hyperplane).  



Regarding claim 15, Wu as combined teaches the method of claim 14 wherein training the system comprises assigning (via classification as modified via the combination) a value to (via the arrows in Wu’s fig. 2) the at least one action and, determining (Markush limitation follows: A, B or C), based on (via the loop in fig. 2) the value, one or more of whether to 
(A) perform this action (via said decision action), 
(B) a level (via said “high-quality unlabeled examples” or said kappa value) for the inclusion probability that improves classification performance, or 
(C) a level (or an extent via said margin being updated) for the range (via said margin being updated) over which neighbouring data is selected (or recognized) that improves classification performance.











Claims 6, 7 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (Reinforced Co-Training) in view of Mariano et al. (A New Distributed Reinforcement Learning Algorithm for Multiple Objective Optimization Problems).
Regarding claim 6, Wu teaches the method of claim 1 wherein the classifier is one of a plurality of classifiers of an overall classification system and the method comprises, for each classifier, applying a corresponding agent trained by the reinforcement learning system to generate labeled data from unlabeled observations and train the classifier using the training data and the labeled data according to the policy, wherein the policy is shared across the plurality of agents.
Thus, Wu does not teach, as indicated in bold above, the claimed:
A.	applying a corresponding agent trained; and
B.	the policy is shared across the plurality of agents.
Accordingly, Mariano teaches:
A.	applying a corresponding agent (via “m agents”, p. 292, Table 1, algorithm line 3) trained (via “training information”, p. 291, 2 Reinforcement Learning, 1st para, last S); and
B.	the policy is shared (via “a common policy”, Abstract, 3rd S) across (i.e., from one side to the other of comprised by “the next cycle”, represented in said Table 1, algorithm lines 2,3,5: “Repeat”, wherein each cycle goes from one side or phase stage to the next, p. 292, 1st full para, penultimate S) the plurality (or “family”, Abstract, 3rd S, within each cycle) of agents (that found the common policy).
	Thus, one of ordinary skill in the art of Q-learning and multi-agents, as taught by Wu: “multiagent communication”, can modify Wu’s classifiers in fig. 2 with Mariano’s teaching of said “m agents” by:
a)	making Wu’s fig. 2: “Q-Agent” be as Mariano’s “Table 1. General algorithm for DQL.”, p. 292; and
b)	recognizing that the modification is predictable or looked forward to because Wu teaches that the multiple agent is used “to boost the classification performance” (Wu, section 5 Conclusion and Future Work, last para) and Mariano teaches that DQL obtains an “optimal policy” (Mariano, p. 291, section 2 Reinforcement Learning, 1st para, 1st S).
Regarding claim 7, Wu as combined teaches the method of claim 6 wherein applying a corresponding agent for each classifier comprises applying each agent sequentially (via said “Q-Agent” as modified with the DQL algorithm of Table 1), with each agent (of the family) performing a series (via said DQL algorithm) of actions to iteratively (via said DQL algorithm that has said “Repeat” in lines 2,3 and 5) train its corresponding classifier.
Regarding claim 16, claim 16 is rejected the same as claim 6. Thus, argument presented in claim 6 is equally applicable to claim 16. Accordingly, Wu as combined teaches the method of claim 10 wherein the classifier is one of a plurality of classifiers of an overall classification system and the method comprises, for each classifier, applying a corresponding agent to train the corresponding classifier using the policy, the policy being shared across the plurality of agents.
Regarding claim 17, Wu as combined teaches the method of claim 16 wherein the method comprises sequentially (via said “Q-Agent” as modified with the DQL algorithm) training each classifier, with each agent (of the family) performing a series (via said “Q-Agent” as modified with the DQL algorithm) of actions to iteratively train its corresponding classifier with the shared (or common) policy being updated (via the equation Q(s,a) in said DQL algorithm: last two lines: an “updating equation”, Mariano, 2 Reinforcement Learning, 2nd para, 2nd S) during the training of each classifier (each of which is being updated) so that the highest value actions may be selected (via said DQL algorithm, line 6: “Take action a”) to form a trained (or learned via said training information) policy (that is optimal).
Regarding claim 18, Wu as combined teaches the method of claim 17 wherein the method comprises iteratively repeating (via the loop in fig. 2) the sequential training of each classifier, based on the updated policy to iteratively improve (via said update) the policy (contingent limitation follows) until an end condition is reached (Wu as combined need not present this limitation).
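
For illustration only, the notion of several agents updating one shared (common) policy in sequence, as discussed for claims 6, 7 and 16-18, can be sketched with the standard tabular Q-learning update. This is not Mariano's DQL algorithm of Table 1 and not Wu's Q-agent. The state names, action names, rewards, and the values of alpha and gamma are hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    # Standard Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def greedy(q, state, actions):
    # Select the highest-value action for a state from the shared table.
    return max(actions, key=lambda a: q[(state, a)])

ACTIONS = ["label_more", "skip"]   # hypothetical actions
shared_q = defaultdict(float)      # one table shared by every agent

# Hypothetical experience per agent: (state, action, reward, next_state).
agent_steps = {
    "agent_1": [("low", "label_more", 1.0, "high")],
    "agent_2": [("low", "skip", 0.0, "low")],
}

# Agents are applied sequentially; each one updates the same shared table.
for agent, steps in agent_steps.items():
    for state, action, reward, next_state in steps:
        q_update(shared_q, state, action, reward, next_state, ACTIONS)

best_action = greedy(shared_q, "low", ACTIONS)
```

Because the table is shared, the second agent's update already sees the value written by the first agent, which is the sense in which the policy is updated during the training of each classifier in this sketch.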








Suggestions
Applicant’s disclosure states regarding being efficient:
“[0122] The method then determines the inclusion probability for each word in the labeled data set relative to each class 42. This is achieved by determining the overall frequency of each word in the labeled data set. The inclusion probability for each word relative to each class can then be determined based on the frequency. The inclusion probability for a given word and class represents the probability that the word will occur within any of the instances of that class. More specifically, the inclusion probability is the sum of the sample probabilities for all samples within the class that contain that word. The sample probability for a sample is the probability that the sample will contain that word. This inclusion probability applies to both the positive and negative classes of the train set.”

	Johnson (US 2017/0116544), as applied to claim 5, teaches in [0074], [0118] and [0120]:
a)	a probability that a label is “+1”; or 
b)	a probability that represents how a human would label a document; or
c)	the probability of being selected. 
This appears to differ from the disclosed “probability that the word will occur”.
Thus, applicant’s solution to efficiency regarding the chance a word will appear in data records (i.e., instances) of a single class is an indication of non-obviousness in view of the art in the rejection of claim 5.
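
For illustration only, one possible reading of the inclusion probability of applicant's [0122], quoted above, is the fraction of labeled instances of a class that contain a given word. The sketch below assumes that reading; it is not applicant's implementation, and the function names, the example observations, and the threshold value are hypothetical.

```python
from collections import Counter, defaultdict

def inclusion_probabilities(observations, labels):
    # Count, per class, how many instances contain each word at least once.
    class_sizes = Counter(labels)
    containing = defaultdict(Counter)
    for text, label in zip(observations, labels):
        for word in set(text.lower().split()):
            containing[label][word] += 1
    # Assumed estimate: P(word occurs in an instance of the class).
    return {label: {word: n / class_sizes[label] for word, n in counts.items()}
            for label, counts in containing.items()}

def select_words(probs, threshold):
    # Keep words whose inclusion probability exceeds the threshold, per class.
    return {label: sorted(word for word, p in word_probs.items() if p > threshold)
            for label, word_probs in probs.items()}

# Hypothetical labeled data set with a negative and a positive class.
observations = ["refund my order", "refund please", "great product"]
labels = ["neg", "neg", "pos"]

probs = inclusion_probabilities(observations, labels)
selected = select_words(probs, threshold=0.75)
```

Under this reading the probability describes word occurrence within a class, in contrast to a probability that a label is “+1” or a probability of a document being selected.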







Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Foerster et al. (Learning to Communicate with Deep Multi-Agent Reinforcement Learning) is pertinent as teaching regarding claim 6 of “a common policy”, 5.1 Reinforced Inter-Agent Learning: Parameter Sharing, 5th S, and fig. 1: “Agent 1”: “Agent 2”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/DENNIS ROSARIO/Examiner, Art Unit 2667   

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667