DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 1/21/2021 have been fully considered but they are not persuasive. 
35 U.S.C. §101 Rejection
Applicant argues: The claims as previously presented claim language recites "updating a dictionary used by a classifier." The dictionary is updated in a particular way, as recited in the claims. As noted in the prior response, by such updating, the claim language thus realizes a technical improvement, namely improving machine learning that uses a classifier. For instance, "learning is performed on a classifier used for causing a computer to recognize a pattern of a content" (para. [0002]). Machine learning is a technology, and classification is one type of such machine learning. By improving classification, the claim language therefore improves a technology. 
The final office action notes that such technological improvement is not recited in the claims, and as such as the claims are recited they do not include elements sufficient to amount to significantly more than a judicial exception (p. 3). Specifically, the final action states that updating a dictionary is simply storage of information. Applicant disagrees as to this latter point, since updating the dictionary is the processing of data - specifically data that represents a dictionary. 
However, Applicant acknowledges the first point made by the Examiner. Applicant has therefore accordingly amended the claims to explicitly recite that the classifier classifies data using the updated dictionary, such that the updated dictionary improves data classification by the processor. In this respect, the specific action of machine learning (i.e., data classification) is recited in the claim language, and the alleged abstract idea (i.e., pertaining to updating of a dictionary) is explicably tied to this performance of machine language. The technological improvement that results is also explicitly recited in the claim language. 
Therefore, by improving machine learning, specifically a classifier, via updating a dictionary in a particular way, and improving processing time for such learning, the claim language thus concretely provides improvement to an underlying technology, and thus is directed to patent eligible subject matter under 35 USC 101. 

Examiner Response: Examiner respectfully disagrees. "[U]pdating a dictionary used by a classifier" appears to be directed towards a mental process. Instead of using an old dictionary value, a human can use the new updated dictionary value for the classifier. The dictionary appears to be a parameter of the classifier according to paragraph [0037], [0042], and [0047] of Applicant’s specification and the classifier appears to be mathematical functions according to paragraphs [0065] and [0066] of the Applicant’s specifications. Therefore, "updating a dictionary used by a classifier" is understood as a recitation of a combination of mental processes and mathematical concepts. In addition, “the updated dictionary improving data classification by the classifier” is understood as merely describing the intended result of using the updated dictionary.  
35 U.S.C. §103 Rejection
Applicant argues: Applicant has amended the claim language to recite features not taught by the applied art in combination. Specifically, the claims have been amended to recite, "update the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling" and "terminate labeling work of a correct-answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling." Support for this claim language is found in the patent application at least in paragraphs [0050] and [0057].
Examiner Response: Examiner respectfully disagrees. Examiner interprets “loss” as a degree of error according to paragraph [0042] of Applicant’s specification. Laws teaches an F-score, which represents accuracy (i.e., degree of error). Accordingly, Laws teaches update the dictionary by using the samples with labeling added with the new sample with labeling (pg. 467, col. 1; We then label these tokens and add them to the labeled training set. The classifiers are retrained with the new training set and the AL loop repeats.) when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling (pg. 467; Furthermore we find that after the baseline performance is reached the increase in performance quickly levels off to a point where using more training data does not yield performance improvements anymore. In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1). The peak is more prominent if the pool is large. On a pool of 30,000 tokens, peak performance is about 2.5% F-Score better than the baseline; on a 6000 token pool, the difference is only about 1.7%. Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up. The F1 score reads on “loss”. As shown in Figure 1, as the F1 score increases (i.e., the old F1 score is less than the new F1 score), training continues until a peak is hit, after which the score declines (“decreases”). The F1 score represents the accuracy/performance of the algorithm.);

[Image: media_image1.png, greyscale; Figure 1 of Laws et al.]

terminate labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling (pg. 467; In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1)), (pg. 467; Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up.).  
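For illustration only, the stopping behavior described in the quoted passages of Laws (continue while the F-score rises, stop once the peak has been passed) can be sketched as follows. This is a minimal sketch and not Laws' implementation: the function name `stop_index_at_peak`, the `patience` parameter, and the sample scores are hypothetical.

```python
from typing import Sequence


def stop_index_at_peak(f_scores: Sequence[float], patience: int = 1) -> int:
    """Return the iteration index at which an active-learning loop would stop.

    Hypothetical rule: stop once the score has failed to exceed the best
    score seen so far for `patience` consecutive iterations. If that never
    happens, return the index of the last iteration.
    """
    best = float("-inf")
    since_best = 0
    for i, score in enumerate(f_scores):
        if score > best:
            best = score          # new peak candidate
            since_best = 0
        else:
            since_best += 1       # performance did not improve
            if since_best >= patience:
                return i
    return len(f_scores) - 1


# Hypothetical F-scores per iteration: rises to a peak, then declines.
scores = [0.60, 0.70, 0.78, 0.80, 0.79, 0.77]
stop = stop_index_at_peak(scores)
```

With these made-up scores the peak is at index 3, and the loop stops at the first iteration after it.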
Claim Objections
Claims 4-5 are objected to because of the following informalities: Claims 4 and 5 depend on claim 2; however, claim 2 is cancelled. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 and 4-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 1 recites “a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding a new sample with labeling”. It appears the limitation should recite “compare a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding a new sample with labeling”. The word “compare” appears to have been omitted from the claim amendments filed 7/26/2020. As a result, it is unclear what the losses are being used for. Claims 4-7 depend on claim 1 and are rejected for the same reasons because they do not cure the deficiencies.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4-7 and 9-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention (Step 1). See the 2019 PEG for more details of the analysis.
Step 1

According to the first part of the analysis, in the instant case, claims 1 and 4-7 are directed to an apparatus, claim 9 is directed to a method, and claim 10 is directed to a computer readable non-transitory recording medium. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

Step 2A, Prong 1

Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g. mathematical concepts, mental processes, certain methods of organizing human activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial exception as explained below.
Step 2A, Prong 2

Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below.
Step 2B

Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception (Step 2B). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reasons given above in the Step 2A, Prong 2 analysis. Furthermore, each additional element identified above as being insignificant extra-solution activity is also well-known, routine, conventional as 
Claim 1 recites:
Step 2A, Prong 1
“update a dictionary used by a classifier, by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can update a parameter using an unlabeled sample by labeling that sample.).
“calculate, by using the updated dictionary and the one or more samples, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling” (Save for the recitation of generic computer equipment (“processor”, “memory”, “apparatus”), this step is understood to be a recitation of a mathematical concept.).
“a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding a new sample with labeling;” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can compare a loss calculated using old weights to a loss calculated using new weights.).
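“update the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling.” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine to update weights by comparing a loss calculated using old weights to a loss calculated using new weights.)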
	“terminate labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine not to update weights based on an inverse proportion.).
	“classify data, by the classifier, using the updated dictionary, the updated dictionary improving data classification by the classifier.” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and mathematical concept).
Step 2A, Prong 2
“A learning apparatus comprising: at least one memory storing instructions; and at least one processor configured to access the at least one memory and execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
Step 2B
“A learning apparatus comprising: at least one memory storing instructions; and at least one processor configured to access the at least one memory and execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
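For illustration only, the calculations recited in the limitations quoted above (a loss expressed as a ratio to the number of labeled samples, and a check that the loss decreases in inverse proportion to that number) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Applicant's specification: the numerator of the ratio is assumed to be a count of misclassified samples, "inverse proportion" is assumed to mean that the product of sample count and loss stays approximately constant, and the names `loss_ratio`, `decreases_in_inverse_proportion`, and `rel_tol` are hypothetical.

```python
from typing import List, Tuple


def loss_ratio(num_errors: int, num_labeled: int) -> float:
    """Loss as a ratio to the number of labeled samples (assumed: error count / n)."""
    return num_errors / num_labeled


def decreases_in_inverse_proportion(
    history: List[Tuple[int, float]], rel_tol: float = 0.05
) -> bool:
    """Check whether loss is roughly c / n over the recorded history.

    `history` holds (number_of_labeled_samples, loss) pairs ordered by
    increasing sample count. If loss is inversely proportional to n, the
    product n * loss is approximately constant (assumed interpretation).
    """
    if len(history) < 2:
        return False
    products = [n * loss for n, loss in history]
    reference = products[0]
    if reference == 0:
        return False
    return all(abs(p - reference) <= rel_tol * abs(reference) for p in products[1:])


# Hypothetical history: loss halves when the labeled set doubles.
history = [(10, 0.30), (20, 0.15), (30, 0.10)]
terminate_labeling = decreases_in_inverse_proportion(history)
```

Under these assumptions, labeling work would terminate for the sample history because each product of sample count and loss is about 3.0.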
Claim 4 recites:
Step 2A, Prong 1
	“when an average of the loss calculated by using the updated dictionary and a predetermined number of past losses is less than an average of a predetermined number of calculated losses before the predetermined number of past losses are calculated, determine not to update the dictionary.” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine not to update weights based on comparing two average losses.)
Step 2A, Prong 2
“The learning apparatus according to claim 2, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
Step 2B
“The learning apparatus according to claim 2, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).

Claim 5 recites:
Step 2A, Prong 1
“calculate a correlation function between a ratio of a number of the samples with labeling to a first number of samples being smaller than the number of the samples with labeling by a predetermined number, and a ratio of a loss when a number of the samples with labeling is the first number of samples to a loss with respect to all the samples with labeling, and, when the correlation function is greater than a predetermined threshold value, determine not to update the dictionary.”  (This step is understood to be a recitation of a mathematical concept.)
Step 2A, Prong 2
“The learning apparatus according to claim 2, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
Step 2B
“The learning apparatus according to claim 2, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 6 recites:
Step 2A, Prong 1
	“select a sample that is likely to be discriminated as a class not being a correct-answer class, from one or more samples not assigned with a label, as a sample being a target of labeling;” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can select a sample to be labeled.)
“acquire, when a label is assigned to the sample being a target of labeling , the samples with labeling including the sample being a target of labeling, and” (Save for the recitation of generic computer equipment (“unit”, “apparatus”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can add the sample to be labeled in a collection of samples.)
“update the dictionary by using the samples with labeling” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can update a weight value based on adding the new sample.).
Step 2A, Prong 2
“The learning apparatus according to claim 1, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
Step 2B
“The learning apparatus according to claim 1, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 7 recites:
Step 2A, Prong 1
This claim does not appear to recite any judicial exceptions.
Step 2A, Prong 2
“The learning apparatus according to claim 1, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
“output the dictionary when the dictionary is determined not to update.” (This step is understood to be mere data gathering and presenting the weight(s) as an output. See MPEP 2106.05(g)).
Step 2B
“The learning apparatus according to claim 1, wherein, the at least one processor is further configured to execute the instructions to:” (The “processor” and “memory” are understood to be generic computer equipment used to run computer program instructions. See MPEP 2106.05(f).).
“output the dictionary when the dictionary is determined not to update.” (This step is understood to be mere data gathering and presenting the weight(s) as an output. See MPEP 2106.05(g)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 9 recites:
Step 2A, Prong 1
“updating a dictionary used by a classifier by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling;” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can update a parameter using an unlabeled sample by labeling that sample.).
“calculating, by using the updated dictionary and the one or more samples, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling” (This step is understood to be a recitation of a mathematical concept.).
“comparing a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding the new sample with labeling;” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can compare a loss calculated using old weights to a loss calculated using new weights.).
“updating the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling.” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine to update weights by comparing a loss calculated using old weights to a loss calculated using new weights.)
“terminating labeling work of a correct answer class with the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine not to update weights based on an inverse proportion.).
“classifying data, by the classifier, using the updated dictionary, the updated dictionary improving data classification by the classifier.” (This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process and mathematical concept).

Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 10 recites:
Step 2A, Prong 1
“processing of calculating, by using the updated dictionary and one or more samples with labeling being samples assigned with labels, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling;” (Save for the recitation of generic computer equipment (“computer readable non-transitory recording medium”), this step is understood to be a recitation of a mathematical concept.)
“processing of comparing a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding the new sample with labeling; and” (Save for the recitation of generic computer equipment (“computer readable non-transitory recording medium”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine to update weights based on a loss value.)
“processing of updating the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling.” (Save for the recitation of generic computer equipment (“computer readable non-transitory recording medium”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can determine to update weights by comparing a loss calculated using old weights to a loss calculated using new weights.)
Step 2A, Prong 2
	“A computer-readable non-transitory recording medium storing a program causing a computer to perform:” (The computer-readable medium is understood to be generic computer equipment. See MPEP 2106.05(f).)
Step 2B
	“A computer-readable non-transitory recording medium storing a program causing a computer to perform:” (The computer-readable medium is understood to be generic computer equipment. See MPEP 2106.05(f).)
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6, 7, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (US-20060112028-A1) in view of Wu (US-20080255844) and Laws et al. (“Stopping Criteria for Active Learning of Named Entity Recognition”).
Regarding Claim 1,
Xiao teaches a learning apparatus comprising: 
at least one memory storing instructions (para [0003] Von Neumann type computers include a memory and a processor.); 
and at least one processor configured to access the at least one memory and execute the instructions (para [0003] In operation, instructions and data are read from the memory and executed by the processor.) to: 
update a dictionary (para [0123] In step 722 the average of the derivatives of the objective function that are computed in step block 720 are processed with an optimization algorithm in order to calculate new values of the weights. Examiner note: Examiner interprets a dictionary as neural network weights) used by a classifier (para [0150] As mentioned above in classification problems it is appropriate to apply the sigmoid function at the output nodes. (Alternatively, other threshold functions are used in lieu of the sigmoid function.) Aside from the special case in which what is desired is a yes or no answer as to whether a particular input belongs to a particular class); 
a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding a new sample with labeling (Fig. 7; para [0130] The stopping condition preferably requires that the difference between the value of the objective function evaluated with the new weights and the value of the objective function calculated with the old weights is less than a predetermined small number. And para [0133] $OBJ^{NEW}$ and $OBJ^{OLD}$ are the values of the objective function e.g., Equation Five for the current and preceding values of the weights. And para [0135]); 
classify data, by the classifier, using the updated dictionary, the updated dictionary improving data classification by the classifier (para [0173] in which, λ is a user chosen parameter that determines the relative priority of the sub-objective of minimizing the differences between actual and expected values, and the sub-objective of minimizing the number of weights of significant value. Lambda is preferably chosen in the range of 0.01 to 0.1, and is more preferably approximately equal to 0.05. Too high a value of lambda can lead to reduction of the complexity of the neural network at the expense of its prediction or classification performance, whereas too low of a value can lead to a network that is excessively complex and in some cases prone to over training.).
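For illustration only, the stopping condition quoted from Xiao para [0130] (stop when the objective evaluated with the new weights differs from the objective evaluated with the old weights by less than a predetermined small number) can be sketched as follows. This is a minimal sketch and not Xiao's implementation: the quadratic `objective`, the gradient-descent update, taking the difference in absolute value, and the names `should_stop`, `train`, `lr`, and `tol` are all assumptions.

```python
from typing import Tuple


def objective(w: float) -> float:
    """Toy objective with its minimum at w = 3 (hypothetical stand-in)."""
    return (w - 3.0) ** 2


def should_stop(obj_new: float, obj_old: float, tol: float = 1e-8) -> bool:
    """Stop when the change in the objective is below a small threshold."""
    return abs(obj_new - obj_old) < tol


def train(w: float = 0.0, lr: float = 0.1, tol: float = 1e-8,
          max_iter: int = 10_000) -> Tuple[float, int]:
    """Update the weight until the objective stops changing appreciably."""
    obj_old = objective(w)
    for step in range(1, max_iter + 1):
        gradient = 2.0 * (w - 3.0)       # derivative of the toy objective
        w = w - lr * gradient            # new value of the weight
        obj_new = objective(w)
        if should_stop(obj_new, obj_old, tol):
            return w, step
        obj_old = obj_new
    return w, max_iter


final_weight, steps_taken = train()
```

The comparison uses only the two objective values, so it is independent of how the weights themselves are updated.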
Xiao does not explicitly disclose
update a dictionary used by a classifier, by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling; 
calculate, by using the updated dictionary and the one or more samples, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling; 
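update the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling; 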
terminate labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling, and 
However, Wu teaches
calculate, by using the updated dictionary and one or more samples with labeling being samples assigned with labels (para [0026] It should be noted, in the normal scenario, the L is averaged over the entire training corpus, where N is the number of training samples.), a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling (para [0023] The overall criteria to minimize empirical error rate can then be defined over the entire training corpus as the objective function, 
$$L = \frac{1}{N} \sum_{i=1}^{N} l(X_i, W_c, W_r)$$
where the loss function is, 
$$l(X_i; W_c, W_r) = \frac{1}{1 + \exp\left(-S(X_i, W_c) + S(X_i, W_r)\right)}$$
and
$$S(X, W) = p(W)\, p(X \mid W)$$
and where $W_c$ is the correct word transcription based on user correction activity and/or pre-labeled training corpus); 
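For illustration only, the objective quoted from Wu para [0023] (a per-sample sigmoid loss averaged over the N training samples) can be sketched as follows. This is a minimal sketch and not Wu's implementation: the scores S(X_i, W_c) and S(X_i, W_r) are taken as precomputed inputs, and the names `sample_loss`, `empirical_loss`, `s_correct`, and `s_rival` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sample_loss(s_correct: float, s_rival: float) -> float:
    """Per-sample loss l = 1 / (1 + exp(-S(X, W_c) + S(X, W_r))), as quoted."""
    return 1.0 / (1.0 + math.exp(-s_correct + s_rival))


def empirical_loss(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Objective L: the per-sample loss averaged over all N samples.

    Each pair is (S(X_i, W_c), S(X_i, W_r)) for one training sample.
    """
    n = len(score_pairs)
    return sum(sample_loss(sc, sr) for sc, sr in score_pairs) / n


# Hypothetical scores for three training samples.
pairs = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
total = empirical_loss(pairs)
```

Dividing by N is what makes the objective a ratio to the number of training samples, which is the feature the rejection relies on. Very large score differences would overflow `math.exp`; the sketch does not guard against that.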
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine Xiao et al.’s method of training a neural network with Wu’s method of training a neural network (Wu, Abs. Architecture for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application.).
Laws teaches
update a dictionary used by a classifier, by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling (pg. 466, section 2; For a given measure Mi,X, we select in each iteration the unlabeled example(s) in the pool that have the smallest value for Mi,X (corresponding to the maximum uncertainty). And pg. 466, section 2.1; We start with a seed set of ten consecutive tokens randomly selected from the training pool and label it.); 
update the dictionary by using the samples with labeling added with the new sample with labeling (pg. 467, col. 1; We then label these tokens and add them to the labeled training set. The classifiers are retrained with the new training set and the AL loop repeats.) when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling (pg. 467; Furthermore we find that after the baseline performance is reached the increase in performance quickly levels off to a point where using more training data does not yield performance improvements anymore. In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1). The peak is more prominent if the pool is large. On a pool of 30,000 tokens, peak performance is about 2.5% F-Score better than the baseline; on a 6000 token pool, the difference is only about 1.7%. Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up. The F1 score reads on “loss”. As shown in Figure 1, as the F1 score increases (i.e., the old F1 score is less than the new F1 score), training continues until a peak is hit, after which the score declines (“decreases”). The F1 score represents the accuracy/performance of the algorithm.);

[Image: media_image1.png, greyscale; Figure 1 of Laws et al.]

terminate labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling (pg. 467; In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1)), (pg. 467; Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up.), and 

Doing so would reduce computational costs (pg. 472; This might lead to an approach to reduce the computational cost of AL).
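For illustration only, the selection and stopping behavior attributed to Laws et al. above can be sketched in Python. The function names, the `patience` parameter, and the assumption that a per-iteration performance score is already available are hypothetical; none of them is taken from the reference or from the claims.

```python
def select_most_uncertain(pool_scores, batch_size=1):
    """Pick the unlabeled examples with the smallest confidence measure.

    pool_scores maps an example id to its measure value; a smaller value
    is treated as higher uncertainty, as in the quoted passage.
    """
    ranked = sorted(pool_scores, key=pool_scores.get)
    return ranked[:batch_size]


def peak_reached(history, patience=2):
    """Return True once the best score is at least `patience` iterations old.

    history is the list of performance scores, one per labeling iteration.
    `patience` is a hypothetical knob, not something the reference defines.
    """
    if len(history) <= patience:
        return False
    best_index = max(range(len(history)), key=history.__getitem__)
    return len(history) - 1 - best_index >= patience
```

A caller would label the examples returned by `select_most_uncertain`, retrain, append the new score to `history`, and stop requesting labels once `peak_reached` returns True.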
Regarding Claim 4,
Xiao et al., Wu, and Laws et al. teach the learning apparatus according to claim 2. 
Laws et al. (Stopping Criteria for Active Learning of Named Entity Recognition) teaches
wherein, the at least one processor is further configured to execute instructions to: when an average of the loss calculated by using the updated dictionary and a predetermined number of past losses is less than an average of a predetermined number of calculated losses before the predetermined number of past losses are calculated (pg. 471; We achieve this with a moving median approach. At each step, we compute the median of $w_2 = \{a_{n-k}, \ldots, a_n\}$ (the last n values) and of $w_1 = \{a_{n-k-1}, \ldots, a_{n-1}\}$ (the previous last n values). Each value $a_i$ is the performance at iteration i (for the performance gradient) or the uncertainty of the instance selected in iteration i (for the uncertainty gradient). We then estimate the gradient using the medians of the two windows: $g = (\mathrm{median}(w_2) - \mathrm{median}(w_1))/1$ (4). For the performance estimate, which is less noisy, we can also use the arithmetic mean instead of the median. In this case, we simply replace “median” with “mean” in Equation 4.), determine not to update the dictionary (pg. 471; We stop the AL process when (i) the current certainty or estimated performance is a new maximum and (ii) the newly calculated gradient g is positive and (iii) g falls below a predefined level .).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of stopping training of Xiao et al. with the method of determining when to stop training of Laws et al.
Doing so would reduce computational costs (pg. 472; This might lead to an approach to reduce the computational cost of AL).
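A minimal Python sketch of the moving-window gradient test quoted from Laws et al. in the rejection of claim 4 is given here for illustration. The function names and the `epsilon` parameter are hypothetical, and the window indexing follows one reading of the quoted passage (two windows of k+1 values, offset by one step).

```python
from statistics import median


def window_gradient(values, k, center=median):
    """Difference between the centers of the two most recent windows.

    w2 covers a_{n-k}..a_n and w1 covers a_{n-k-1}..a_{n-1}. The quoted
    Equation 4 divides by 1, so the plain difference is returned.
    Pass center=statistics.mean to use the arithmetic mean instead.
    """
    if len(values) < k + 2:
        return None  # not enough history for two full windows
    w2 = values[-(k + 1):]
    w1 = values[-(k + 2):-1]
    return center(w2) - center(w1)


def should_stop(values, k, epsilon):
    """Apply the three quoted conditions: new maximum, g > 0, g < epsilon."""
    g = window_gradient(values, k)
    if g is None:
        return False
    is_new_max = values[-1] > max(values[:-1])
    return is_new_max and 0 < g < epsilon
```

With a slowly flattening curve such as `[0.50, 0.60, 0.70, 0.705, 0.71, 0.712, 0.713]`, `k=2`, and `epsilon=0.01`, the gradient is small but positive and the last value is a new maximum, so the sketch signals a stop.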
Regarding Claim 6,
Xiao et al., Laws et al. and Wu teach the learning apparatus according to claim 1.
	Laws et al. further teaches 
select a sample that is likely to be discriminated as a class not being a correct-answer class (pg. 468, col. 2; For positive decisions, the class probability very often is close to 1, for negative decisions, it is close to 0.), from one or more samples not assigned with a label, as a sample being a target of labeling (pg. 466; We use AL based on uncertainty sampling. We start with a seed set of ten consecutive tokens randomly selected from the training pool and label it.); and 
acquire, when a label is assigned to the sample being a target of labeling selected the samples with labeling including the sample being a target of labeling (pg. 467; We then label these tokens and add them to the labeled training set.), 
updates the dictionary by using the samples with labeling (pg. 467; The classifiers are retrained with the new training set and the AL loop repeats.).

Doing so would reduce computational costs (pg. 472; This might lead to an approach to reduce the computational cost of AL).
Regarding claim 7,
Xiao et al., Laws et al., and Wu teach the learning apparatus according to claim 1. Xiao et al. further teaches wherein, the at least one processor is further configured to execute instructions to: output the dictionary when the dictionary is determined not to update (para [0135] After process 700 has finished or after process 800 (described below) has been completed if the latter is used, the final values of the weights are used to construct a neural network.).
Regarding Claim 9,
Xiao teaches a learning method comprising: First named inventor: Atsushi SatoPage 4 Serial no. 15/536,783 Filed 06/16/2017 
updating a dictionary used by a classifier (para [0123] In step 722 the average of the derivatives of the objective function that are computed in step block 720 are processed with an optimization algorithm in order to calculate new values of the weights. Examiner note: Examiner interprets a dictionary as neural network weights) used by a classifier (para [0150] As mentioned above in classification problems it is appropriate to apply the sigmoid function at the output nodes. (Alternatively, other threshold functions are used in lieu of the sigmoid function.) Aside from the special case in which what is desired is a yes or no answer as to whether a particular input belongs to a particular class); 
comparing a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding the new sample with labeling (Fig. 7; para [0130] The stopping condition preferably requires that the difference between the value of the objective function evaluated with the new weights and the value of the objective function calculated with the old weights is less than a predetermined small number. And [0133] OBJ^NEW and OBJ^OLD are the values of the objective function e.g., Equation Five for the current and preceding values of the weights. and [0135]); 
classifying data, by the classifier, using the updated dictionary, the updated dictionary improving data classification by the classifier (para [0173] in which, λ is a user chosen parameter that determines the relative priority of the sub-objective of minimizing the differences between actual and expected values, and the sub-objective of minimizing the number of weights of significant value. Lambda is preferably chosen in the range of 0.01 to 0.1, and is more preferably approximately equal to 0.05. Too high a value of lambda can lead to reduction of the complexity of the neural network at the expense of its prediction or classification performance, whereas too low of a value can lead to a network that is excessively complex and in some cases prone to over training.).
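The stopping condition quoted from Xiao (para [0130]) compares the objective under the new weights with the objective under the old weights. A rough Python sketch follows; the names `converged`, `train`, `step`, `objective`, `tolerance`, and `max_iters` are hypothetical, and the use of an absolute difference is an assumption, since the quoted passage says only that the difference is less than a predetermined small number.

```python
def converged(obj_new, obj_old, tolerance=1e-6):
    """True when the objective changed by less than the tolerance."""
    return abs(obj_new - obj_old) < tolerance


def train(step, objective, weights, tolerance=1e-6, max_iters=1000):
    """Repeat weight updates until the objective stops changing.

    step(weights) returns updated weights; objective(weights) returns
    the scalar objective. Both are supplied by the caller.
    """
    obj_old = objective(weights)
    for _ in range(max_iters):
        weights = step(weights)
        obj_new = objective(weights)
        if converged(obj_new, obj_old, tolerance):
            break
        obj_old = obj_new
    return weights
```

For example, `train(lambda w: 0.8 * w, lambda w: w * w, 1.0)` shrinks a single weight toward zero and returns once successive objective values differ by less than the tolerance.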
Xiao does not explicitly disclose
updating a dictionary used by a classifier by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling; 
calculating, by using the updated dictionary and the one or more samples, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling; 
updating the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling; 
terminating labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling; and 
However, Wu teaches
calculating, by using the updated dictionary and the one or more samples (para [0026] It should be noted, in the normal scenario, the L is averaged over the entire training corpus, where N is the number of training samples.), a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling (para [0023] The overall criteria to minimize empirical error rate can then be defined over the entire training corpus as the objective function, 
$$L = \frac{1}{N} \sum_{i=1}^{N} l(X_i, W_c, W_r).$$
where the loss function is, 
$$l(X_i; W_c, W_r) = \frac{1}{1 + \exp(-S(X_i, W_c) + S(X_i, W_r))}$$
and
$$S(X, W) = p(W)\, p(X \mid W)$$
and where $W_c$ is the correct word transcription based on user correction activity and/or pre-labeled training corpus); 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine Xiao et al.’s method of training a neural network with Wu’s method of training a neural network.
Doing so would allow for reducing empirical error (Abs. Architecture for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application.).
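As a hedged illustration, the objective quoted from Wu (para [0023]) averages a per-sample sigmoid loss over the N training samples. The Python sketch below implements the formula exactly as it is quoted in this action; the function names and the representation of each sample as a pair of precomputed scores are assumptions, and no position is taken on the sign convention used in the reference itself.

```python
import math


def sample_loss(score_correct, score_rival):
    """Per-sample loss 1 / (1 + exp(-S(X, W_c) + S(X, W_r))) as quoted."""
    return 1.0 / (1.0 + math.exp(-score_correct + score_rival))


def empirical_loss(score_pairs):
    """Average the per-sample loss over all N (S_c, S_r) score pairs."""
    n = len(score_pairs)
    return sum(sample_loss(c, r) for c, r in score_pairs) / n
```

Because the sum is divided by N, the result is a ratio to the number of samples, which is the property the rejection relies on.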
Laws et al. teach 
updating a dictionary used by a classifier by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling (pg. 466, section 2; For a given measure Mi,X, we select in each iteration the unlabeled example(s) in the pool that have the smallest value for Mi,X (corresponding to the maximum uncertainty). And pg. 466, section 2.1; We start with a seed set of ten consecutive tokens randomly selected from the training pool and label it.); 
updating the dictionary by using the samples with labeling added with the new sample with labeling (pg. 467, col. 1; We then label these tokens and add them to the labeled training set. The classifiers are retrained with the new training set and the AL loop repeats.) when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling (pg. 467; Furthermore we find that after the baseline performance is reached the increase in performance quickly levels off to a point where using more training data does not yield performance improvements anymore. In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1). The peak is more prominent if the pool is large. On a pool of 30,000 tokens, peak performance is about 2.5% F-Score better than the baseline; on a 6000 token pool, the difference is only about 1.7%. Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up. Examiner note: The F1 score reads on “loss”. As shown in Figure 1, as the F1 score increases (i.e., the old F1 score is less than the new F1 score), training continues until a peak is reached, after which the score declines (“decreases”). The F1 score represents the accuracy/performance of the algorithm.); 
terminating labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling (pg. 467; In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1)), (pg. 467; Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up.); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of stopping training of Xiao et al. with the method of determining when to stop training of Laws et al.
Doing so would reduce computational costs (pg. 472; This might lead to an approach to reduce the computational cost of AL).
Regarding Claim 10,
Xiao teaches a computer-readable non-transitory recording medium storing a program causing a computer to (para [0190] The processes depicted in FIGS. 7,13 are preferably embodied in the form of one or more programs that can be stored on a computer-readable medium which can be used to load the programs into a computer for execution.) perform: 
processing of updating a dictionary used by a classifier (para [0123] In step 722 the average of the derivatives of the objective function that are computed in step block 720 are processed with an optimization algorithm in order to calculate new values of the weights. Examiner note: Examiner interprets a dictionary as neural network weights) used by a classifier (para [0150] As mentioned above in classification problems it is appropriate to apply the sigmoid function at the output nodes. (Alternatively, other threshold functions are used in lieu of the sigmoid function.) Aside from the special case in which what is desired is a yes or no answer as to whether a particular input belongs to a particular class); 
processing of comparing a loss calculated by using the updated dictionary and a loss calculated by using the dictionary before updating with respect to all the samples with labeling before adding the new sample with labeling (Fig. 7; para [0130] The stopping condition preferably requires that the difference between the value of the objective function evaluated with the new weights and the value of the objective function calculated with the old weights is less than a predetermined small number. And [0133] OBJ^NEW and OBJ^OLD are the values of the objective function e.g., Equation Five for the current and preceding values of the weights. and [0135]); 
processing of classifying data, by the classifier, using the updated dictionary, the updated dictionary improving data classification by the classifier (para [0173] in which, λ is a user chosen parameter that determines the relative priority of the sub-objective of minimizing the differences between actual and expected values, and the sub-objective of minimizing the number of weights of significant value. Lambda is preferably chosen in the range of 0.01 to 0.1, and is more preferably approximately equal to 0.05. Too high a value of lambda can lead to reduction of the complexity of the neural network at the expense of its prediction or classification performance, whereas too low of a value can lead to a network that is excessively complex and in some cases prone to over training.).
Xiao does not explicitly disclose
processing of updating a dictionary used by a classifier by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling; 
processing of calculating, by using the updated dictionary and one or more samples with labeling, a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling; 
processing of updating the dictionary by using the samples with labeling added with the new sample with labeling when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling;
processing of terminating labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling; and 
However, Wu teaches
processing of calculating, by using the updated dictionary and one or more samples with labeling (para [0026] It should be noted, in the normal scenario, the L is averaged over the entire training corpus, where N is the number of training samples.), a ratio to a number of the samples with labeling as a loss with respect to all the samples with labeling (para [0023] The overall criteria to minimize empirical error rate can then be defined over the entire training corpus as the objective function, 
$$L = \frac{1}{N} \sum_{i=1}^{N} l(X_i, W_c, W_r).$$
where the loss function is, 
$$l(X_i; W_c, W_r) = \frac{1}{1 + \exp(-S(X_i, W_c) + S(X_i, W_r))}$$
and
$$S(X, W) = p(W)\, p(X \mid W)$$
and where $W_c$ is the correct word transcription based on user correction activity and/or pre-labeled training corpus); 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine Xiao et al.’s method of training a neural network with Wu’s method of training a neural network.
Doing so would allow for reducing empirical error (Abs. Architecture for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application.).
Laws et al. teach
processing of updating a dictionary used by a classifier by using one or more samples with labeling, the one or more samples selected from samples without labeling and assigned with labeling (pg. 466, section 2; For a given measure Mi,X, we select in each iteration the unlabeled example(s) in the pool that have the smallest value for Mi,X (corresponding to the maximum uncertainty). And pg. 466, section 2.1; We start with a seed set of ten consecutive tokens randomly selected from the training pool and label it.); 
processing of updating the dictionary by using the samples with labeling added with the new sample with labeling (pg. 467, col. 1; We then label these tokens and add them to the labeled training set. The classifiers are retrained with the new training set and the AL loop repeats.) when the loss calculated by using the dictionary before updating is less than the loss calculated by using the updated dictionary by using the samples with labeling added with the new sample with labeling (pg. 467; Furthermore we find that after the baseline performance is reached the increase in performance quickly levels off to a point where using more training data does not yield performance improvements anymore. In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1). The peak is more prominent if the pool is large. On a pool of 30,000 tokens, peak performance is about 2.5% F-Score better than the baseline; on a 6000 token pool, the difference is only about 1.7%. Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up. Examiner note: The F1 score reads on “loss”. As shown in Figure 1, as the F1 score increases (i.e., the old F1 score is less than the new F1 score), training continues until a peak is reached, after which the score declines (“decreases”). The F1 score represents the accuracy/performance of the algorithm.);
processing of terminating labeling work of a correct answer class when the loss calculated by using the updated dictionary decreases in inverse proportion to a number of the samples with labeling (pg. 467; In fact, our experiments show that there is a peak in performance reached at about 12% of the training data and performance decreases again after this point (see Figure 1)), (pg. 467; Therefore, once the peak is reached, the AL process should stop, even if the annotation budget is not yet used up.); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of stopping training of Xiao et al. with the method of determining when to stop training of Laws et al.
Doing so would reduce computational costs (pg. 472; This might lead to an approach to reduce the computational cost of AL).


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (US-20060112028-A1) in view of Wu (US-20080255844-A1) and Laws et al. (“Stopping Criteria for Active Learning of Named Entity Recognition”), and further in view of Nguyen et al. (“Stopping criteria for ensemble of evolutionary artificial neural networks”).
Regarding Claim 5, 
Xiao et al., Wu, and Laws et al. teach the learning apparatus according to claim 2.
	Xiao et al., Wu, and Laws et al. do not explicitly disclose
calculates a correlation function between a ratio of a number of the samples with labeling to a first number of samples being smaller than the number of the samples with labeling by a predetermined number, and a ratio of a loss when a number of the samples with labeling is the first number of samples to a loss with respect to all the samples with labeling, and, when the correlation function is greater than a predetermined threshold value, determine not to update the dictionary.
However, Nguyen et al. teaches
wherein, the at least one processor is further configured to execute the instructions to: calculate a correlation function between a ratio of a number of the samples with labeling to a first number of samples being smaller than the number of the samples with labeling by a predetermined number, and a ratio of a loss when a number of the samples with labeling is the first number of samples to a loss with respect to all the samples with labeling (pg. 103; $F_p(m)$ is called the penalty function of network m and pattern p. This represents the correlation between the networks. $F_p(m) = (\hat{Y}_p(m) - F_p) \sum_{l \neq m} (\hat{Y}_p(l) - F_p)$ (11)), and, when the correlation function is greater than a predetermined threshold value, determine not to update the dictionary (pg. 104; The second criterion is to choose the ensemble corresponding to the minimum validation error.).

Doing so would allow for preventing overfitting (pg. 101 Thus, it is also desirable to stop the training phase in the right moment before overfitting happens.).
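For illustration, the penalty function quoted from Nguyen et al. (Equation 11) can be sketched in Python as a product of one network's deviation and the summed deviations of the other networks. The function name is hypothetical. The sketch also assumes that the reference term $F_p$ is the simple average of the individual network outputs for the pattern and that the garbled operators in the quotation are subtractions; neither assumption is confirmed by the text of this action.

```python
def correlation_penalty(outputs, m):
    """Penalty for network m on one pattern, in the form of Equation 11.

    outputs holds each network's prediction for the same pattern.
    The ensemble term is assumed here to be the plain mean of outputs.
    """
    ensemble = sum(outputs) / len(outputs)
    others = sum(
        outputs[l] - ensemble for l in range(len(outputs)) if l != m
    )
    return (outputs[m] - ensemble) * others
```

Under the simple-average assumption the deviations of all networks sum to zero, so the penalty for network m reduces to the negative square of its own deviation from the ensemble.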
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yang et al., “Unbiased Active Learning” (US 20100217732 A1): This art discloses a method for labeling unlabeled instances using a loss and stop condition.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HENRY NGUYEN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121