DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 06/07/2019.
This action is in response to the amendment and/or arguments filed on 12/06/2022. In the current amendment, claims 1, 6 and 8 have been amended and claim 7 has been cancelled. Claims 1-6 and 8 are currently pending and have been examined.
In response to the amendments and/or arguments filed on 12/06/2022, the 35 U.S.C. 112(b) rejections made in the previous Office Action have been withdrawn.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Response to Arguments
Applicant's arguments filed on 12/06/2022 have been fully considered but they are not fully persuasive. 
As per applicant’s argument that “Settles does not teach that selection priority levels are calculated for the plurality of pieces of unlabeled data in such a way that a selection priority level is higher with closeness to a discrimination boundary that is based on a discrimination function serving as a basis for discriminating data and is higher the higher the calculated importance is. For instance, in section 3.6 on page 27, Settles states "The main idea is that informative instances should not only be those which are uncertain, but also those which are 'representative' of the underlying distribution (i.e., inhabit dense regions of the input space).” (Remarks p. 7-8).
The Examiner respectfully disagrees. On page 26, Figure 7, Settles clearly illustrates data selection wherein the only priority is proximity to the decision boundary, see especially: “Since A is on the decision boundary, it would be queried as the most uncertain.” These primitive forms of data selection priority levels are explained in detail in section 3.1, pages 12-15 (see especially p.14-15, “Tong and Koller (2000) also experiment with an uncertainty sampling strategy for support vector machines—or SVMs— that involves querying the instance closest to the linear decision boundary.”). In addition, Settles’ selection of data instances for labeling (i.e., prioritizing one data instance over the others) based on the proximity of that data instance to a discrimination boundary (i.e., decision boundary) is seen as implicitly assigning a priority level to the data instances in correspondence to their closeness to the decision boundary. Thus, Settles obviously teaches that selection priority levels are calculated for the plurality of pieces of unlabeled data as claimed.
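For illustration only, the closest-to-boundary selection described in the cited passage of Settles can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Settles or from the application: the linear boundary parameters `w` and `b`, the sample pool, and the function names are all hypothetical.

```python
import math

def distance_to_boundary(x, w, b):
    """Distance from point x to the hyperplane w.x + b = 0 (hypothetical linear boundary)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def rank_by_closeness(pool, w, b):
    """Return pool indices ordered from closest to farthest from the boundary.

    The position in this ordering acts as an implicit priority level:
    the first index is the instance that would be queried first.
    """
    return sorted(range(len(pool)), key=lambda i: distance_to_boundary(pool[i], w, b))

# Hypothetical unlabeled pool and boundary x1 = 0.
pool = [(2.0, 0.0), (0.1, 1.0), (-1.5, 3.0)]
w, b = (1.0, 0.0), 0.0
ranking = rank_by_closeness(pool, w, b)
```

Under these assumed values, the instance at index 1 lies nearest the boundary and is therefore ranked first.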



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over “Active Learning Literature Survey” to Settles (hereinafter “Settles”) in view of U.S. Pub. No. US 20080069437A1 to Baker (hereinafter “Baker”).
Regarding claim 1 (Currently Amended), 
Settles teaches calculate an importance of each of a plurality of pieces of unlabeled data using a density of labeled data, (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: [calculation formula] Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x…” Settles, p25. Examiner Note: The importance of the piece of unlabeled data is seen as equivalent to Settles’ informativeness.)
each of the plurality of pieces of unlabeled data being a piece of training data that is arranged in a feature space at a position based on a feature vector of the piece of training data, (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. Figure 3 presents learning curves for the first 100 instances labeled using uncertainty sampling and random sampling.” Settles, p7. Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. the figure itself) where its position in the graph is its corresponding feature vector. On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).) 
the feature space being a space having an element constituting the feature vector of the piece of training data as a variable (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” See also “Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain.” Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. the figure itself) where its position in the graph is its corresponding feature vector.), 
the density of labeled data being a density with respect to not any piece of unlabeled data but a piece of labeled data, (Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Examiner Note: The function of Settles’ “Density-Weighted” Active learning, as being described here, is to select a piece of data to label based on its proximity to other data, and optionally can weight based exclusively on proximity to labeled data.),
the piece of labeled data being the piece of training data in the feature space, (Figure 1 demonstrates that the labeled instances L in figure 7 represent the training data of the algorithm.  Figure 7: “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U.” Settles, pages 7 and 26 Examiner Note: On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).),
the density of labeled data being a density of pieces of labeled data in a region where each of the plurality of pieces of unlabeled data as a reference is arranged and has a predetermined size, (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster.” Settles, p25-26. Examiner Note: A circle of radius x qualifies as a region of predetermined size, and is an extremely basic form of clustering (e.g. if it’s in the circle, it’s in the cluster).);
the importance being higher as the density of labeled data decreases; calculate selection priority levels respectively corresponding to the plurality of pieces of unlabeled data using the importance of each of the plurality of pieces of unlabeled data (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: The examiner sees selection of a query based on “most similar to the unlabeled instances” as equivalent to increasing importance based on the ratio of unlabeled data to labeled data increasing.)
and closeness to a discrimination boundary with respect to each of the plurality of pieces of unlabeled data, the discrimination boundary being based on a discrimination function serving as a basis for discriminating data, each of the selection priority levels being higher in correspondence with the closeness to the discrimination boundary and being higher in correspondence with the importance; (Settles p.12, “Perhaps the simplest and most commonly used query framework is uncertainty sampling (Lewis and Gale, 1994). In this framework, an active learner queries the instances about which it is least certain how to label.” Settles, p.14-15, “Tong and Koller (2000) also experiment with an uncertainty sampling strategy for support vector machines—or SVMs— that involves querying the instance closest to the linear decision boundary.” Settles, p. 26, Figure 7, “Since A is on the decision boundary, it would be queried as the most uncertain.” Examiner Note: Selecting data instances for labeling (i.e., prioritizing the one data instance over the others) based on the proximity of that data instance to a discrimination boundary (i.e., decision boundary) is seen as implicitly assigning a priority level to the data instances in correspondence to their closeness to a decision boundary.)
select data to be labeled from among the plurality of pieces of unlabeled data using information regarding the selection priority levels respectively corresponding to the plurality of pieces of unlabeled data (Settles p.12, “Perhaps the simplest and most commonly used query framework is uncertainty sampling (Lewis and Gale, 1994). In this framework, an active learner queries the instances about which it is least certain how to label.” Settles, p.14-15, “Tong and Koller (2000) also experiment with an uncertainty sampling strategy for support vector machines—or SVMs— that involves querying the instance closest to the linear decision boundary.” Settles, p. 26, Figure 7, “Since A is on the decision boundary, it would be queried as the most uncertain.” Examiner Note: Selecting data instances for labeling (i.e., prioritizing the one data instance over the others) based on the proximity of that data instance to a discrimination boundary (i.e., decision boundary) is seen as implicitly assigning a priority level to the data instances in correspondence to their closeness to a decision boundary.); 
Settles teaches a discrimination function and learning with a plurality of labeled training data, but does not explicitly disclose a dictionary learning device comprising: a processor configured to, and learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including the selected data to which a label has been added, the dictionary being a parameter of the discrimination function.
Baker teaches a dictionary learning device comprising: a processor configured to (para [0102] “Those skilled in the art will appreciate that such network computing environments will typically encompass many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.” [0103] “An exemplary system for implementing the overall system or portions of the invention is shown in FIG. 25. This exemplary system include a plurality of general purpose computing devices and memory storage. By way of example, each computing device could include a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. The system memory may include read only memory (ROM) and random access memory (RAM). The computer may also include a magnetic hard disk drive for reading from and writing to a magnetic hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and an optical disk drive for reading from or writing to a removable optical disk such as a CD-ROM or other optical media. The drives and their associated machine-readable media provide nonvolatile storage of machine-executable instructions, data structures, program modules and other data for the computer.”);
and learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including the selected data to which a label has been added, the dictionary being a parameter of the discrimination function (para [0222] “Block 610 obtains a collection of training data with labels. In non-Socratic model training it is important to have a very low error rate in the labels associated with the training data.” [0308] “Block 1304 optimizes parameters if the selected question has adjustable parameters. For example, one type of question is a linear discriminant function. In one embodiment of the phoneme recognizer example, a linear discriminant function might be constructed to discriminate vowels from fricatives. The parameters of this discriminate function would be optimized for the discrimination task before measuring the performance of the question on the node splitting task in the tree building process.” [0318] “Block 1402 trains a discriminator for the two class problem. There are many well-known techniques for training a two-class discriminator. In one embodiment, a simple form of discriminator is used in which a test is made on only one data feature. In this one embodiment, the training is done by trying each data feature, and for each data feature creating a discriminator by testing whether the value of the feature is greater or less than a specified threshold value.” Examiner Note: Baker recites the training of a discriminator (see [0318]) and specifically the optimization of discrimination function parameters (see [0308]). Baker’s training data is labeled. When Baker is applied to Settles, the resulting system would learn, using labeled training data as generated above by the method disclosed by Settles, a dictionary device, the dictionary being a parameter of the discrimination function.)
Settles and Baker are analogous art because they are both directed to active machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Settles’ density-based active learning system with Baker’s computer apparatus and discrimination function. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase classifier performance, which can be accomplished by optimizing discrimination parameters (Baker [0308], “The specified value would be adjusted to optimize the performance on the node splitting performance measurement to be applied in block 1305.”).
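For illustration only, the relationship among the claim 1 limitations discussed above (a labeled-data density in a fixed-size region, an importance that rises as that density falls, and a selection priority that rises with both importance and closeness to the boundary) can be sketched as follows. This is a hypothetical sketch and not a formula from Settles, Baker, or the application: the scoring functions `1 / (1 + count)` and `1 / (1 + distance)`, the product used to combine them, the radius, and the sample data are all assumptions chosen for simplicity. It does not reproduce the formula elided in the Settles quotation.

```python
import math

def labeled_count_in_region(x, labeled, radius):
    """Number of labeled points inside a circle of fixed radius centered on x."""
    return sum(1 for p in labeled if math.dist(x, p) <= radius)

def importance(x, labeled, radius):
    """Assumed form: importance grows as the labeled density around x shrinks."""
    return 1.0 / (1.0 + labeled_count_in_region(x, labeled, radius))

def closeness(x, w, b):
    """Assumed form: 1 on the linear boundary w.x + b = 0, decaying with distance."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    d = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return 1.0 / (1.0 + d)

def selection_priority(x, labeled, radius, w, b):
    """Assumed combination: higher with closeness and higher with importance."""
    return closeness(x, w, b) * importance(x, labeled, radius)

def select(unlabeled, labeled, radius, w, b):
    """Pick the unlabeled point with the highest selection priority."""
    return max(unlabeled, key=lambda x: selection_priority(x, labeled, radius, w, b))

# Hypothetical data: boundary x1 = 0, region radius 0.5.
labeled = [(0.0, 1.0), (0.2, 1.1), (3.0, -2.0)]
point_a = (0.0, 1.2)   # on the boundary, but two labeled neighbors
point_b = (0.5, -1.0)  # slightly off the boundary, no labeled neighbors
w, b, radius = (1.0, 0.0), 0.0, 0.5
chosen = select([point_a, point_b], labeled, radius, w, b)
```

With these assumed values, the point on the boundary is outranked by the nearby point that has no labeled neighbors, which shows how a density term can change the outcome of a pure closeness ranking.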


As per claim 2, Settles teaches The dictionary learning device according to claim 1, wherein the processor calculates the importance of each of the plurality of pieces of unlabeled data using a ratio between the density of labeled data and a density of unlabeled data in the region having the predetermined size and the piece of unlabeled data as the reference position (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: Settles’ active learning function is able to be adjusted to take into account proximity of a particular data point to unlabeled and/or labeled data, which would include a ratio between unlabeled and labeled data.).

As per claim 3, Settles teaches The dictionary learning device according to claim 2, wherein the processor calculates the importance that increases as a ratio of the density of unlabeled data to the density of labeled data increases (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: The examiner sees selection of a query based on “most similar to the unlabeled instances” as equivalent to increasing importance based on the ratio of unlabeled data to labeled data increasing.).
As per claim 4, Settles teaches The dictionary learning device according to claim 2, wherein the processor calculates the importance that increases as a ratio of the density of labeled data to the density of unlabeled data decreases (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: The examiner sees selection of a query based on “least similar to the labeled instances” as equivalent to increasing importance based on the ratio of unlabeled data to labeled data increasing.).
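For illustration only, the density-ratio limitations of claims 2-4 can be sketched as follows. This is a hypothetical sketch, not a formula from Settles or the application: because both densities are taken over the same fixed-size region, the sketch assumes the ratio of densities reduces to a ratio of counts, and it adds 1 to each count to avoid division by zero. The function names, the radius, and the sample points are assumptions.

```python
import math

def count_in_region(x, points, radius):
    """Number of points inside a circle of fixed radius centered on x."""
    return sum(1 for p in points if math.dist(x, p) <= radius)

def ratio_importance(x, labeled, other_unlabeled, radius):
    """Assumed importance: (unlabeled count + 1) / (labeled count + 1).

    It increases as the unlabeled-to-labeled ratio increases (claim 3),
    which is the same as the labeled-to-unlabeled ratio decreasing (claim 4).
    """
    n_labeled = count_in_region(x, labeled, radius)
    n_unlabeled = count_in_region(x, other_unlabeled, radius)
    return (n_unlabeled + 1) / (n_labeled + 1)

# Hypothetical reference point and neighbors.
x = (0.0, 0.0)
other_unlabeled = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.0)]
with_one_label = ratio_importance(x, [(0.1, 0.0)], other_unlabeled, 0.5)
with_no_labels = ratio_importance(x, [], other_unlabeled, 0.5)
```

Under these assumptions, removing the single labeled neighbor raises the importance of the reference point.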

As per claim 5, Settles teaches The dictionary learning device according to claim 1, wherein the processor is further configured to: when information on a label with which the selected data is to be labeled is received from outside the dictionary learning device, add a label based on the received information to the selected data (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6. Examiner Note: Settles teaches adding labeled data to the model when it is added to the system.); and
improve the discrimination function by learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including the selected data to which the label has been added, the dictionary being a parameter of the discrimination function (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6 “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. Figure 3 presents learning curves for the first 100 instances labeled using uncertainty sampling and random sampling.” Settles, p7. Examiner Note: Settles teaches learning a model based on the labeled samples available, and then iteratively improving that model (see second quote, Figure 3’s “learning curve”) based on adding additional labeled samples to the model.).
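For illustration only, the pool-based cycle quoted from Settles (select an instance, receive its label from outside, add it to L, and relearn) can be sketched as follows. This is a hypothetical sketch under strong assumptions and is not code from Settles: the data are one-dimensional and separable, the learned parameter is a single threshold standing in for the dictionary, and `oracle`, `learn_threshold`, and `active_learning_round` are illustrative names.

```python
def learn_threshold(labeled):
    """Fit a 1-D threshold: midpoint between the largest class-0 and smallest class-1 value.

    Assumes both classes are present and the data are separable.
    """
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2.0

def active_learning_round(labeled, pool, oracle):
    """One cycle: query the pool item nearest the boundary, label it, relearn."""
    theta = learn_threshold(labeled)
    query = min(pool, key=lambda x: abs(x - theta))
    label = oracle(query)                    # label received from outside the learner
    labeled = labeled + [(query, label)]     # selected data with the label added
    pool = [x for x in pool if x != query]
    return labeled, pool, learn_threshold(labeled)

# Hypothetical starting set, pool, and labeling source.
labeled = [(0.0, 0), (10.0, 1)]
pool = [1.0, 4.0, 6.5, 9.0]
oracle = lambda x: int(x >= 6.0)
new_labeled, new_pool, new_theta = active_learning_round(labeled, pool, oracle)
```

With these assumed values, the first round queries 4.0, receives label 0, and moves the learned threshold from 5.0 to 7.0.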

Regarding claim 6 (Currently Amended), Settles teaches calculate an importance of each of a plurality of pieces of unlabeled data using a density of labeled data (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: [calculation formula] Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x…” Settles, p25. Examiner Note: The importance of the piece of unlabeled data is seen as equivalent to Settles’ informativeness.),
each of the plurality of pieces of unlabeled data being a piece of training data that is arranged in a feature space at a position based on a feature vector of the piece of training data (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. Figure 3 presents learning curves for the first 100 instances labeled using uncertainty sampling and random sampling.” Settles, p7. Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. the figure itself) where its position in the graph is its corresponding feature vector. On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).),
the feature space being a space having an element constituting the feature vector of the piece of training data as a variable (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” See also “Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain.” Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. the figure itself) where its position in the graph is its corresponding feature vector.),
the density of labeled data being a density with respect to not any piece of unlabeled data but a piece of labeled data (Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Examiner Note: The function of Settles’ “Density-Weighted” Active learning, as being described here, is to select a piece of data to label based on its proximity to other data, and optionally can weight based exclusively on proximity to labeled data.),
the piece of labeled data being the piece of training data in the feature space (Figure 1 demonstrates that the labeled instances L in figure 7 represent the training data of the algorithm. Figure 7: “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U.” Settles, pages 7 and 26 Examiner Note: On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).),
the density of labeled data being a density of pieces of labeled data in a region where each of the plurality of pieces of unlabeled data as a reference is arranged and has a predetermined size (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster.” Settles, p25-26. Examiner Note: A circle of radius x qualifies as a region of predetermined size, and is an extremely basic form of clustering (e.g. if it’s in the circle, it’s in the cluster).);
select data to be labeled from among the plurality of pieces of unlabeled data using information on closeness of each piece of unlabeled data to a discrimination boundary and information on the calculated importance, the discrimination boundary being based on a discrimination function serving as a basis for discriminating data (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: [calculation formula] Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x…” Figure 7. Settles, p25. Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole” Examiner Note: Settles’ active learning selects a piece of data based on a function including both a discrimination function, and the relative impact of adding that labeled sample to the model.);
when information on a label with which the selected data is to be labeled is received from outside the processor performing the dictionary learning method, adding a label based on the received information to the selected data (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6. Examiner Note: Settles teaches adding labeled data to the model when it is added to the system.); and
improving the discrimination function by learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including new piece of labeled data given the label, the dictionary being a parameter of the discrimination function (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6. “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. Figure 3 presents learning curves for the first 100 instances labeled using uncertainty sampling and random sampling.” Settles, p7. Examiner Note: Settles teaches learning a model based on the labeled samples available, and then iteratively improving that model (see second quote, Figure 3’s “learning curve”) based on adding additional labeled samples to the model.)
the importance being higher as the density of labeled data decreases; calculate selection priority levels respectively corresponding to the plurality of pieces of unlabeled data using the importance of each of the plurality of pieces of unlabeled data (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: The examiner sees selection of a query based on “most similar to the unlabeled instances” as equivalent to increasing importance based on the ratio of unlabeled data to labeled data increasing.)
and closeness to a discrimination boundary with respect to each of the plurality of pieces of unlabeled data, the discrimination boundary being based on a discrimination function serving as a basis for discriminating data, each of the selection priority levels being higher in correspondence with the closeness to the discrimination boundary and being higher in correspondence with the importance; (Settles p.12, “Perhaps the simplest and most commonly used query framework is uncertainty sampling (Lewis and Gale, 1994). In this framework, an active learner queries the instances about which it is least certain how to label.” Settles, p.14-15, “Tong and Koller (2000) also experiment with an uncertainty sampling strategy for support vector machines—or SVMs— that involves querying the instance closest to the linear decision boundary.” Settles, p. 26, Figure 7, “Since A is on the decision boundary, it would be queried as the most uncertain.” Examiner Note: Selecting data instances for labeling (i.e., prioritizing the one data instance over the others) based on the proximity of that data instance to a discrimination boundary (i.e., decision boundary) is seen as implicitly assigning a priority level to the data instances in correspondence to their closeness to a decision boundary.)
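For illustration only, the density-weighted selection described in the passages of Settles quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, and is neither Settles' implementation nor the claimed method: the base informativeness φA(x) is assumed to be closeness to a linear decision boundary w·x + b = 0, the similarity measure is assumed to be a Gaussian kernel, β is assumed to act as an exponent on the density term (the formula itself is not reproduced above), and the names margin_closeness, similarity, density, priority_levels and select_query are hypothetical. The density term follows the quoted passage, i.e., average similarity to the instances in U.

```python
import math


def margin_closeness(x, w, b):
    """Closeness of x to the linear boundary w.x + b = 0 (assumed base strategy).

    Returns a value in (0, 1]; it equals 1.0 when x lies on the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    return 1.0 / (1.0 + dist)


def similarity(a, b):
    """Assumed similarity measure: Gaussian kernel on squared Euclidean distance."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def density(x, pool):
    """Average similarity of x to every instance in the unlabeled pool U."""
    return sum(similarity(x, u) for u in pool) / len(pool)


def priority_levels(pool, w, b, beta=1.0):
    """One priority level per unlabeled instance: closeness times density**beta."""
    return [margin_closeness(x, w, b) * density(x, pool) ** beta for x in pool]


def select_query(pool, w, b, beta=1.0):
    """Index of the unlabeled instance with the highest priority level."""
    scores = priority_levels(pool, w, b, beta)
    return max(range(len(pool)), key=scores.__getitem__)
```

With beta set to 0 the density term drops out and the instance lying on the boundary (instance A in the Figure 7 discussion) is ranked first; with a positive beta, an instance near the boundary in a dense region (instance B) can outrank it.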
Settles teaches a discrimination function and learning with a plurality of pieces of labeled training data, but does not explicitly disclose a dictionary learning device comprising a processor so configured, or learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including the selected data to which a label has been added, the dictionary being a parameter of the discrimination function.
	Baker teaches A dictionary learning method comprising: by a processor, calculating ([0102] “Those skilled in the art will appreciate that such network computing environments will typically encompass many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.” [0103] “An exemplary system for implementing the overall system or portions of the invention is shown in FIG. 25. This exemplary system include a plurality of general purpose computing devices and memory storage. By way of example, each computing device could include a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. The system memory may include read only memory (ROM) and random access memory (RAM). The computer may also include a magnetic hard disk drive for reading from and writing to a magnetic hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and an optical disk drive for reading from or writing to a removable optical disk such as a CD-ROM or other optical media. The drives and their associated machine-readable media provide nonvolatile storage of machine-executable instructions, data structures, program modules and other data for the computer.”)
and learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including the selected data to which a label has been added, the dictionary being a parameter of the discrimination function (Baker [0222] “Block 610 obtains a collection of training data with labels. In non-Socratic model training it is important to have a very low error rate in the labels associated with the training data.” [0308] “Block 1304 optimizes parameters if the selected question has adjustable parameters. For example, one type of question is a linear discriminant function. In one embodiment of the phoneme recognizer example, a linear discriminant function might be constructed to discriminate vowels from fricatives. The parameters of this discriminate function would be optimized for the discrimination task before measuring the performance of the question on the node splitting task in the tree building process.” [0318] “Block 1402 trains a discriminator for the two class problem. There are many well-known techniques for training a two-class discriminator. In one embodiment, a simple form of discriminator is used in which a test is made on only one data feature. In this one embodiment, the training is done by trying each data feature, and for each data feature creating a discriminator by testing whether the value of the feature is greater or less than a specified threshold value.” Examiner Note: Baker recites the training of a discriminator (see [0318]) and specifically the optimization of discrimination function parameters (see [0308]). Baker’s training data is labeled. When Baker is applied to Settles, the resulting system would learn a dictionary using the labeled training data generated by the method disclosed by Settles, the dictionary being a parameter of the discrimination function.)
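For illustration only, the single-feature discriminator described in Baker [0318] can be sketched as follows. This is a minimal sketch under stated assumptions and is not Baker's implementation: candidate thresholds are assumed to be midpoints between consecutive distinct feature values, the selection criterion is assumed to be the number of training errors, class labels are assumed to be 0 and 1, and the names predict_stump and train_stump are hypothetical. The returned tuple (feature index, threshold, polarity) is the learned parameter set of the discriminator.

```python
def predict_stump(stump, x):
    """Classify x by testing one feature against a threshold."""
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else 0


def train_stump(samples, labels):
    """Try each feature and keep the threshold test with the fewest training errors.

    Returns (feature_index, threshold, polarity). Polarity +1 predicts class 1
    above the threshold; polarity -1 predicts class 1 below it.
    """
    best_errors, best_stump = None, None
    for feature in range(len(samples[0])):
        values = sorted({x[feature] for x in samples})
        # Assumed candidate thresholds: midpoints between distinct values.
        candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or values
        for threshold in candidates:
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(
                    1 for x, y in zip(samples, labels) if predict_stump(stump, x) != y
                )
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump
```

Retraining with train_stump each time a newly labeled sample is appended to the training set corresponds to the iterative model update discussed in the Settles passages above.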
Settles and Baker are analogous art because they are both directed to active machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Settles’ density-based active learning system with Baker’s computer apparatus and discrimination function. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to integrate Settles’ method into a practical system, which can be accomplished by executing it via a processor and memory (Baker [0100], “As noted above, embodiments within the scope of the present invention include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media which can be accessed by a general purpose or special purpose computer or other machine with a processor.”).

Claim 8 is a non-transitory program storage medium apparatus claim corresponding to system claim 1. Claim 8 requires A non-transitory program storage medium on which a computer program is stored, the computer program causing a computer to perform (Baker [0102] “Those skilled in the art will appreciate that such network computing environments will typically encompass many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.” [0103] “An exemplary system for implementing the overall system or portions of the invention is shown in FIG. 25. This exemplary system include a plurality of general purpose computing devices and memory storage. By way of example, each computing device could include a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. The system memory may include read only memory (ROM) and random access memory (RAM). The computer may also include a magnetic hard disk drive for reading from and writing to a magnetic hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and an optical disk drive for reading from or writing to a removable optical disk such as a CD-ROM or other optical media. The drives and their associated machine-readable media provide nonvolatile storage of machine-executable instructions, data structures, program modules and other data for the computer.”). 
Settles and Baker are analogous art because they are both directed to active machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Settles’ density-based active learning system with Baker’s computer apparatus and discrimination function. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to integrate Settles’ method into a practical system, which can be accomplished by executing it via a processor and memory (Baker [0100], “As noted above, embodiments within the scope of the present invention include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media which can be accessed by a general purpose or special purpose computer or other machine with a processor.”).
Claim 8 is rejected for the same reasons as claim 1. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: U.S. Pub. No. 20130097103 A1 to Chari et al., U.S. Pat. No. 10402685 B2 to Guyon and Weston, U.S. Pub. No. 20140201126 A1 to Zadeh and Tadayon, and U.S. Pub. No. 20130054603 A1 to Birdwell et al.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/V.M./
Examiner, Art Unit 2126     
/ANN J LO/
Supervisory Patent Examiner, Art Unit 2126