Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Claims 1-6 and 8 are pending in the present application.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. In the limitation “when information on a label with which the selected piece of unlabeled data is to be labeled is received from outside, give the selected piece of unlabeled data with the label using the received information”, it is unclear where or to what the “selected piece of unlabeled data” is being given. The examiner has interpreted the claim in its present state 


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-4 and 8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 analysis:
In the instant case, the claims are directed to a system (claims 1-5), method (claim 6), and apparatus (claim 8). Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

Step 2A analysis:
Because the claims have been determined to fall within one of the four statutory categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea.

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1 and 8:
•	 “calculate an importance of a piece of unlabeled data using a density of labeled data…” (mathematical calculation)
•	 “select(ing) data to be labeled...” (judgement) 

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional elements in claims 1 and 8, “a processor configured to” and “a non-transitory program storage medium on which a computer program is stored, the computer program causing a computer to perform”, correspond to mere instructions to implement an abstract idea or other exception on a generic computer. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B:	The additional elements in the claims, “a processor configured to”, “A non-transitory program storage medium on which a computer program is stored, the computer program causing a 

Step 2A: Prong 1:
 	Claim 2:
•	“wherein the processor calculates the importance…” (mathematical calculation).

Step 2A: Prong 2:
The further limitation in claim 2 is directed to a judicial exception and nothing more. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2A: Prong 1:
 	Claim 3:
•	“wherein the processor calculates the importance…” (mathematical calculation).

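Step 2A: Prong 2:
The further limitation in claim 3 is directed to a judicial exception and nothing more. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.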

Step 2A: Prong 1:
 	Claim 4:
•	“wherein the processor calculates the importance…” (mathematical calculation).

Step 2A: Prong 2:
The further limitation in claim 4 is directed to a judicial exception and nothing more. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
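The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.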

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over “Active Learning Literature Survey” to Settles (hereinafter “Settles”) in view of U.S. Pub. No. 20130097103 A1 to Chari et al. (hereinafter “Chari”).

As per claim 1, Settles teaches calculate an importance of a piece of unlabeled data using a density of labeled data (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: , the piece of unlabeled data being a piece of training data that is arranged in a feature space at a position based on a feature vector of the piece of training data (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. Figure 3 presents learning curves for the first 100 instances labeled using uncertainty sampling and random sampling.” Settles, p7. Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. the figure itself) where its position in the graph is its corresponding feature vector. On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).), the feature space being a space having an element constituting the feature vector of the piece of training data as a variable (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. 
Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” See also “Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression , the density of labeled data being a density with respect to a piece of labeled data (Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Examiner Note: The function of Settles’ “Density-Weighted” Active learning, as being described here, is to select a piece of data to label based on its proximity to other data, and optionally can weight based exclusively on proximity to labeled data.), the piece of labeled data being the piece of training data in the feature space (Figure 1 demonstrates that the labeled instances L in figure 7 represent the training data of the algorithm.  Figure 7: “An illustration of when uncertainty sampling can be a poor strategy for classification. 
Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U.” Settles, pages 7 and 26 Examiner Note: On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).), the density of labeled data being a density of pieces of labeled data in a region where the piece of unlabeled data as a reference is arranged and has a predetermined size (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density ; and
select data to be labeled from among the plurality of pieces of unlabeled data using information on closeness of the piece of unlabeled data to a discrimination boundary and information on the calculated importance, the discrimination boundary being based on a discrimination function to serve as a basis for discriminating data (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: [calculation formula] Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x…” Figure 7. Settles, p25. Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole” Examiner Note: Settles’ active learning selects a piece of data based on a function including both a discrimination function, and the relative impact of adding that labeled sample to the model.).

Settles does not explicitly teach A dictionary learning device comprising: a processor configured to.

	Chari teaches A dictionary learning device comprising: a processor configured to ([0078] “Processor device 1420 can be configured to implement the methods, steps, and functions .

	Settles and Chari are analogous art because they are both directed to active machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Settles’ density-based active learning system with Chari’s computer apparatus. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to integrate Settles’ method into a practical system, which can be accomplished by executing it with a processing apparatus (Chari, [0078]).
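
For illustration, the density-weighted selection quoted from Settles above (p25) weights a base informativeness φA(x) by the average similarity of x to the unlabeled pool U, subject to the parameter β. The following is a minimal Python sketch of that kind of selection under stated assumptions: the linear discrimination function, the Gaussian similarity, the use of β as an exponent on the density term, and all function names and sample values are hypothetical choices made for this sketch and are not taken from Settles, Chari, or the claims.

```python
import math


def similarity(a, b, bandwidth=1.0):
    """Gaussian similarity between two feature vectors (an assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def uncertainty(x, weights, bias):
    """Base informativeness phi_A(x): closeness to a linear decision boundary.

    Hypothetical base strategy: equals 1.0 when x lies on the boundary
    and shrinks toward 0.0 as x moves away from it.
    """
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - abs(2.0 * p - 1.0)


def density_weighted_query(unlabeled, weights, bias, beta=1.0):
    """Return the index of the unlabeled instance that maximizes
    phi_A(x) * (average similarity of x to the pool U) ** beta."""
    def score(x):
        avg_sim = sum(similarity(x, u) for u in unlabeled) / len(unlabeled)
        return uncertainty(x, weights, bias) * avg_sim ** beta

    return max(range(len(unlabeled)), key=lambda i: score(unlabeled[i]))


# Hypothetical pool mirroring Fig. 7: one isolated point on the boundary
# (index 0) and a dense group of points slightly off the boundary.
pool = [(0.0, 5.0), (0.5, 0.0), (0.6, 0.1), (0.5, -0.1), (0.4, 0.0)]
uncertainty_only = density_weighted_query(pool, (1.0, 0.0), 0.0, beta=0.0)
density_weighted = density_weighted_query(pool, (1.0, 0.0), 0.0, beta=1.0)
```

With β set to 0 the density term drops out and the isolated point on the boundary is selected, as in plain uncertainty sampling. With β set to 1 the selection moves to a point in the dense group, which corresponds to the preference for instance B over instance A described for Fig. 7.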

As per claim 2, Settles teaches The dictionary learning device according to claim 1, wherein the processor calculates the importance of the piece of unlabeled data using a ratio between the density of labeled data and a density of unlabeled data in the region having the predetermined size and the piece of unlabeled data as the reference position (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. .

As per claim 3, Settles teaches The dictionary learning device according to claim 2, wherein the processor calculates the importance that increases as a ratio of the density of unlabeled data to the density of labeled data increases (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster… Fujii et al. (1998) considered a query strategy for nearest-neighbor methods that selects queries that are (i) least similar to the labeled instances in L, and (ii) most similar to the unlabeled instances in U.” Settles, p25-26. Examiner Note: The examiner sees selection of a query based on “most similar to the unlabeled instances” as equivalent to increasing importance based on the ratio of unlabeled data to labeled data increasing.).

As per claim 4, Settles teaches The dictionary learning device according to claim 2, wherein the processor calculates the importance that increases as a ratio of the density of labeled data to the density of unlabeled data decreases (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and .
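
For illustration, one possible reading of the ratio recited in claims 2-4 is sketched below in Python. The sketch assumes that the region of predetermined size is a ball of fixed radius centered on the candidate piece of unlabeled data, and that density is the count of points in that ball (the common volume cancels in the ratio). The function names, the smoothing constant that avoids division by zero, and the sample data are hypothetical and are not taken from the claims, the specification, or Settles.

```python
import math


def count_within(center, points, radius):
    """Count the points lying within `radius` of `center`."""
    return sum(1 for p in points if math.dist(center, p) <= radius)


def ratio_importance(candidate, labeled, unlabeled, radius, smoothing=1.0):
    """Importance that grows as unlabeled density rises relative to
    labeled density in the ball around `candidate`.

    `smoothing` is an assumed constant added to both counts so that the
    ratio stays finite when no labeled data lies in the region. The
    candidate itself is counted if it appears in `unlabeled`.
    """
    n_labeled = count_within(candidate, labeled, radius)
    n_unlabeled = count_within(candidate, unlabeled, radius)
    return (n_unlabeled + smoothing) / (n_labeled + smoothing)


# Hypothetical data: two labeled points near the origin, one unlabeled point
# beside them, and three unlabeled points in a region with no labels.
labeled = [(0.0, 0.0), (0.1, 0.0)]
unlabeled = [(0.05, 0.0), (3.0, 3.0), (3.1, 3.0), (2.9, 3.1)]
near_labels = ratio_importance(unlabeled[0], labeled, unlabeled, radius=0.5)
far_from_labels = ratio_importance(unlabeled[1], labeled, unlabeled, radius=0.5)
```

Under these assumptions, the candidate in the unlabeled-only region receives the higher importance, so the same value increases as the unlabeled-to-labeled ratio increases (claim 3) and, equivalently, as the labeled-to-unlabeled ratio decreases (claim 4).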

As per claim 5, Settles teaches The dictionary learning device according to claim 1, wherein the processor is further configured to: when information on a label with which the selected piece of unlabeled data is to be labeled is received from outside, give the selected piece of unlabeled data with the label using the received information (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6. Examiner Note: Settles teaches adding labeled data to the model when it is added to the system.); and
improve the discrimination function by learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including new piece of labeled data the label, the dictionary being a parameter of the discrimination function (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, .

As per claim 6, Settles teaches calculate an importance of a piece of unlabeled data using a density of labeled data (“The main idea is that informative instances should not only be those which are uncertain, but also those which are “representative” of the underlying distribution (i.e., inhabit dense regions of the input space). Therefore, we wish to query instances as follows: [calculation formula] Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x…” Settles, p25. Examiner Note: The importance of the piece of unlabeled data is seen as equivalent to Settle’s informativeness.), the piece of unlabeled data being a piece of training data that is arranged in a feature space at a position based on a feature vector of the piece of training data (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles , the feature space being a space having an element constituting the feature vector of the piece of training data as a variable (Settles, Fig. 7, p26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more information about the data distribution as a whole.” See also “Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain.” Examiner Note: Figure 7 demonstrates that a piece of unlabeled data is in a feature space (i.e. 
the figure itself) where its position in the graph is its corresponding feature vector.), the density of labeled data being a density with respect to a piece of labeled data (Settles, Fig. 7, p25-26, “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U. Since A is on the decision boundary, it would be queried as the most uncertain. However, querying B is likely to result in more , the piece of labeled data being the piece of training data in the feature space (Figure 1 demonstrates that the labeled instances L in figure 7 represent the training data of the algorithm.  Figure 7: “An illustration of when uncertainty sampling can be a poor strategy for classification. Shaded polygons represent labeled instances in L, and circles represent unlabeled instances in U.” Settles, pages 7 and 26 Examiner Note: On page 7, Settles clarifies that all data is able to be training data in an active learning system (because once unlabeled data is labeled, it is added to the training data).), the density of labeled data being a density of pieces of labeled data in a region where the piece of unlabeled data as a reference is arranged and has a predetermined size (“Here, φA(x) represents the informativeness of x according to some “base” query strategy A, such as an uncertainty sampling or QBC approach. The second term weights the informativeness of x by its average similarity to all other instances in the input distribution (as approximated by U), subject to a parameter β that controls the relative importance of the density term. A variant of this might first cluster U and compute average similarity to instances in the same cluster.” Settles, p25-26. Examiner Note: A circle of radius x qualifies as a region of predetermined size, and is an extremely basic form of clustering (e.g. if its in the circle, its in the cluster).);
select data to be labeled from among the plurality of pieces of unlabeled data using information on closeness of the piece of unlabeled data to a discrimination boundary and information on the calculated importance, the discrimination boundary being based on a discrimination function to serve as a basis for discriminating data (“The main idea is that ; and
improving the discrimination function by learning a dictionary using a plurality of pieces of the training data, the plurality of pieces of the training data including new piece of labeled data given the label, the dictionary being a parameter of the discrimination function (“Figure 1 illustrates the pool-based active learning cycle. A learner may begin with a small number of instances in the labeled training set L, request labels for one or more carefully selected instances, learn from the query results, and then leverage its new knowledge to choose which instances to query next. Once a query has been made, there are usually no additional assumptions on the part of the learning algorithm. The new labeled instance is simply added to the labeled set L, and the learner proceeds from there in a standard supervised way. There are a few exceptions to this, such as when the learner is allowed to make alternative types of queries (Section 6.4), or when active learning is combined with semisupervised learning (Section 7.1).” Settles, p6 “Active learning algorithms are generally evaluated by constructing learning curves, which plot the evaluation measure of interest (e.g., accuracy) as a function of the number of new instance queries that are labeled and added to L. 

Settles does not explicitly teach A dictionary learning method comprising: by a processor, calculating.

	Chari teaches A dictionary learning method comprising: by a processor, calculating ([0078] “Processor device 1420 can be configured to implement the methods, steps, and functions disclosed herein. The memory 1430 could be distributed or local and the processor device 1420 could be distributed or singular. The memory 1430 could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term "memory" should be construed broadly enough to encompass any information able to be read from, or written to, an address in the addressable space accessed by processor device 1420.”).

	Settles and Chari are analogous art because they are both directed to active machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Settles’ density-based active learning system with Chari’s computer apparatus. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to integrate Settles’ method into a practical system, which can be accomplished by executing it with a processing apparatus (Chari, [0078]).

Claim 8 is a non-transitory program storage medium apparatus claim corresponding to system claim 1. Claim 8 requires A non-transitory program storage medium on which a computer program is stored, the computer program causing a computer to perform (Chari, [0078] “Processor device 1420 can be configured to implement the methods, steps, and functions disclosed herein. The memory 1430 could be distributed or local and the processor device 1420 could be distributed or singular. The memory 1430 could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term "memory" should be construed broadly enough to encompass any information able to be read from, or written to, an address in the addressable space accessed by processor device 1420.”). Claim 8 is rejected for the same reasons as claim 1. 


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: U.S. Pat. No. 10402685 B2 to Guyon and Weston, U.S. Pub. No. 20140201126 A1 to Zadeh and Tadayon, and U.S. Pub. No. 20130054603 A1 to Birdwell et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL G SMITH whose telephone number is (571)272-9730. The examiner can normally be reached on Monday-Friday from 9:30 A.M. to 6:00 P.M. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at telephone number 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/PAUL GORDON SMITH/Examiner, Art Unit 2126
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116