Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because claim 14 is a system claim comprising one or more processors.

    [Image: media_image1.png]

	Fig. 1 shows that system 100 includes four components A, B, C, and D. Given the broadest reasonable interpretation, these components are interpreted as software components.
Further, a processor can be a program, as defined in the American Heritage Dictionary.

    [Image: media_image2.png]

	Claim 14 recites a system comprising one or more processors (i.e., software/programs) configured to perform operations, but recites no hardware in the system to perform the claimed steps. Claim 14 is nothing more than software per se. The claim lacks the necessary physical articles or objects to constitute a machine or manufacture within the meaning of 35 U.S.C. 101. The claimed elements are clearly not a series of steps or acts constituting a process, nor are they a combination of chemical compounds constituting a composition of matter. As such, they fail to fall within a statutory category. They are, at best, functional descriptive material.



Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5 and 13-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang (U.S. Pub. 2012/0263376 A1).
Claim 1
Wang discloses a method for lifelong machine learning using boosting, the method comprising:
receiving a new task and a learning sample for the new task ([0021], line 1-2, “... the new training samples are collected...” [0024], line 6-8, “... The model trained with the initial training samples may be updated with the newly added samples...” <examiner note: the task of a model is to make predictions. The model is updated so that its prediction/task is more accurate. Newly added samples <=> learning sample>);
learning a distribution of weights over the learning sample using previously learned classifiers from old tasks ([0025], “... the input technique may be a group of weak classifiers learned so far Li0, i=1, . . . M, a new training sample s to arrive (with ys as it's ground truth label), and an incremental learning technique which utilizes the new training sample to update the existing weak classifier BaseUpdate(Li0,s)...” [0027], “... For each weak classifier Li0, i=1, . . . M; [0031] Update the new sample's weight...” <examiner note: the set of weak classifiers previously trained for old tasks/predictions based on the initial samples is used to learn the distribution of weights of the newly added samples>); and 
learning a set of task-specific classifiers for the new task using a boosting algorithm and the distribution of weights over the learning sample, whereby the distribution of weights over the learning sample is updated using the task-specific classifiers for the new task ([0034], line 4-13, “... When the new sample is misclassified by a weak classifier Li0, -ysfi(s) would be positive, so the weight λs associated with this sample is increased when presented to the next weak classifier; otherwise, the weight λs will be decreased. The principal idea of the online boosting technique is to process the new sample as if it were already included in the initial training set, i.e., also passing it from the first weak classifier to the last, and modifying the sample's weight at each step before passing to the next classifier...” <examiner note: the weak classifiers are updated/learned to make predictions (the new task) using the newly added samples and a boosting algorithm. Further, the distribution of weights of the new samples is adjusted by the weak classifiers for the new predictions/tasks>).
Claims 14 and 15 are similar to claim 1. Therefore, claims 14-15 are rejected for similar reasons.
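
For illustration only, the weight adjustment described in the passage of Wang [0034] quoted above for claim 1 can be sketched as follows. This is a minimal sketch and not Wang's implementation: the multiplicative exponential update, the function name pass_sample_through_ensemble, and the threshold classifiers are assumptions made for the example, and the BaseUpdate step that updates each weak classifier is omitted.

```python
import math


def pass_sample_through_ensemble(weak_classifiers, sample, label, initial_weight=1.0):
    """Pass one new sample from the first weak classifier to the last,
    adjusting the sample's weight after each classifier.

    Returns the weight the sample carries after each classifier.
    """
    weight = initial_weight
    history = []
    for classify in weak_classifiers:
        score = classify(sample)      # signed output f_i(s), here +1.0 or -1.0
        margin = -label * score       # positive when the sample is misclassified
        # Assumed update rule (not quoted from Wang): an exponential factor,
        # which raises the weight on a miss and lowers it on a hit.
        weight *= math.exp(margin)
        history.append(weight)
    return history


# Hypothetical weak classifiers: simple threshold tests on a scalar sample.
stumps = [
    lambda x: 1.0 if x > 0.5 else -1.0,
    lambda x: 1.0 if x > 2.0 else -1.0,
]

weights = pass_sample_through_ensemble(stumps, sample=1.0, label=1)
```

In this example the first classifier labels the sample correctly, so its weight drops below the initial value; the second classifier misclassifies it, so the weight rises again.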

Claim 2
Claim 1 is included, Wang further discloses updating the distribution of weights based on performance of the task-specific classifiers on the learning sample ([0034], line 4-13, “... When the new sample is misclassified by a weak classifier Li0, -ysfi(s) would be positive, so the weight λs associated with this sample is increased when presented to the next weak classifier; otherwise, the weight λs will be decreased...” <examiner note: the weights of new samples are adjusted based on whether the sample is correctly or incorrectly classified by the weak classifiers>)
Claim 3
Claim 2 is included, Wang further discloses selecting training examples from the learning sample based on the performance of the task-specific classifiers on the learning sample ([0034], line 4-13, “... When the new sample is misclassified by a weak classifier Li0, -ysfi(s) would be positive, so the weight λs associated with this sample is increased when presented to the next weak classifier...” <examiner note: a sample that is misclassified by an earlier classifier is selected and its weight is increased so that the next weak classifier is trained on this misclassified sample>)
Claim 4
Claim 3 is included, Wang further discloses wherein a portion of the examples of the learning sample having the highest weights are selected as the training examples, and wherein the highest weights correspond to the lowest classification accuracy of the task-specific classifiers on the portion of the examples ([0041], “...if the decision tree is trained, and it is desirable to update it with a new sample, the system may first pass the sample from the root to corresponding branch according to the criteria at each internal node, and recalculate the "purity" score for each node. If the purity score of a node is not high enough, it should be re-split based on all previous samples and this new sample. Therefore, some information should be maintained about previous samples so that it's possible to recalculate the purity score of each node. For variables with discrete value, this statistical information can be obtained by counting the number of samples with each value; however, if the variable is real-valued, to precisely maintain the distribution information, all the previous feature values that have appeared should be stored for future use...”)
Claim 5
Claim 4 is included, Wang further discloses wherein the portion of the examples is less than 30% of a total number of examples of the learning sample ([0043], “In the initial stage, the system obtains the representative samples of initial training samples by using a suitable technique, such as a K-means clustering technique on positive and negative samples separately. The cluster centers are selected as the representative samples, denoted as {x̂_i, i=1, . . . , n} and their weights are taken as the number of samples in the corresponding cluster, denoted as {s_i, i=1, . . . , n}.”)
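
For illustration only, the representative-sample step described in the passage of Wang [0043] cited above for claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not Wang's implementation: the samples are one-dimensional, the initial centers are supplied by the caller, and the names assign and representative_samples are hypothetical.

```python
def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        clusters[nearest].append(v)
    return clusters


def representative_samples(values, initial_centers, iterations=10):
    """Run a simple K-means and return (center, weight) pairs, where the
    weight of a center is the number of samples in its cluster."""
    centers = list(initial_centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Move each center to the mean of its cluster; an empty cluster
        # keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    counts = [len(c) for c in assign(values, centers)]
    return list(zip(centers, counts))


# Hypothetical data; positive and negative samples are clustered separately.
positives = [1.0, 1.2, 0.8, 5.0, 5.2]
negatives = [-3.0, -3.2]

pos_reps = representative_samples(positives, [0.0, 6.0])
neg_reps = representative_samples(negatives, [-1.0])
```

Here five positive samples are summarized by two weighted centers and two negative samples by one, so the stored representatives are fewer than the original samples.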
Claim 13
Claim 1 is included, Wang discloses wherein the tasks are in at least one of the medicine, predictive planning or transportation fields, and the learned task-specific classifiers for the tasks are applied in at least one of these fields for at least one of a medical diagnosis, a product demand prediction, a transportation demand prediction or a ridership prediction ([0003], “Such classifiers may be trained using training samples, and then used in a so-called testing or prediction stage, to classify test samples. For example, such classifiers may be used in an automated factory inspection application to detect defects in an image of a product. In this case, a "sample" may consist of a set of data derived from an image or pixels in a region of the image, and the task of the classifier is to classify "sample" as "defect" (positive class) or "non-defect" (negative class). As another example, such classifiers may be used to classify defects into different categories....” <examiner note: This application can be used in the medical field, for instance, to classify whether a sample is a benign tumor or cancer>)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (U.S. Pub. 2012/0263376 A1), as applied to claim 1, and further in view of Li (U.S. Pub. 2006/0062451 A1).
Claim 6
Claim 1 is included, however, Wang does not explicitly disclose pruning one or more of the task-specific classifiers based on performance of the task-specific classifiers on the learning sample.
	Li discloses pruning one or more of the task-specific classifiers based on performance of the task-specific classifiers on the learning sample ([0033], line 17-30, “... A determination is then made as to which of the current set of optimal weak classifiers is the least significant classifier (process action 210). The least significant classifier includes the feature when matching that is the least likely to predict whether a training example matches the classification of a particular classifier. The overall cost for the current set of optimal weak classifiers is next computed, as shown in process action 212 of FIG. 2B, using the cost function. The least significant classifier for the current set of optimal weak classifiers is then conditionally removed (process action 214) and the overall cost for the current set of optimal weak classifiers is computed, less the least significant classifier, using the cost function (process action 216)...” <examiner note: the least significant classifier is removed>)
	Wang discloses that an ensemble of weak classifiers is updated using a boosting technique to improve the prediction of the ensemble. However, Wang does not disclose pruning weak classifiers based on the performance of the weak classifiers. Li discloses removing weak classifiers to improve the performance of an ensemble of weak classifiers. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the FloatBoost learning procedure disclosed by Li into Wang to identify a set of optimal classifiers, less the least significant classifiers, to lower the overall cost (i.e., higher performance in predictions) of the ensemble of weak classifiers.
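
For illustration only, the conditional removal described in the passage of Li [0033] quoted above for claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not Li's implementation: each weak classifier is represented by its list of +1/-1 predictions on a fixed sample set, the cost function is an exponential loss over the summed votes, the least significant classifier is taken to be the one whose removal yields the lowest cost, and the names exp_cost and conditionally_remove_least_significant are hypothetical.

```python
import math


def exp_cost(ensemble, labels):
    """Exponential loss of the summed votes of the ensemble (assumed cost)."""
    total = 0.0
    for i, y in enumerate(labels):
        vote = sum(h[i] for h in ensemble)
        total += math.exp(-y * vote)
    return total


def conditionally_remove_least_significant(ensemble, labels):
    """Remove the least significant classifier only if that lowers the cost."""
    if len(ensemble) < 2:
        return list(ensemble)
    # Candidate ensembles, each with exactly one classifier left out.
    candidates = [ensemble[:k] + ensemble[k + 1:] for k in range(len(ensemble))]
    best = min(candidates, key=lambda e: exp_cost(e, labels))
    if exp_cost(best, labels) < exp_cost(ensemble, labels):
        return best
    return list(ensemble)


# Hypothetical predictions of three weak classifiers on four samples.
labels = [1, 1, -1, -1]
h1 = [1, 1, -1, -1]    # correct on every sample
h2 = [1, -1, -1, -1]   # wrong on one sample
h3 = [-1, -1, 1, 1]    # wrong on every sample

pruned = conditionally_remove_least_significant([h1, h2, h3], labels)
pruned_again = conditionally_remove_least_significant(pruned, labels)
```

In this example the first call drops h3 because its removal lowers the cost, and the second call removes nothing because no further removal lowers the cost.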
Claim 7
Claim 6 is included, Li further discloses storing the task-specific classifiers which were not pruned, and using the stored task-specific classifiers for a subsequent iteration of the step of learning the distribution of weights over the learning sample using the previously learned classifiers which is performed for a subsequent task ([0051], “... In Step 2 (forward inclusion), the currently most significant weak classifier is added one at a time, which is the same as in AdaBoost. In Step 3 (Conditional Exclusion), FloatBoost removes the least significant weak classifier from H_M, subject to the condition that the removal leads to a lower cost than J^min_{M-1} (which is not done in AdaBoost). Supposing that the removed weak classifier was the m'-th in H_M, then h_{m'}, . . . , h_M will be re-learned. This is repeated until no more removals can be done...”)
Claim 8
Claim 6 is included, Li further discloses learning weights over the task-specific classifiers which were not pruned using training examples from the old tasks to update a distribution of weights over the training examples from the old tasks, and storing the training examples from the old tasks with the updated distribution of weights for a subsequent iteration of the step of learning the distribution of weights over the learning sample using the previously learned classifiers which is performed for a subsequent task ([0033], line 49-62, “... Next, it is determined if the number of weak classifiers in the current set of optimal weak classifiers equals the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers exceeds the acceptable maximum cost, as shown in process action 228. Whenever it is determined that the number of weak classifiers in the current set of optimal weak classifiers does not equal the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers exceeds the acceptable maximum cost (process action 230), the foregoing process starting with determining which of the set of weak classifiers is the most significant classifier (process action 206) is repeated...”).
Claim 9
Claim 8 is included, Wang discloses wherein the training examples are selected based on performance of examples of learning samples from the old tasks which result in the training examples having higher weights than other ones of the examples of the learning samples ([0034], line 4-13, “... When the new sample is misclassified by a weak classifier Li0, -ysfi(s) would be positive, so the weight λs associated with this sample is increased when presented to the next weak classifier; otherwise, the weight λs will be decreased. The principal idea of the online boosting technique is to process the new sample as if it were already included in the initial training set, i.e., also passing it from the first weak classifier to the last, and modifying the sample's weight at each step before passing to the next classifier...”)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (U.S. Pub. 2012/0263376 A1), as applied to claim 1, and further in view of Rabinowitz (U.S. Pub. 2021/0201116 A1).
Claim 10 
Claim 1 is included, however, Wang does not explicitly disclose wherein a neural network is used as a base learner for learning the task-specific classifiers, and wherein, at each iteration of the boosting algorithm, a new head is added to the neural network having classifier-specific parameters that are optimized using the updated distribution over learning sample.
	Rabinowitz discloses wherein a neural network is used as a base learner for learning the task-specific classifiers, and wherein, at each iteration of the boosting algorithm, a new head is added to the neural network having classifier-specific parameters that are optimized using the updated distribution over learning sample ([0045], line 1-6, “... The first DNN 104 in the sequence of DNNs 102 corresponds to a first machine learning task in the sequence of machine learning tasks. That is, the first DNN 104 in the sequence of DNNs 102 is a DNN that is configured to perform a first machine learning task, e.g., through training on appropriate training data...” [0047], line 1-6, “... Subsequent DNNs in the sequence of DNNs 102 correspond to subsequent machine learning tasks in the sequence of machine learning tasks. That is, each subsequent DNN in the sequence of DNNs is a DNN that may be configured to perform a subsequent machine learning task, e.g., through training on appropriate training data...”)
	Wang discloses that an ensemble of weak/base classifiers is updated, or continuously learned, with the newly added sample using a boosting algorithm to make better predictions. However, Wang does not explicitly disclose that the weak/base classifiers are neural networks. Rabinowitz discloses that a first classifier and subsequent classifiers are DNNs (i.e., deep neural networks). At each iteration, a new subsequent DNN is added to learn a subsequent task. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the ensemble of weak/base classifiers as DNN classifiers so that these classifiers are not only updated using the boosting technique, but also learn subsequent machine learning tasks.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (U.S. Pub. 2012/0263376 A1) in view of Rabinowitz (U.S. Pub. 2021/0201116 A1), as applied to claim 10, and further in view of Li (U.S. Pub. 2006/0062451 A1).
Claim 11
Claim 10 is included, however, Wang and Rabinowitz do not explicitly disclose further comprising pruning heads from the neural network based on performance of a neural network classifier on the learning sample.
	Li discloses pruning heads from the neural network based on performance of a neural network classifier on the learning sample ([0033], line 17-30, “... A determination is then made as to which of the current set of optimal weak classifiers is the least significant classifier (process action 210). The least significant classifier includes the feature when matching that is the least likely to predict whether a training example matches the classification of a particular classifier. The overall cost for the current set of optimal weak classifiers is next computed, as shown in process action 212 of FIG. 2B, using the cost function. The least significant classifier for the current set of optimal weak classifiers is then conditionally removed (process action 214) and the overall cost for the current set of optimal weak classifiers is computed, less the least significant classifier, using the cost function (process action 216)...” <examiner note: the least significant classifier is removed>)
	Wang and Rabinowitz disclose that an ensemble of weak classifiers implemented as DNNs is updated using a boosting technique to improve the prediction of the ensemble. However, Wang and Rabinowitz do not disclose pruning the weak classifiers implemented as DNNs based on the performance of the weak classifiers. Li discloses removing weak classifiers to improve the performance of an ensemble of weak classifiers. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the FloatBoost learning procedure disclosed by Li into Wang and Rabinowitz to identify a set of optimal classifiers, less the least significant classifiers, to lower the overall cost (i.e., higher performance in predictions) of the ensemble of weak classifiers.
Claim 12
Claim 11 is included, Li and Rabinowitz further disclose using the neural network including the heads which were not pruned for a subsequent iteration of the method for a subsequent task ([0051], “... In Step 2 (forward inclusion), the currently most significant weak classifier is added one at a time, which is the same as in AdaBoost. In Step 3 (Conditional Exclusion), FloatBoost removes the least significant weak classifier from H_M, subject to the condition that the removal leads to a lower cost than J^min_{M-1} (which is not done in AdaBoost). Supposing that the removed weak classifier was the m'-th in H_M, then h_{m'}, . . . , h_M will be re-learned. This is repeated until no more removals can be done...”)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAU HAI HOANG whose telephone number is (571)270-5894. The examiner can normally be reached 1st biwk: Mon-Thurs 7:00 AM-5:00 PM; 2nd biwk: Mon-Thurs: 7:00 am-5:00pm, Fri: 7:00 am - 4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Beausoliel, can be reached at 571 262 3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

HAU HAI HOANG
Primary Examiner
Art Unit 2167



/HAU H HOANG/Primary Examiner, Art Unit 2167