DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-08-11 has been entered.  The status of the claims is as follows:
Claims 2-4 and 9-11 are cancelled.
Claims 1, 5-8, and 12-15 remain pending in the application.
Claims 1, 8, and 15 have been amended.
Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 103 have been considered but are moot because the amendments to the claims have led to new grounds of rejection.  Applicant argues on Remarks Pages 7-10 against the combination of Dai, Blum, and Tuarob.   However, Applicant’s amendments to the claims have necessitated the addition of a new reference to the combination.
For the record, Examiner addresses Applicant’s argument against the mapping of the original claim language to Tuarob.  Applicant argues that “Rather than combine or "average" or even utilize in any way each and every one of the weighted votes or labels (which, as described by the Patent Office is the method disclosed by the prior art), the embodiments recited in the claims select one vote/label over all the other votes/labels as the consensus label”.  Examiner points out that while Tuarob does calculate a weighted average of all the votes, Tuarob nevertheless selects one label (in Tuarob’s case, one of two labels), as stated in Tuarob Page 261, end of Section 4.3.2:  “For the WPA method, an instance is classified as positive if the final probability estimate is equal to or greater than the probability cutoff, and negative otherwise.”  Thus, the result of comparing the weighted average to the cutoff is the “highest weighted vote”, as the weighted vote corresponds to the most likely (“highest”) label.
Nevertheless, Applicant’s amended language clarifies that the consensus label is an individual highest weighted value for a given label, rather than a weighted vote that is produced based on a weighted average.  For this, Examiner relies on Kuncheva et al.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al. (“Unlock Big Data Emotions: Weighted Word Embeddings for Sentiment Classification”; hereinafter “Dai”) in view of Blum et al. (“Combining Labeled and Unlabeled Data with Co-Training”; hereinafter “Blum”), Tuarob et al. (“An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages”; hereinafter “Tuarob”), and Kuncheva et al. (“A weighted voting framework for classifiers ensembles”; hereinafter “Kuncheva”).
As per Claim 1, Dai teaches a method of training a classifier to classify one or more emotions based on social media content and at least some of the input data is social media content obtained from a social media platform (Dai, Pg 3833 Intro Para 2, discloses:  “Sentiment classification [33] [45] plays an important role in sentiment analysis. The two prominent methods are lexicon-based methods and learning-based methods. Both methods rely on bag-of-words (BoW) model. These methods disregard context, grammar and even word order. They cannot sufficiently capture the complex linguistic characteristics of words. In addition, social media are usually short text and do not provide sufficient word occurrences for conventional BoW methods to work reliably.”  Here, Dai discloses that their paper is directed to a method of sentiment (emotion) classification based on social media content.  Dai, Pg 3834 Section III A, discloses:  “The proposed WWE method has two phases: the training phase and the prediction phase (Figure 1). A large collection of text documents (e.g., tweets or Wikipedia articles) were trained to a word embedding model during the training phase. The semantic relationships between words can be calculated by the cosine distances of the trained vectors. The prediction phase is illustrated on the right side. The testing data (e.g., a tweet) was given to the WWE classifier. The WWE classifier has a few committee members. Each committee member computes a polarity of the whole tweet using part-of-speech weight. At the end, all committee members vote and collectively determine the final polarity.”  Here, Dai discloses that social media content (tweets) are input).
Dai also teaches a distributional feature-based classifier (Dai, Pg 3833 Abstract, discloses:  “According to the cosine similarity between the vector of a word and the vectors of seed words, a polarity score of this word can be calculated.”  This is analogous to the Instant Specification’s description of a distributional feature-based classifier, [0055-0061] which describes calculating distances between words and emotional seed tokens.)
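For illustration only, the seed-word similarity scoring described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Dai's actual formula: the names `cosine` and `polarity_score`, the toy two-dimensional vectors, and the choice to score a word as its mean similarity to positive seeds minus its mean similarity to negative seeds are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def polarity_score(word_vec, positive_seeds, negative_seeds):
    """Assumed scoring rule: mean similarity to positive seed vectors
    minus mean similarity to negative seed vectors."""
    pos = sum(cosine(word_vec, s) for s in positive_seeds) / len(positive_seeds)
    neg = sum(cosine(word_vec, s) for s in negative_seeds) / len(negative_seeds)
    return pos - neg


# Toy 2-d embeddings (hypothetical values, not taken from any reference).
positive_seeds = [[1.0, 0.0]]
negative_seeds = [[0.0, 1.0]]
score_near_positive = polarity_score([0.9, 0.1], positive_seeds, negative_seeds)
score_near_negative = polarity_score([0.1, 0.9], positive_seeds, negative_seeds)
```

A word whose vector lies closer to the positive seeds receives a positive score, and one closer to the negative seeds receives a negative score.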
However, Dai does not teach receiving labeled input data and unlabeled input data; extracting, from the labeled input data, a first set of features belonging to a first feature space; extracting, from the labeled input data, a second set of features belonging to a second feature space different from the first feature space; extracting, from the labeled input data, a third set of features belonging to a third feature space different from the first feature space and the second feature space; training a first classifier using the first feature set and applying the trained first classifier to the unlabeled input data to predict a first label, wherein the first classifier is a lexical feature-based classifier; training a second classifier using the second feature set and applying the trained second classifier to the unlabeled input data to predict a second label, wherein the second classifier is a semantic feature-based classifier; training a third classifier using the third feature set and applying the trained third classifier to the unlabeled input data to predict a third label, wherein the third classifier is a distributional feature-based classifier (Dai does not teach a third classifier, but does teach distributional feature-based classifier as shown above); identifying a consensus label for the unlabeled input data based on the first label, the second label, and the third label, wherein identifying the consensus label comprises: (i) weighting each of the first label, second label, and third label according to respective weights associated with the first, second, and third classifier to produce weighted votes for each unique label; and (ii) selecting the unique label having a highest weighted vote; expanding the labeled input data with supplementary unlabeled data and its consensus label; and retraining at least one of the first classifier and the second classifier based on a training example comprising the expanded labeled input data and the consensus label.
Blum teaches a method of training a classifier (Blum, Pg 7 Section 6, discloses:  “In order to test the idea of co-training, we applied it to the problem of learning to classify web pages”.  Here, Blum discloses training (“co-training”) a classifier (“classify web pages”)).
receiving labeled input data and unlabeled input data (Blum, Pg 8 Table 1, discloses:  

[Image media_image1.png: Blum, Table 1]

Here, Blum discloses receiving (“given”) labeled input data (“a set L of labeled training examples”) and unlabeled input data (“a set U of unlabeled examples”)).
	extracting, from the labeled input data, a first set of features belonging to a first feature space (Blum, Pg 1 Abstract, discloses:  “In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page.”  Here, Blum discloses a first feature space (“words occurring on that page”, which is one of “two distinct views”).  Blum, Pg 1 Abstract, continues:  “We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other.” Here, Blum discloses training based on the data belonging to a first feature space (“view”), and in order to use the data for training, the data must be extracted for use.  Thus, Blum discloses extracting a first set of features belonging to a first feature space.  Blum also discloses labeled input data (“a much smaller set of labeled examples”).  Therefore, Blum discloses extracting, from the labeled input data, a first set of features belonging to a first feature space.)
	extracting, from the labeled input data, a second set of features belonging to a second feature space different from the first feature space  (Blum, Pg 1 Abstract, discloses:  “In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page.”  Here, Blum discloses a second feature space (“words occurring in hyperlinks that point to that page”, which is one of “two distinct views”).  Blum, Pg 1 Abstract, continues:  “We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other.” Here, Blum discloses training based on the data belonging to a second feature space (“view”), and in order to use the data for training, the data must be extracted for use.  Thus, Blum discloses extracting a second set of features belonging to a second feature space.  Blum also discloses labeled input data (“a much smaller set of labeled examples”).  Therefore, Blum discloses extracting, from the labeled input data, a second set of features belonging to a second feature space.  Blum also discloses “two distinct views”, and thus discloses a second feature space different from the first feature space.)
extracting, from the labeled input data, a third set of features belonging to a third feature space different from the first feature space and the second feature space (Blum, Page 10 Last Paragraph, discloses:  “Similar problems exist in many perception learning tasks involving multiple sensors. For example, consider a mobile robot that must learn to recognize open doorways based on a collection of vision (X1), sonar (X2), and laser range (X3) sensors. The important structure in the above problems is that each instance x can be partitioned into subcomponents xi, where the xi are not perfectly correlated, where each xi can in principle be used on its own to make the classification, and where a large volume of unlabeled instances can easily be collected.”  Here, Blum suggests expanding the two-feature co-training algorithm to three features.  Here, Blum suggests a third feature space (“laser range sensors”) different from the first and second feature spaces (“vision” and “sonar”)).
	training a first classifier using the first feature set extracted from the labeled input data and applying the trained first classifier to the unlabeled input data to predict a first label (Blum, Pg 8 Table 1, discloses:

[Image media_image1.png: Blum, Table 1]

Here, Blum discloses training a first classifier using the first feature set (“Use L to train a classifier h1 that considers only the x1 portion of x”, wherein L is “labeled” training examples of x1 and thus the first feature set).  Blum also discloses applying the trained first classifier to the unlabeled input data to predict a first label, as they disclose “Allow h1 to label p positive and n negative examples from U’”, wherein h1 is the first classifier and U’ is unlabeled input data since it is chosen from U, which is “a set U of unlabeled examples”.    This is used to predict a first label (“label p positive and n negative examples”)).
	training a second classifier using the second feature set extracted from the labeled input data and applying the trained second classifier to the unlabeled input data to predict a second label (Blum, Pg 8 Table 1, discloses:

[Image media_image1.png: Blum, Table 1]

Here, Blum discloses training a second classifier using the second feature set (“Use L to train a classifier h2 that considers only the x2 portion of x”, wherein L is “labeled” training examples of x2 and thus the second feature set).  Blum also discloses applying the trained second classifier to the unlabeled input data to predict a second label, as they disclose “Allow h2 to label p positive and n negative examples from U’”, wherein h2 is the second classifier and U’ is unlabeled input data since it is chosen from U, which is “a set U of unlabeled examples”.    This is used to predict a second label (“label p positive and n negative examples”)).
training a third classifier using the third feature set extracted from the labeled input data and applying the trained third classifier to the unlabeled input data to predict a third label, wherein the third classifier is a distributional feature-based classifier (Blum, Page 10 Last Paragraph, discloses:  “Similar problems exist in many perception learning tasks involving multiple sensors. For example, consider a mobile robot that must learn to recognize open doorways based on a collection of vision (X1), sonar (X2), and laser range (X3) sensors. The important structure in the above problems is that each instance x can be partitioned into subcomponents xi, where the xi are not perfectly correlated, where each xi can in principle be used on its own to make the classification, and where a large volume of unlabeled instances can easily be collected.”  Here, Blum suggests expanding the two-feature co-training algorithm to three features.  Here, Blum suggests a third feature space (“laser range sensors”) and using that classifier to predict a third label (“where each xi can in principle be used on its own to make the classification”).
Also, recall above that Dai teaches a distributional feature-based classifier, and thus in combination with Blum teaches that the third classifier is a distributional feature-based classifier.)
	expanding the labeled input data with supplementary unlabeled data and its [consensus] label (Blum, Table 1, discloses:

[Image media_image1.png: Blum, Table 1]

Here, Blum discloses “Add these self-labeled examples to L”, wherein supplementary unlabeled data with its predicted label (“self-labeled examples”, originally from unlabeled set U) are used for expanding the labeled input data (“Add…to L”, wherein L is the set of labeled data)). *Consensus label taught by Tuarob below.
	retraining at least one of the first classifier and the second classifier based on a training example comprising the expanded labeled input data and the [consensus] label (Blum, Table 1, discloses:

[Image media_image1.png: Blum, Table 1]

Here, Blum discloses expanded labeled input data with its predicted label (“Add these self-labeled examples to L”).  Blum also discloses “Loop for k iterations”.  If one goes back to the beginning of the loop, it returns to “Use L to train a classifier h1” and “Use L to train a classifier h2”. Therefore, Blum discloses retraining at least one of the first classifier and the second classifier based on a training example comprising the expanded labeled input data.) *Consensus label taught by Tuarob below.
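For illustration only, the co-training loop quoted from Blum's Table 1 above (train h1 and h2 on L, let each label p positive and n negative examples from U’, add the self-labeled examples to L, and loop for k iterations) can be sketched as follows. This is a minimal sketch under stated assumptions: the toy `CentroidClassifier`, the use of a distance margin as a confidence measure, the pool-replenishment step, and all parameter defaults are hypothetical and are not taken from Blum.

```python
import random


class CentroidClassifier:
    """Hypothetical one-view classifier: nearest class mean on one feature."""

    def __init__(self, view):
        self.view = view  # index of the feature (view) this classifier sees
        self.means = {}

    def fit(self, labeled):
        sums, counts = {}, {}
        for x, y in labeled:
            sums[y] = sums.get(y, 0.0) + x[self.view]
            counts[y] = counts.get(y, 0) + 1
        self.means = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict_with_margin(self, x):
        """Return (label, margin); a larger margin means higher confidence."""
        dists = sorted((abs(x[self.view] - m), y) for y, m in self.means.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], margin


def co_train(L, U, k=5, u=4, p=1, n=1, seed=0):
    """Sketch of a two-view co-training loop over labeled set L and unlabeled set U."""
    rng = random.Random(seed)
    L, U = list(L), list(U)
    rng.shuffle(U)
    pool, U = U[:u], U[u:]  # U': a small pool drawn at random from U
    h1, h2 = CentroidClassifier(0), CentroidClassifier(1)
    for _ in range(k):
        h1.fit(L)  # h1 considers only the x1 portion of x
        h2.fit(L)  # h2 considers only the x2 portion of x
        for h in (h1, h2):
            scored = [(h.predict_with_margin(x), x) for x in pool]
            for label, quota in ((1, p), (0, n)):
                ranked = sorted(
                    (item for item in scored if item[0][0] == label),
                    key=lambda item: item[0][1],
                    reverse=True,
                )
                for _prediction, x in ranked[:quota]:
                    pool.remove(x)
                    L.append((x, label))  # add the self-labeled example to L
        refill = min(2 * p + 2 * n, len(U))  # assumed replenishment of the pool
        pool.extend(U[:refill])
        U = U[refill:]
    return h1.fit(L), h2.fit(L), L


# Toy data: each example is (view-1 feature, view-2 feature); labels are 0 or 1.
seed_labeled = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.2), (0.9, 1.0), (0.2, 0.1), (0.8, 0.9), (0.05, 0.15), (0.95, 0.85)]
h1, h2, grown = co_train(seed_labeled, unlabeled, k=2)
```

Each pass through the loop refits both classifiers on the enlarged set L, which corresponds to the retraining step discussed above.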
	Dai and Blum are analogous art because they are both in the field of endeavor of machine learning.
	It would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention to combine the social media sentiment classification of Dai with the co-training of Blum.  This would allow the use of unlabeled examples during training, and one would be motivated to do so in order to save time and resources by avoiding the human effort required to collect a sufficient amount of labeled data (Blum, Intro:  “In many machine learning settings, unlabeled examples are significantly easier to come by than labeled ones [6, 17]. One example of this is web-page classification. Suppose that we want a program to electronically visit some web site and download all the web pages of interest to us, such as all the CS faculty member pages, or all the course home pages at some university [3]. To train such a system to automatically classify web pages, one would typically rely on hand labeled web pages. These labeled examples are fairly expensive to obtain because they require human effort. In contrast, the web has hundreds of millions of unlabeled web pages that can be inexpensively gathered using a web crawler. Therefore, we would like our learning algorithm to be able to take as much advantage of the unlabeled data as possible.”)
	However, the combination of Dai and Blum thus far does not teach wherein the first classifier is a lexical feature-based classifier; wherein the second classifier is a semantic feature-based classifier; identifying a consensus label for the unlabeled input data based on the first label, the second label, and the third label, wherein identifying the consensus label comprises: (i) weighting each of the first label, second label, and third label according to respective weights associated with the first, second, and third classifier to produce weighted votes for each unique label; and (ii) selecting the unique label having a highest weighted vote.
Tuarob teaches wherein the first classifier is a lexical feature-based classifier; wherein the second classifier is a semantic feature-based classifier (Tuarob, Pg 264 Section 5.6, discloses:  “Each of our feature set reflects a different view of the dataset–the NG features reflect the word patterns used in each document, the MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document, the TD features extract topical semantics of the document, and the ST features capture the sentiment semantics of document in terms of level of illness and emotional variants.” Here, Tuarob discloses a lexical feature-based classifier (“NG features reflect the word patterns used in each document”) and a semantic feature-based classifier (“MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document”)).
Tuarob teaches identifying a consensus label for the [unlabeled] input data based on the first label, the second label, and the third label (Recall that Blum above discloses unlabeled input data.  Tuarob, Pg 356 bottom right, discloses:  “1. Proposes to use 5 heterogeneous feature types which represent different aspects of semantics for identification of health related messages in social media. Parameter sensitivity is studied to find the best parameter configuration and base classifier for each feature type. 2. Explores the use of different ensemble methods that allow base classifiers trained with different feature types to make collective decisions.”  Here, Tuarob discloses identifying a consensus label (“classifiers …make collective decisions”) based on the first label, the second label, and the third label (“5 heterogeneous feature types…base classifier for each feature type”)).
wherein identifying the consensus label comprises: (i) weighting each of the first label, second label, and third label according to respective weights associated with the first, second, and third classifier (Tuarob, Pg 261 Section 4.3.2, discloses:  “Weighted Probability Averaging (WPA): Each classifier is given a weight, where the sum of all weights is 1. Each classifier outputs a probability estimate of the positive class. The final output is the weighted average of all the classifiers”, and in the same section states:  “For the WPA method, an instance is classified as positive if the final probability estimate is equal to or greater than the probability cutoff, and negative otherwise.”  Here, Tuarob discloses setting up a cutoff in a continuous value range of a weighted average, where either side of the cutoff represents a different consensus label.)
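For illustration only, the Weighted Probability Averaging passage quoted above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the normalization of the weights so that they sum to 1, the default cutoff of 0.5, and the example numbers are hypothetical and are not taken from Tuarob.

```python
def weighted_probability_average(probs, weights, cutoff=0.5):
    """Combine per-classifier positive-class probabilities by weighted average,
    then compare the result to a probability cutoff."""
    if len(probs) != len(weights):
        raise ValueError("one weight is required per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]  # weights sum to 1
    estimate = sum(w * p for w, p in zip(normalized, probs))
    label = "positive" if estimate >= cutoff else "negative"
    return label, estimate


# Three classifiers; the first is weighted most heavily (hypothetical values).
label, estimate = weighted_probability_average([0.9, 0.4, 0.3], [0.5, 0.25, 0.25])
strict_label, _ = weighted_probability_average([0.9, 0.4, 0.3], [0.5, 0.25, 0.25], cutoff=0.7)
```

With these numbers the averaged estimate is 0.625, so the instance falls on the positive side of a 0.5 cutoff and on the negative side of a 0.7 cutoff. The single cutoff yields one of exactly two labels.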
Tuarob and the combination of Dai and Blum are analogous art because they are both in the field of endeavor of machine learning applied to social media messages.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the social media sentiment classifier with co-training of Dai and Blum, with the social media message classifier with consensus label of Tuarob. The modification would have been obvious because one of ordinary skill in the art would be motivated to leverage different features of the same social media messages in order to improve the accuracy of classifiers to label those messages (Tuarob, Pg 266 Section 6: “Our results are very promising and reaffirm our assumption that the limitation of the N-gram features on the social media domain can be reduced by combining classifiers that learn different characteristics of the data.”)  Furthermore, Tuarob themselves actually suggest combining their technique with Blum’s co-training. (Tuarob, Pg 266 Section 6:  “Future works could seek to improve the classification algorithm [62,63] and to employ semi-supervised methods such as the co-training technique [64] to expand the training data with unlabeled data.”).
However, Tuarob does not explicitly teach wherein identifying the consensus label comprises: (i) weighting each of the first label, second label, and third label according to respective weights associated with the first, second, and third classifier to produce a weighted vote for each of the first label, the second label, and the third label; and (ii) selecting as the consensus label that of the first weighted vote, the second weighted vote, and the third weighted vote having a highest weight.
Kuncheva teaches wherein identifying the consensus label comprises: (i) weighting each of the first label, second label, and third label according to respective weights associated with the first, second, and third classifier to produce a weighted vote for each of the first label, the second label, and the third label; and (ii) selecting as the consensus label that of the first weighted vote, the second weighted vote, and the third weighted vote having a highest weight.  (Kuncheva, Top of Page 263, discloses:  “The weighted majority vote is among the most intuitive and widely used combiners [11,20].  It is the designated combination method derived from minimising a bound on the training error in AdaBoost [4,6]. Freund and Schapire [6] offer a similar probabilistic explanation as an alternative justification for the weights in the two-class version of AdaBoost. Here, we use our framework to derive the multi-class version of the weighted majority vote, and specify the conditions for its optimality. The weighted majority vote follows from relaxing the assumption about equal individual accuracies. Thus, it will be the optimal combiner when the accuracies are equal as well, and the MV combiner is its exact reduced version.”  Below this, Kuncheva discloses “weights”:  “Dropping the first term, which will not influence the class decision, and expressing the classifier weights as”.  Here, Kuncheva discloses a modified version of the majority vote, as Kuncheva states when the weights are equal, the method reduces to MV (majority vote).  In Majority Vote, one selects the label with the highest vote, as stated on Kuncheva Page 262: “Since one of our aims is to give practical recommendations, in the experiments in this study, we adopted the standard majority vote formulation, whereby the class label is obtained by

[Equation image media_image2.png: majority vote formula]
”
Thus, Kuncheva’s Weighted Majority Vote discloses choosing a label with the highest weighted vote.)
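For illustration only, selecting the label with the highest weighted vote can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the example labels, and the example weights are hypothetical, and the weights are supplied by the caller rather than derived from Kuncheva's formula.

```python
from collections import defaultdict


def weighted_majority_vote(labels, weights):
    """Sum each classifier's weight onto the label it voted for and
    return the label with the largest total, plus the per-label totals."""
    if len(labels) != len(weights):
        raise ValueError("one weight is required per classifier")
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    # On a tie, max() keeps the label that was encountered first.
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


votes = ["joy", "anger", "anger"]
winner, totals = weighted_majority_vote(votes, [0.6, 0.25, 0.15])
plain_winner, _ = weighted_majority_vote(votes, [1.0, 1.0, 1.0])
```

Here the single heavily weighted classifier outvotes the other two ("joy" totals 0.6 against 0.4 for "anger"), whereas equal weights reduce to a plain majority vote that selects "anger". The same function handles any number of distinct labels.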
	Kuncheva and the combination of Dai, Blum, and Tuarob are analogous art because they are both in the field of endeavor of machine learning.  Tuarob discloses various methods of combining an ensemble of classifiers, including Majority Vote and Weighted Probability Averaging.  These are just two of several possible ways of combining classifiers.  Kuncheva’s Weighted Majority Vote is another way of weighting classifiers to arrive at a consensus result, although instead of calculating a weighted average and comparing to a cutoff like in WPA, Kuncheva chooses the highest weighted vote from the individual weighted votes.
	It would have been obvious before the effective filing date of the claimed invention to combine the Weighted Majority Vote of Kuncheva with the classifier ensemble of Dai, Blum, and Tuarob.  One of ordinary skill in the art would be motivated to do so in order to improve the accuracy of results by giving more weight to classifiers that are more significant (Tuarob Page 264 Section 5.5: “The CB classifier is given most weight due to having the most extensive view of the data. The DC classifier is given a twice higher weight compared to TD and ST classifiers since it addresses both the problems posed by the baseline, while the others address only one problem.”) and/or classifiers that are more accurate (Kuncheva Page 261: “Different individual accuracies. When P(s_i = ω_k | ω_k) = p_i and P(s_i = ω_j | ω_k) = (1 − p_i)/(c − 1), for any k, j = 1, ..., c, j ≠ k, then the weighted majority vote is the optimal combiner with weights as derived in Sect. 2.3.” and Page 263:  “The weighted majority vote follows from relaxing the assumption about equal individual accuracies.”)  Examiner also points out that Weighted Majority Vote is very common in the art, as pointed out by Kuncheva Page 263:  “The weighted majority vote is among the most intuitive and widely used combiners [11,20].”
	Furthermore, while Tuarob’s method is only explicitly disclosed as having one cutoff to split the combined average into one of two classes, one of ordinary skill in the art would be motivated to use Kuncheva’s method in order to be able to perform weighted classification for more than two classes (Kuncheva Page 263:  “Here, we use our framework to derive the multi-class version of the weighted majority vote, and specify the conditions for its optimality”).
	
As per Claim 6, the combination of Dai, Blum, Tuarob, and Kuncheva teaches the method of Claim 1 and third set of features (see Rejection to Claim 1).  Tuarob teaches features selected from the group consisting of lexical features, semantic features, and distribution-based features (Tuarob, Pg 264 Section 5.6, discloses:  “Each of our feature set reflects a different view of the dataset–the NG features reflect the word patterns used in each document, the MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document, the TD features extract topical semantics of the document, and the ST features capture the sentiment semantics of document in terms of level of illness and emotional variants.” Here, Tuarob discloses a lexical feature-based classifier (“NG features reflect the word patterns used in each document”) and a semantic feature-based classifier (“MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document”). Tuarob therefore discloses features selected from lexical features and semantic features, and these fall within the group consisting of lexical features, semantic features, and distribution-based features.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tuarob with the combination of Dai, Blum, and Kuncheva for at least the reasons recited in Claim 1.

As per Claim 7, the combination of Dai, Blum, Tuarob, and Kuncheva teaches the method of Claim 1 as well as the first set of features and the second set of features, wherein the first set of features are different from the second set of features (see Rejection to Claim 1).  Tuarob teaches features selected from the group consisting of lexical features, semantic features, and distribution-based features (Tuarob, Pg 264 Section 5.6, discloses:  “Each of our feature set reflects a different view of the dataset–the NG features reflect the word patterns used in each document, the MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document, the TD features extract topical semantics of the document, and the ST features capture the sentiment semantics of document in terms of level of illness and emotional variants.” Here, Tuarob discloses a lexical feature-based classifier (“NG features reflect the word patterns used in each document”) and a semantic feature-based classifier (“MC features capture the semantics of the health related terms by capturing the usage of terms appearing together in the same document”). Tuarob therefore discloses features selected from lexical features and semantic features, and these fall within the group consisting of lexical features, semantic features, and distribution-based features.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tuarob with the combination of Dai, Blum, and Kuncheva for at least the reasons recited in Claim 1.

As per Claim 8, Claim 8 is a system claim corresponding to method Claim 1.  The difference is that it recites an interface for receiving data, a memory, a feature extraction module, and a prediction consensus generation module.  (Tuarob, Pg 266 bottom left, discloses:  “The TwitterB data is processed on a server with a 16-core Intel Xenon E5630 (2.5 GHz) processer and 32 GB available RAM.”  Here, Tuarob discloses a memory (“32 GB available RAM”) and an interface for receiving data, as data was received from Twitter.  Tuarob also discloses a processor, as well as algorithms to extract features and generate a consensus (Pg 257 top left:  “Section 4 discusses our proposed methods, including feature extraction along with analysis on parameter sensitivity and ensemble techniques in detail.”), thus Tuarob discloses a feature extraction module and prediction consensus generation module.)  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 13, Claim 13 is a system claim corresponding to method Claim 6.   Claim 13 is rejected for the same reasons as Claim 6.

As per Claim 14, Claim 14 is a system claim corresponding to method Claim 7.   Claim 14 is rejected for the same reasons as Claim 7.

As per Claim 15, Claim 15 is a computer-readable medium claim corresponding to method Claim 1.  The difference is that it recites a computer-readable medium. (Tuarob, Pg 266 bottom left, discloses:  “The TwitterB data is processed on a server with a 16-core Intel Xenon E5630 (2.5 GHz) processer and 32 GB available RAM.”)  Claim 15 is rejected for the same reasons as Claim 1.

Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dai, Blum, Tuarob, and Kuncheva in view of Tang et al. (“Co-Tracking Using Semi-Supervised Support Vector Machines”; hereinafter Tang).
As per Claim 5, the combination of Dai, Blum, Tuarob, and Kuncheva teaches the method of Claim 1 as well as generating weights for each of the first, second, and third classifiers (see Rejection to Claim 1).  However, the combination of Dai, Blum, Tuarob, and Kuncheva does not explicitly teach generating the classifier weights based on respective performances of the classifiers against an annotated dataset.
Tang teaches generating the classifier weights based on respective performances of the classifiers against an annotated dataset.  (Tang, Section 3.1.2, discloses:  “In order to combine trained classifiers into a final classifier we must assign a weight to each of them. Logically, this weight should be based on the accuracy of each classifier. We therefore adapt the concept from AdaBoost [11] of determining the weight of a classifier based on its error on a labeled validation set.”  Here, Tang discloses generating the classifier weights (“combine trained classifiers…assign a weight to each of them”) based on respective performances of the classifiers against an annotated dataset (“based on its error on a labeled validation set”)).
Tang and the combination of Dai, Blum, Tuarob, and Kuncheva are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the sentiment classification using co-training and a consensus label of Dai, Blum, Tuarob, and Kuncheva with the classifier weighting on labeled data of Tang. The modification would have been obvious because one of ordinary skill in the art would have been motivated to minimize errors (Tang, Section 3.1.2: “determining the weight of a classifier based on its error on a labeled validation set”).
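By way of illustration only, weighting classifiers by their error on a labeled validation set, as in the quoted passage of Tang, might be sketched as follows. The sketch assumes the standard AdaBoost weight formula, 0.5 * ln((1 - err) / err), which the quoted passage does not recite; the function name, the clamping constant, and the sample data are likewise hypothetical and are not taken from Tang or from the claims.

```python
import math

def classifier_weight(predictions, labels, eps=1e-10):
    """Return an AdaBoost-style weight for one classifier.

    Assumption: weight = 0.5 * ln((1 - err) / err), where err is the
    fraction of validation examples the classifier labels incorrectly.
    The formula is the textbook AdaBoost one, not one quoted from Tang.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    err = wrong / len(labels)
    # Clamp the error away from 0 and 1 so the logarithm stays finite.
    err = min(max(err, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - err) / err)

# Hypothetical annotated (labeled) validation set.
validation_labels = ["pos", "neg", "pos", "neg", "pos"]

# Hypothetical predictions from two classifiers on that set.
predictions_a = ["pos", "neg", "pos", "neg", "neg"]  # 1 of 5 wrong
predictions_b = ["pos", "pos", "pos", "neg", "neg"]  # 2 of 5 wrong

weight_a = classifier_weight(predictions_a, validation_labels)
weight_b = classifier_weight(predictions_b, validation_labels)
```

Under these assumptions, the classifier with the lower validation error receives the larger weight, which corresponds to the “respective performances of the classifiers against an annotated dataset” language mapped above.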

As per Claim 12, Claim 12 is a system claim corresponding to method Claim 5.   Claim 12 is rejected for the same reasons as Claim 5.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Laxman et al. (US 2008/0177684 A1), in [0021], discloses using a weighted majority vote.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126