DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

This action is responsive to the original application filed on 3/23/2017 and the Remarks and Amendments filed on 11/13/2020.  Claims 1-20 are pending and have been examined.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-4, 6-11 and 14-20 are rejected under 35 U.S.C. § 103 as being obvious over DeBarr et al. (DeBarr et al., “Spam Detection using Clustering, Random Forests, and Active Learning”, July 16-17, CEAS 2009 – Sixth Conference on Email and Anti-Spam, pp. 1-3, hereinafter “DeBarr”) in view of Lee et al. (US 20170227995 A1, hereinafter “Lee”) and Kandaswamy et al. (Kandaswamy et al., "Improving Deep Neural Network Performance by Reusing Features Trained with Transductive Transference", 2014,  ICANN 2014: Artificial Neural Networks and Machine Learning – ICANN 2014, pp. 1-8, hereinafter “Kandaswamy”). 

Regarding claim 1, DeBarr discloses receive first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting receiving first data in the form of an initial set of email messages; and Page 1, Column 2;  the training pool comprises the entirety of the first data)
generate first features based on the first data; (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation”, suggesting generating first features based on first data (the entirety of the training pool being the first data) through feature selection)
identify a first set of labels for the first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting identifying a first set of labels for the first data (the entirety of the training pool being the training data) by labeling an initial set of email messages as training examples, the labeled initial set of email messages are the first set of labels for the first data)
train a first machine learning model, using the first features and the first set of labels; and (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation. In repeated experiments, the model with the best performance was a Random Forest [1] . . . For each tree, a bootstrap sample is drawn from the labeled data and a decision tree is constructed by considering a random subset of features for each decision node in the tree”, suggesting training a first model (a random forest algorithm) using the first features and the first set of labels, the first set of labels being the labeled initial set of email messages; and Page 1, §3; “clustering a sample of messages from the training pool and obtaining labels for cluster medoids; constructing an initial Random Forest for spam detection”, further suggesting training (constructing) a first model (Random Forest) using the first features and first set of labels)
review the first machine learning model to generate a second machine learning model, by: (Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting reviewing a first model (Random Forest) to generate a second model (retrained Random Forest model))
receiving a second set of labels for the first data, and (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5”, suggesting receiving a second set of labels (additional messages selected for labeling that, when labeled, become the second set of labels) for the first data, the first data being the entirety of the training pool discussed in § 3)
DeBarr fails to explicitly disclose [a] processing device, comprising: a non-transitory memory storing instructions; and one or more processors in communication with the non-transitory memory, wherein the one or more processors execute the instructions to; training the second machine learning model using the second set of labels and reusing the first features generated based on the first data.
Lee discloses [a] processing device, comprising: a non-transitory memory storing instructions; and one or more processors in communication with the non-transitory memory, wherein the one or more processors execute the instructions to ([0111]; a processor with memory is disclosed).
DeBarr and Lee are analogous art because both are concerned with retraining models using supervised machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in supervised machine learning to combine the processing device, processor, and memory of Lee with the model training techniques of DeBarr to yield the predictable result of a processing device, comprising: a non-transitory memory storing instructions; and one or more processors in communication with the non-transitory memory, wherein the one or more processors execute the instructions to: receive first data; generate a plurality of first 
Kandaswamy discloses training the second machine learning model using the second set of labels and reusing the first features generated based on the first data (Abstract; “We use deep neural networks to transfer either low or middle or higher-layer features for a machine trained in either unsupervised or supervised way”, which teaches transfer learning which trains a second model using a second set of labels and reuses the first model’s features generated based on a first data; and Page 4, ¶2; “In the case of the FT approach we reuse the fully trained supervised features S(wS) ⇒ wT of the source problem and then fine-tune again the entire classifier S (wT , cT ) for the target”, which discloses training a second model (the again fine tuned classifier) using a second set of labels and reusing the first features (the reused fully trained supervised features) generated based on the first data; and Page 6, §4.1; “Classifying images of lowercase from a-to-z by reusing supervised features of digits from 0-to-9. We train a CNN to solve Latin digits (specific source problem) and reuse it to solve a lowercase letters (different but related target problem) without having to train it from scratch”, further suggesting the reusing of first features (supervised features of digits) to then train a second model and a second set of labels (the labels a to z derived from images of lowercase); and see generally §2; the section discloses generally training a second model by reusing a first model’s trained features).
DeBarr, Lee, and Kandaswamy are analogous art because all are concerned with retraining models using supervised machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in supervised machine learning to combine the reusing feature to train a second model as taught by Kandaswamy with the model training techniques of DeBarr and Lee to yield the predictable result of training the second model using the second set of labels and reusing the first features generated based on the first data.  The motivation for doing so would be to achieve significant performance by transferring learning from source to target problem, by using lower-layer features trained in supervised fashion in case of CNN’s and unsupervised features trained in case of SDA’s (Kandaswamy; Conclusion).
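
For illustration only, the selection step quoted from DeBarr §3.4 can be sketched as follows. This is a minimal sketch and not DeBarr's implementation: the function names, the vote lists, and the batch size of 2 are hypothetical. It computes the probability of spam as the proportion of trees assigning the spam label, then picks the pool messages whose probability is closest to 0.5 for labeling.

```python
def spam_probability(tree_votes):
    """Proportion of decision trees that voted 'spam' (True) for one message."""
    return sum(tree_votes) / len(tree_votes)


def select_uncertain(probabilities, k):
    """Return the indices of the k messages whose spam probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical votes from a four-tree forest for four messages in the training pool.
pool_votes = [
    [True, True, True, True],
    [True, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]
probs = [spam_probability(votes) for votes in pool_votes]

# Messages at these indices would be sent for labeling and then added to the
# labeled set before the forest is retrained.
to_label = select_uncertain(probs, 2)
```

Under this reading, the labels obtained for the selected messages form the second set of labels, and the retrained forest is the second model.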


Regarding claim 16, DeBarr discloses receive first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting receiving first data in the form of an initial set of email messages; and Page 1, Column 2;  the training pool comprises the entirety of the first data)
generate first features based on the first data; (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation”, suggesting generating first features based on first data (the entirety of the training pool being the first data) through feature selection)
identify a first set of labels for the first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting identifying a first set of labels for the first data (the entirety of the training pool being the training data) by labeling an initial set of email messages as training examples, the labeled initial set of email messages are the first set of labels for the first data)
train a first model, using the first features and the first set of labels; and (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation. In repeated experiments, the model with the best performance was a Random Forest [1] . . . For each tree, a bootstrap sample is drawn from the labeled data and a decision tree is constructed by considering a random subset of features for each decision node in the tree”, suggesting training a first model (a random forest algorithm) using the first features and the first set of labels, the first set of labels being the labeled initial set of email messages; and Page 1, §3; “clustering a sample of messages from the training pool and obtaining labels for cluster medoids; constructing an initial Random Forest for spam detection”, further suggesting training (constructing) a first model (Random Forest) using the first features and first set of labels)
review the first model to generate a second model, by: (Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting reviewing a first model (Random Forest) to generate a second model (retrained Random Forest model))
receiving a second set of labels for the first data, and (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5”, suggesting receiving a second set of labels (additional messages selected for labeling that, when labeled, become the second set of labels) for the first data, the first data being the entirety of the training pool discussed in § 3)
DeBarr fails to explicitly disclose [a] computer implemented method; training the second model using the second set of labels and reusing the first features generated based on the first data.
Lee discloses [a] computer implemented method ([0111]; a processor with memory is disclosed, and thus a computer implemented method can be achieved).
The motivation to combine DeBarr and Lee is the same as discussed above with respect to claim 1.
Kandaswamy discloses training the second machine learning model using the second set of labels and reusing the first features generated based on the first data (Abstract; “We use deep neural networks to transfer either low or middle or higher-layer features for a machine trained in either unsupervised or supervised way”, which teaches transfer learning which trains a second model using a second set of labels and reuses the first model’s features generated based on a first data; and Page 4, ¶2; “In the case of the FT approach we reuse the fully trained supervised features S(wS) ⇒ wT of the source problem and then fine-tune again the entire classifier S (wT , cT ) for the target”, which discloses training a second model (the again fine tuned classifier) using a second set of labels and reusing the first features (the reused fully trained supervised features) generated based on the first data; and Page 6, §4.1; “Classifying images of lowercase from a-to-z by reusing supervised features of digits from 0-to-9. We train a CNN to solve Latin digits (specific source problem) and reuse it to solve a lowercase letters (different but related target problem) without having to train it from scratch”, further suggesting the reusing of first features (supervised features of digits) to then train a second model and a second set of labels (the labels a to z derived from images of lowercase); and see generally §2; the section discloses generally training a second model by reusing a first model’s trained features).
The motivation to combine DeBarr, Lee, and Kandaswamy is the same as discussed above with respect to claim 1.

Regarding claim 20, DeBarr discloses receive first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting receiving first data in the form of an initial set of email messages; and Page 1, Column 2;  the training pool comprises the entirety of the first data)
generate first features based on the first data; (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation”, suggesting generating first features based on first data (the entirety of the training pool being the first data) through feature selection)
identify a first set of labels for the first data; (Page 2, §3.2; “In this work, clustering is used to select an initial set of email messages to be labeled as training examples”, suggesting identifying a first set of labels for the first data (the entirety of the training pool being the training data) by labeling an initial set of email messages as training examples, the labeled initial set of email messages are the first set of labels for the first data)
train a first model, using the first features and the first set of labels; and (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation. In repeated experiments, the model with the best performance was a Random Forest [1] . . . For each tree, a bootstrap sample is drawn from the labeled data and a decision tree is constructed by considering a random subset of features for each decision node in the tree”, suggesting training a first model (a random forest algorithm) using the first features and the first set of labels, the first set of labels being the labeled initial set of email messages; and Page 1, §3; “clustering a sample of messages from the training pool and obtaining labels for cluster medoids; constructing an initial Random Forest for spam detection”, further suggesting training (constructing) a first model (Random Forest) using the first features and first set of labels)
review the first model to generate a second model, by: (Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting reviewing a first model (Random Forest) to generate a second model (retrained Random Forest model))
receiving a second set of labels for the first data, and (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5”, suggesting receiving a second set of labels (additional messages selected for labeling that, when labeled, become the second set of labels) for the first data, the first data being the entirety of the training pool discussed in § 3)
DeBarr fails to explicitly disclose [a] non-transitory computer-readable media storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform the steps of; training the second model using the second set of labels and reusing the first features generated based on the first data.
Lee discloses [a] non-transitory computer-readable media storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform the steps of ([0111]; a processor with memory is disclosed).
The motivation to combine DeBarr and Lee is the same as discussed above with respect to claim 1.
Kandaswamy discloses training the second machine learning model using the second set of labels and reusing the first features generated based on the first data (Abstract; “We use deep neural networks to transfer either low or middle or higher-layer features for a machine trained in either unsupervised or supervised way”, which teaches transfer learning which trains a second model using a second set of labels and reuses the first model’s features generated based on a first data; and Page 4, ¶2; “In the case of the FT approach we reuse the fully trained supervised features S(wS) ⇒ wT of the source problem and then fine-tune again the entire classifier S (wT , cT ) for the target”, which discloses training a second model (the again fine tuned classifier) using a second set of labels and reusing the first features (the reused fully trained supervised features) generated based on the first data; and Page 6, §4.1; “Classifying images of lowercase from a-to-z by reusing supervised features of digits from 0-to-9. We train a CNN to solve Latin digits (specific source problem) and reuse it to solve a lowercase letters (different but related target problem) without having to train it from scratch”, further suggesting the reusing of first features (supervised features of digits) to then train a second model and a second set of labels (the labels a to z derived from images of lowercase); and see generally §2; the section discloses generally training a second model by reusing a first model’s trained features).
The motivation to combine DeBarr, Lee, and Kandaswamy is the same as discussed above with respect to claim 1.

Regarding claim 2, the rejection of claim 1 is incorporated and DeBarr further discloses wherein the first data is received during a first time period, and second data is received during a second time period after the first time period (Page 2, §4; “For the experimental results reported here, the first week of data (9,535 messages) was used as a training pool and the next 12 weeks of messages were used for testing” suggesting that a first data (first week of data) is received during a first time period (first week) and a second data (next 12 weeks of messages) is received during a second time period after the first time period (next 12 weeks)).

Regarding claims 3 and 18, the rejections of claims 1 and 16 are incorporated and DeBarr further discloses determine whether a trigger has occurred, and to generate the second machine learning model in response to determining that the trigger has occurred (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5. The probability of spam is computed as the proportion of decision trees assigning the spam label. Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting determining whether a trigger has occurred (determining where the probability of spam assigned by the Random Forest model is closest to 0.5) and where the second model (the retrained Random Forest) is conditionally generated based on the determination that the additional messages for labeling are spam based on a 0.5 probability).

Regarding claim 4, the rejections of claims 1 and 3 are incorporated but DeBarr fails to explicitly disclose wherein the determination whether the trigger has occurred includes determining whether a timer has expired.
Lee discloses wherein the determination whether the trigger has occurred includes determining whether a timer has expired ([0009]; “The retraining may occur automatically when the system determines that confidence in the authentication has been too low for a sufficiently long period of time, such as when the confidence score for multiple authentications within a 20 second period are below 0.2” (emphasis added), suggesting that the determination whether the trigger has occurred (confidence in the authentication) includes determining whether a timer has expired or a certain amount of time has passed, such as 20 seconds).
DeBarr and Lee are analogous art because both are concerned with retraining models using supervised machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in supervised machine learning to combine the determination whether the trigger has occurred includes determining whether a timer has expired of Lee with the processing device of DeBarr to yield the predictable result of wherein the determination whether the trigger has occurred includes determining whether a timer has expired.  The motivation for doing so would be to better train, using machine learning techniques, an authentication model to determine if a user is authentic (Lee; [0008]).


Regarding claim 6, the rejections of claims 1 and 3 are incorporated but DeBarr fails to explicitly disclose wherein the determination whether the trigger has occurred includes determining whether an error value in connection with a performance of the first machine learning model has exceeded a threshold.
Lee discloses wherein the determination whether the trigger has occurred includes determining whether an error value in connection with a performance of the first machine learning model has exceeded a threshold ([0075]; “For an authenticated user, one preferred embodiment involves determining if the confidence score is lower than a certain threshold ε_CS for a period of time T, then the system automatically retrains the authentication models” (emphasis added), suggesting that the determination whether the trigger has occurred (confidence in the authentication) includes determining whether an error value in connection with a performance of the first model has exceeded a threshold (a confidence score)).
The motivation to combine DeBarr and Lee is the same as discussed above with respect to claim 3.
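
For illustration only, the retraining trigger quoted from Lee for claims 4 and 6 can be sketched as a check over timestamped confidence scores. This is a minimal sketch under stated assumptions, not Lee's implementation: the function name, the event format, and the `min_count` parameter are hypothetical, and the values 0.2 and 20 seconds are taken from the example quoted from Lee [0009].

```python
def should_retrain(scored_events, threshold, period_seconds, min_count=2):
    """Decide whether a retraining trigger has occurred.

    scored_events: list of (timestamp_seconds, confidence_score), sorted by time.
    Returns True when at least min_count scores below the threshold fall
    within a single window of period_seconds.
    """
    low_times = [t for t, score in scored_events if score < threshold]
    for i in range(len(low_times) - min_count + 1):
        # Span between the first and last of min_count consecutive low scores.
        if low_times[i + min_count - 1] - low_times[i] <= period_seconds:
            return True
    return False


# Hypothetical authentication events: (seconds since start, confidence score).
events = [(0, 0.9), (5, 0.1), (12, 0.15), (40, 0.1)]
trigger = should_retrain(events, threshold=0.2, period_seconds=20)
```

Here the time window plays the role of the timer addressed in claim 4, and the comparison against the threshold plays the role of the performance check addressed in claim 6.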

Regarding claims 7 and 17, the rejections of claims 1 and 16 are incorporated and DeBarr further discloses wherein a generation of second features based on the second set of labels is avoided by reusing at least a portion of the first features in connection with the generation of the second machine learning model (Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting reusing at least a portion of (or all of) the first features generated in the cluster prototypes generated in section 3.2 with a second set of labels (additional messages selected for labeling) in connection with training the second model (the Random Forest is retrained with the new labels, effectively training a second model). Note that generation of second features is avoided by reusing the first features used to generate the second, retrained model).
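
For illustration only, the feature reuse discussed for claims 7 and 17 can be sketched with a small feature cache. This is a minimal sketch under stated assumptions and is not drawn from any cited reference: the bag-of-words extractor, the `FeatureStore` class, and the toy per-label word-count "model" are all hypothetical stand-ins. The point shown is that training a second model with a second set of labels need not regenerate features for the first data.

```python
from collections import Counter


def extract_features(message):
    """Hypothetical feature generator: bag-of-words counts for one message."""
    return Counter(message.lower().split())


class FeatureStore:
    """Caches features per message so they are generated only once."""

    def __init__(self, extractor):
        self._extractor = extractor
        self._cache = {}
        self.extraction_calls = 0

    def get(self, msg_id, message):
        if msg_id not in self._cache:
            self._cache[msg_id] = self._extractor(message)
            self.extraction_calls += 1
        return self._cache[msg_id]


def train(store, messages, labels):
    """Toy stand-in for model training: accumulate word counts per label."""
    model = {}
    for msg_id, label in labels.items():
        model.setdefault(label, Counter()).update(store.get(msg_id, messages[msg_id]))
    return model


messages = {1: "win cash now", 2: "meeting at noon", 3: "cash prize meeting"}
store = FeatureStore(extract_features)
for msg_id, text in messages.items():
    store.get(msg_id, text)  # first features generated once for all first data

first_model = train(store, messages, {1: "spam", 2: "good"})
calls_after_first = store.extraction_calls

# Second set of labels for the same data; features come from the cache.
second_model = train(store, messages, {1: "spam", 2: "good", 3: "spam"})
```

After both training calls, `store.extraction_calls` is unchanged from its value after the first pass, which is the sense in which generation of second features is avoided.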

Regarding claim 8, the rejection of claim 1 is incorporated and DeBarr further discloses wherein the second machine learning model is generated utilizing second features different from the first features generated based on the second set of labels in addition to at least a portion of the first features (Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting that the second model is generated or retrained based on utilizing second features different from the first features (a different cluster prototype as described as features in section 3.2) generated based on the second set of labels (additional messages selected for labeling) in addition to at least a portion of the first features (a cluster prototype generated through feature selection as discussed in section 3.2)).

Regarding claims 9 and 19, the rejections of claims 1 and 16 are incorporated but DeBarr fails to explicitly disclose wherein the second set of labels replaces the first set of labels.
Lee discloses wherein the second set of labels replaces the first set of labels ([0009]; “The method may also involve the authentication model being retrained, or adaptively updated to include temporal changes in the user's patterns, where the retraining can include, but is not limited to, incorporating new data into the existing model or using an entirely new set of data”, suggesting wherein the second set of labels replaces the first set of labels in that the model is retrained using an entirely new set of data, thus resulting in the replacement of labels or classifications with respect to the use of entirely new data versus labels created with old data).


Regarding claim 10, the rejection of claim 1 is incorporated and DeBarr further discloses wherein the one or more processors execute the instructions to maintain review metadata by recording properties for the first machine learning model and the second machine learning model, the properties including one or more of a name, a portion of the first features selected for training, an algorithm, or a set of labels (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5. The probability of spam is computed as the proportion of decision trees assigning the spam label. Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting maintaining review metadata by recording properties for the first and second model such as a set of labels (spam labels)).

Regarding claim 11, the rejection of claim 1 is incorporated and DeBarr further discloses wherein the one or more processors execute the instructions to maintain review metadata by recording properties for the first machine learning model and the second machine learning model, the properties including at least one of accuracy, a trigger event for each new model generation, a label, a time stamp, or a model name (Page 2, §3.4; “Once the initial Random Forest model is constructed, additional messages are selected for labeling by choosing examples from the training pool where the probability of spam assigned by the Random Forest model is closest to 0.5. The probability of spam is computed as the proportion of decision trees assigning the spam label. Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting maintaining review metadata by recording properties for the first and second model such as a trigger event for each new model generation (when the probability of spam assigned by the Random Forest model is closest to 0.5)).

Regarding claim 14, the rejection of claim 1 is incorporated and DeBarr further discloses wherein a first portion of the first features is used to train the first machine learning model and a second portion of the first features that comprises a subset of the first portion is used to train the second machine learning model (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation. In repeated experiments, the model with the best performance was a Random Forest [1] . . . For each tree, a bootstrap sample is drawn from the labeled data and a decision tree is constructed by considering a random subset of features for each decision node in the tree”, suggesting training a first model (a random forest algorithm) using the first features and the first set of labels; and Page 1, §3; “clustering a sample of messages from the training pool and obtaining labels for cluster medoids; constructing an initial Random Forest for spam detection”, further suggesting training (constructing) a first model (Random Forest) using the first features and first set of labels; and Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting that a second portion of the first features is used to train the second model (a cluster prototype as described as features in section 3.2), the second portion of the first features comprises a subset of the first portion).

Regarding claim 15, the rejection of claim 1 is incorporated and DeBarr further discloses wherein a first portion of the first features is used to train the first machine learning model and the first portion of the first features is used to train the second machine learning model (Page 2, §3.3; “After the cluster prototype messages were selected for training, feature selection and model selection were performed using leave-one-out cross validation. In repeated experiments, the model with the best performance was a Random Forest [1] . . . For each tree, a bootstrap sample is drawn from the labeled data and a decision tree is constructed by considering a random subset of features for each decision node in the tree”, suggesting training a first model (a random forest algorithm) using the first features and the first set of labels; and Page 1, §3; “clustering a sample of messages from the training pool and obtaining labels for cluster medoids; constructing an initial Random Forest for spam detection”, further suggesting training (constructing) a first model (Random Forest) using the first features and first set of labels; and Page 2, §3.4; “Once labeled, the selected messages are then added to the cluster prototypes and the Random Forest is retrained”, suggesting that a first portion of the first features is used to train the second model (a cluster prototype as described as features in section 3.2)).

Claims 12 and 13 are rejected under 35 U.S.C. § 103 as being obvious over DeBarr in view of Lee and Kandaswamy and further in view of Oliver et al. (US 20160285805 A1, hereinafter “Oliver”).

Regarding claim 12, the rejection of claim 1 is incorporated but DeBarr fails to explicitly disclose wherein the one or more processors execute the instructions to store a feature and label table in a memory, the feature and label table including the first features and the first set of labels for the first data.
Oliver discloses wherein the one or more processors execute the instructions to store a feature and label table in a memory, the feature and label table including the first features and the first set of labels for the first data ([0029]; “The reliable classifiers classify received messages and provide the statistical message classifier with a knowledge base. A message is parsed to obtain various features . . . Table 1 is used in some embodiments to store various features and the number of times they are determined either as good or spam”, suggesting store a feature and label (number of times messages labeled as good or spam) table in memory; and Claim 1; “executing instructions stored in memory, the instructions being executed by a processor to: classify the received message using one or more classifiers from a plurality of available classifiers, and track a variety of features of the classified message based on the classification, wherein the tracked features are stored in a table and accounts for a number of times a particular feature appeared in the classified message” (emphasis added), further suggesting storing a feature and label table in memory).


Regarding claim 13, the rejections of claims 1 and 12 are incorporated but DeBarr fails to explicitly disclose wherein the one or more processors execute the instructions to update the feature and the label table in the memory to include the second set of labels for the first data.
Oliver discloses wherein the one or more processors execute the instructions to update the feature and the label table in the memory to include the second set of labels for the first data ([0029]; “The reliable classifiers classify received messages and provide the statistical message classifier with a knowledge base. A message is parsed to obtain various features . . . Table 1 is used in some embodiments to store various features and the number of times they are determined either as good or spam”, suggesting storing a feature and label (number of times messages labeled as good or spam) table in memory; and Claim 1; “executing instructions stored in memory, the instructions being executed by a processor to: classify the received message using one or more classifiers from a plurality of available classifiers, and track a variety of features of the classified message based on the classification, wherein the tracked features are stored in a table and accounts for a number of times a particular feature appeared in the classified message” (emphasis added), further suggesting storing a feature and label table in memory; and [0004]; “The spam protection program periodically updates the personalized statistical searcher by processing the categorized messages. When a new message comes in, the improved statistical searcher determines whether the incoming message is spam. The updating of the personalized statistical searcher is typically done by finding the tokens and features in the messages and updating a score or probability associated with each feature or token found in the messages”, suggesting that the classifier and, therefore, the table is updated with second sets of labels for the first data (labels or classifications for new messages)).
The motivation to combine DeBarr, Lee, Kandaswamy, and Oliver is the same as discussed above with respect to claim 12.
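
For illustration only, the feature and label table recited in claims 12 and 13 might be sketched as follows. This is a minimal sketch under stated assumptions: the class name `FeatureLabelTable`, its methods, the item identifier, and the sample values are all hypothetical and are taken neither from Oliver nor from the application.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureLabelTable:
    """Hypothetical in-memory table keyed by a data item identifier."""
    rows: dict = field(default_factory=dict)

    def store(self, item_id, features, first_label):
        # Claim 12 (as read here): keep the first features together with
        # the first label for a given item of the first data.
        self.rows[item_id] = {
            "features": list(features),
            "labels": {"first": first_label},
        }

    def update(self, item_id, second_label):
        # Claim 13 (as read here): add a second label for the same item;
        # the stored features are left untouched.
        self.rows[item_id]["labels"]["second"] = second_label


# Illustrative values only.
table = FeatureLabelTable()
table.store("msg-1", [0.2, 0.7], "spam")
table.update("msg-1", "phishing")
```

In this sketch the update touches only the label entry, so the features computed for the first data remain available for later training.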

Allowable Subject Matter

Claim 5 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Response to Arguments

Applicant’s arguments and amendments, filed on 11/13/2020, with respect to the 35 USC § 103 rejection of claims 1-20 have been considered but are not persuasive.

Beginning on Page 8 of the Remarks, filed on 11/13/2020, Applicant argues that Kandaswamy does not disclose “receiving a second set of labels for the first data”, and “training the second machine learning model using the second set of labels and reusing the first features generated based on the first data”.  Examiner respectfully disagrees.  Kandaswamy, particularly on page 6, section 4.1, discloses receiving the second set of labels (through classifying images of lowercase letters from a to z), and Kandaswamy also discloses training the second machine learning model using the second set of labels (labeled a to z) and reusing the first features (the reused supervised features of digits) generated based on the first data. Further, Page 4, ¶2 of Kandaswamy discloses “In the case of the FT approach we reuse the fully trained supervised features S(wS) ⇒ wT of the source problem and then fine-tune again the entire classifier S (wT , cT ) for the target”, which further suggests training a second model (the again fine-tuned classifier) using a second set of labels and reusing the first features (the reused fully trained supervised features) generated based on the first data.  Applicant has not provided any specific evidence or argument as to why these cited features of Kandaswamy fail to teach or render obvious the claimed limitation “training the second machine learning model using the second set of labels and reusing the first features generated based on the first data”.  
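
For illustration only, the disputed limitation (training a second model using a second set of labels while reusing the first features generated from the first data) can be sketched in a few lines. This sketch deliberately substitutes a simple nearest-centroid classifier for the neural networks and fine-tuning of Kandaswamy; the feature extractor, the sample messages, and both label sets are hypothetical assumptions, not material from any cited reference.

```python
def extract_features(text):
    # Hypothetical feature extractor: (character count, digit count).
    return (len(text), sum(ch.isdigit() for ch in text))


def train_centroids(features, labels):
    # Nearest-centroid training: average the feature vectors of each label.
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def predict(model, vec):
    # Return the label whose centroid is closest in squared distance.
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], vec)),
    )


first_data = ["win 1000000 now", "lunch at noon", "call 5551234 today", "see you soon"]

# The first features are generated once from the first data.
first_features = [extract_features(text) for text in first_data]

first_labels = ["spam", "ham", "spam", "ham"]
second_labels = ["promo", "personal", "personal", "personal"]

first_model = train_centroids(first_features, first_labels)
# The second model is trained on the second labels; features are reused.
second_model = train_centroids(first_features, second_labels)
```

The point of the sketch is only that `extract_features` runs once, while two models are trained from the same feature vectors under different label sets.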

Further, on page 9 of the remarks, Applicant argues that it would not be obvious to combine DeBarr, Lee, and Kandaswamy to yield the predictable result of the limitation “training the second machine learning model using the second set of labels and reusing the first features generated based on the first data”.  In response to Applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, DeBarr, Lee, and Kandaswamy are analogous art because all are concerned with retraining models using supervised machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in supervised machine learning to combine the reuse of features to train a second model as taught by Kandaswamy with the model training techniques of DeBarr and Lee to yield the predictable result of training the second model using the second set of labels and reusing the first features generated based on the first data.  The motivation for doing so would be to achieve significant performance by transferring learning from source to target problem, by using lower-layer features trained in supervised fashion in the case of CNNs and unsupervised features trained in the case of SDAs (Kandaswamy; Conclusion).

In response to Applicant’s allegation, on page 9, ¶2 of the remarks, that one skilled in the art would not have modified the techniques of DeBarr and Lee to implement Kandaswamy’s transductive transference to arrive at the claimed matter, Examiner would like to note that it is not the generic transductive transference of Kandaswamy that is incorporated into the rejection to arrive at the claimed subject matter, but specifically the idea of training a second machine learning model (or retrained classifier) using a second set of labels and reusing a first set of features generated based on the first set of data.  Kandaswamy clearly discloses this claimed limitation in at least page 6, section 4.1 and page 4, ¶2.  

Applicant further argues, on page 9, last paragraph of the remarks that the selection of the “second labels” is nontrivial in the case where an active learning system is to be modified to incorporate transductive transference.  Again, it is not the broad notion of incorporating transductive transference into an active learning system, but rather, the more specific idea of training a second machine learning model (or retrained classifier) using a second set of labels and reusing a first set of features generated based on the first set of data as claimed.  

Prima facie obviousness has been properly established in the rejection above, and the motivation to combine DeBarr, Lee, and Kandaswamy, all of which concern the general notion of retraining machine learning models, would be to achieve significant performance by transferring learning from source to target problem, by using lower-layer features trained in supervised fashion in the case of CNNs and unsupervised features trained in the case of SDAs (Kandaswamy; Conclusion).

Accordingly, Applicant’s arguments are not persuasive and the 35 USC § 103 rejection of claims 1-20 STANDS.


Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Forman et al. (US 20080103996 A1).

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403.  The examiner can normally be reached on Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/KAMRAN AFSHAR/           Supervisory Patent Examiner, Art Unit 2125