DETAILED ACTION
This Office Action is in response to the remarks entered on 03/10/2021. Claims 1 and 15 were amended. No claims were added. No claims were cancelled.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

In reference to Applicant’s arguments regarding the rejections under 35 U.S.C. § 103:
Applicant’s Argument:
Claims 1-20 were rejected under 35 U.S.C. § 103 over Lee (U.S. 2018/0285771) in view of Kozitsky (U.S. 2016/0148076). The rejection is respectfully traversed, as the combination of Lee and Kozitsky does not teach or suggest training a second classifier in the manner claimed. 
The present application describes that parameters of a first classifier, trained on a first dataset class, can be transferred to a second classifier to more efficiently train the second classifier to recognize both a first class and a second class of data. Once the parameters are transferred to the second classifier, the second classifier is trained on a merged subset of the first dataset class and the second dataset class to form a merged class. In the past, the second classifier would be trained on a combination of all of the first and second datasets. The merged class thus has less training data than a 
The claim term "a subset of the first dataset" is unreasonably broadly interpreted to read on the entire labeled data 100 of Lee. 
Claim 1 recites: "merging a subset of the first dataset and the second dataset into a merged class." Emphasis added. The Office Action asserts that Lee teaches "the merged class is considered as the combination of labeled data 100, unlabeled data 102 and classifier 106, wherein the classifier 106 containing the prediction rules are considered as the first parameters, the subset of the first dataset is considered as the set of first labeled data 100." Note that this interpretation requires the entire first labeled data 100 to be included in the subset of the first dataset class. This is believed to be an unreasonably broad interpretation of the claim term "a subset of the first dataset." 
The present application provides the following description of the term, subset: "This allows the classifier to be completely trained using only subsets of training data for already trained classes with a complete set of training data for the class to be added, birds. In one example, the complete set of training data for birds may include 1000 images, while only 10 images apiece may be used for training the dogs and cats classes. These are just example numbers that may vary significantly in further embodiments, but do illustrate that a significant amount of prior training may be leveraged to extend the model to one or more additional classes."
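For illustration only, the following minimal Python sketch shows one way a merged training set could be assembled as described in the quoted passage: a small per-class subset of the already-trained classes (dogs, cats) combined with the complete training set for the class being added (birds). It is not taken from the application or from the cited references; the function name `build_merged_dataset`, the dict-of-lists data layout, random sampling, and the pool of 1000 images per existing class are hypothetical assumptions. The sketch also shows that when the requested subset size is at least the size of a class, the resulting subset is the entire class.

```python
import random


def build_merged_dataset(first_dataset, second_dataset, per_class_subset, seed=0):
    """Merge a subset of each already-trained class with every example of the new class(es).

    first_dataset / second_dataset: dict mapping class label -> list of examples
    (hypothetical layout). Returns a shuffled list of (example, label) pairs.
    """
    rng = random.Random(seed)
    merged = []
    for label, examples in first_dataset.items():
        # Take at most per_class_subset examples; if the cap is >= the class
        # size, the "subset" is the whole class.
        k = min(per_class_subset, len(examples))
        merged.extend((x, label) for x in rng.sample(examples, k))
    for label, examples in second_dataset.items():
        # The class being added contributes its complete training set.
        merged.extend((x, label) for x in examples)
    rng.shuffle(merged)
    return merged


# Hypothetical pools: 1000 images each for the existing dog and cat classes.
first = {
    "dog": [f"dog_{i}" for i in range(1000)],
    "cat": [f"cat_{i}" for i in range(1000)],
}
second = {"bird": [f"bird_{i}" for i in range(1000)]}

merged = build_merged_dataset(first, second, per_class_subset=10)
print(len(merged))  # 1020: 10 dogs + 10 cats + 1000 birds
```

Calling the same function with `per_class_subset=5000` returns all 3000 examples, i.e., a "subset" equal to the full first dataset plus the second dataset.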

Examiner’s Response: 
The Examiner respectfully disagrees with Applicant’s argument because a person having ordinary skill in the art would recognize that a subset of a dataset can be either the entire dataset or a smaller portion of it. The language of claim 1 does not require the subset to be smaller than the first dataset; therefore, the Examiner interprets the subset of the first dataset as encompassing the entire dataset. Accordingly, the argument is not persuasive, and the rejection is maintained. 
Applicant’s Argument: 
Lee does not teach or suggest: "loading the first parameters into a second classifier" 

The claim term "loading" is different from training, as later in claim 1 the second classifier is trained. The cited labeled data 100 and unlabeled data 102 of Lee are thus not loaded into a second classifier but may be used to further train classifier 106.
It is noted that the Office Action has previously interpreted "first parameters" as being taught by Lee, without taking into account that the claim 1 element describing the first parameters includes that the first parameters comprise model weights. While later in the Office Action, Kozitsky is cited as disclosing that the parameters include model weights, there is still no teaching that model weights are loaded into a second classifier, as Lee does not teach such a second classifier. 
The Office Action states that Lee does not teach "parameters comprising model weights." 
Kozitsky is cited as teaching "parameters comprising model weights." The Office Action asserts that it would have been obvious to modify Lee's method by having parameters comprising model weights. However, the combination would result in loading both the training data (previously interpreted as parameters) and model weights into classifier 106, which was interpreted as the first classifier. There is still no teaching by the combination of the references to load parameters into a second 

Examiner’s Response: 
The Examiner respectfully disagrees with Applicant’s argument because Lee teaches “loading the first parameters into a second classifier.” As seen in Lee [Par. 0024], the labeled data 100, the unlabeled data 102, and classifier 106 are processed by the second classifier (semi-supervised learning module 108), wherein classifier 106 includes rules corresponding to the parameters. A person having ordinary skill in the art would understand that the parameters must be loaded before the training process is started. 
Furthermore, the Examiner respectfully reminds Applicant that Kozitsky is relied upon only to cure the specific deficiencies of Lee with respect to the respective claims. Lee teaches loading the first parameters into the second classifier, and Kozitsky teaches that the parameters include model weights. Therefore, the argument is not persuasive, and the rejection is maintained. 
In addition, the Examiner respectfully reminds Applicant that Lee teaches loading the first parameters into the second classifier (semi-supervised). Furthermore, the first dataset (labeled data 100), the second dataset (unlabeled data 102), and classifier 106 are used to train the second classifier (semi-supervised). Therefore, Lee teaches that the merged datasets correspond to the combination of the first dataset, the second dataset, and classifier 106 to train 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (Pub. No. US 2018/0285771, hereinafter Lee) in view of Kozitsky et al. (Pub. No. US 2016/0148076, hereinafter Kozitsky).
Regarding claim 1, Lee teaches a method comprising: obtaining a first classifier trained on a first dataset having a first dataset class (Lee, [Fig. 1; Par. 0024, lines 5-9], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a ,
the first classifier having a plurality of first parameters (Lee, [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: classifier 106 is considered the first classifier; classifier 106 contains rules to predict the labels, and these rules are considered the parameters.);
[…]; 
obtaining a second dataset having a second dataset class (Lee, [Par. 0036, lines 1-2], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features.” Examiner’s note: the second dataset is considered to be unlabeled data 102. Furthermore, see [Par. 0037, lines 1-3], “In one embodiment of the invention, 3D microscopy images of neurons with unknown classes are the unlabeled data.” Examiner’s note: unlabeled data 102 has a dataset class (unknown class).);
loading the first parameters into a second classifier (Lee, [Par. 0024, lines 9-12], “The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized” and furthermore, see [Par. 0040, lines 1-3], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules ; 
merging a subset of the first dataset class and the second dataset class into a merged dataset; and training the second classifier using the merged dataset (Lee, [Par. 0024, lines 6-12], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized machine learning tool to produce an updated classifier 110 which is learnt from both the labeled data 100 and the unlabeled data 102.” Examiner’s note: the merged class is considered to be the combination of labeled data 100, unlabeled data 102, and classifier 106, wherein the prediction rules contained in classifier 106 are considered the first parameters, and the subset of the first dataset is considered to be the set of first labeled data 100. Merging of the datasets corresponds to the combination of the first dataset, the second dataset, and classifier 106.).
However, Lee does not teach parameters comprising model weights.
On the other hand, Kozitsky teaches parameters comprising model weights (Kozitsky, [Par. 0038, lines 9-13], “Parameters for this classifier include the number of decision trees, a cost matrix (penalty or weight applied to decision trees for making an incorrect prediction), and a weighting matrix (bias one of the two classes).”);
Lee and Kozitsky are analogous art because they are in the same field of endeavor: using machine learning to classify data.
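To illustrate the distinction Applicant draws between loading and training, the following minimal Python sketch treats each classifier as a linear model with one weight row and one bias per class. It is an assumption for illustration only, not the implementation of the application, Lee, or Kozitsky. Loading copies the first classifier's model weights and biases into a second classifier that has additional zero-initialized rows for a new class; no training takes place in this step. `LinearClassifier` and `load_first_parameters` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinearClassifier:
    """Hypothetical linear classifier: one weight row and one bias per class."""
    weights: List[List[float]] = field(default_factory=list)
    bias: List[float] = field(default_factory=list)

    def scores(self, x: List[float]) -> List[float]:
        # One linear score per class: w_k . x + b_k
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


def load_first_parameters(first: LinearClassifier, num_new_classes: int) -> LinearClassifier:
    """Build a second classifier whose leading rows are copies of the first
    classifier's weights and biases, plus zero-initialized rows for new classes."""
    num_features = len(first.weights[0])
    weights = [row[:] for row in first.weights]  # copy rows, do not alias them
    weights += [[0.0] * num_features for _ in range(num_new_classes)]
    bias = first.bias[:] + [0.0] * num_new_classes
    return LinearClassifier(weights, bias)


first = LinearClassifier(weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.5, -0.5])  # e.g. dog, cat
second = load_first_parameters(first, num_new_classes=1)                       # + bird
print(second.scores([2.0, 3.0]))  # [2.5, 2.5, 0.0]
```

In this sketch the loaded rows reproduce the first classifier's scores exactly, while the new row contributes nothing until the second classifier is trained.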

Regarding claim 9, the claim is rejected for the same reasons as claim 1.
Additionally, Kozitsky teaches a device comprising: a processor; and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising (Kozitsky, [Fig. 12; Par. 0062, lines 3-7], “The data processing system 100 includes a processor 105 with one or more microprocessors. The system 100 also includes memory 110 for storing data and programs for execution by the processing system.”).
Regarding claim 16, the claim is rejected for the same reasons as claim 1.
Additionally, Kozitsky teaches a machine readable storage device having instructions for execution by a processor of a machine to cause the processor to perform operations comprising (Kozitsky, [Par. 0060], “These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an 
Regarding claim 2, Lee teaches the method of claim 1 wherein the first parameters are fixed in the second classifier during training of the second classifier (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features. The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features of objects in unlabeled data 102. Because the features of the first dataset are the same as the features of the second dataset, the rules (first parameters) are fixed during the training of the second classifier.).
Regarding claim 10, the claim is rejected for the same reasons as claim 2.
Regarding claim 17, the claim is rejected for the same reasons as claim 2.
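As an illustration of one common way parameters can be held fixed during training (often called freezing), the following minimal Python sketch runs softmax cross-entropy gradient steps that update only the rows of a linear classifier not marked as frozen. It is a hypothetical example for illustration only; it does not represent Lee's rule-based classifier or Applicant's implementation, and the names `train_step` and `frozen` are assumptions.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_step(weights, bias, frozen, x, y, lr=0.1):
    """One SGD step of softmax cross-entropy on example (x, y).

    Rows k with frozen[k] == True (e.g. parameters loaded from a first
    classifier) keep their values; only the remaining rows are updated.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    for k in range(len(weights)):
        if frozen[k]:
            continue
        grad = probs[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
        weights[k] = [w - lr * grad * xi for w, xi in zip(weights[k], x)]
        bias[k] -= lr * grad


weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # rows 0-1 loaded, row 2 new
bias = [0.5, -0.5, 0.0]
frozen = [True, True, False]
for _ in range(100):
    train_step(weights, bias, frozen, x=[1.0, 1.0], y=2)
print(weights[:2], bias[:2])  # [[1.0, 0.0], [0.0, 1.0]] [0.5, -0.5]
```

Only the new row moves toward the target class; the frozen rows print exactly as they were loaded.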
Regarding claim 3, Lee teaches the method of claim 1 wherein the first dataset further comprises multiple first dataset classes (Lee, [Par. 0034, lines 1-2], “The labeled data 100 consists of a plurality of objects having a pairing of a label and features.” Furthermore, see [Par. 0035, lines 1-3], “In one embodiment of the invention, the data are 3D microscopy images of neurons. The labels are known neuron classes such as Purkinje, Granule, and Motor neurons are used as the label in the labeled 
and wherein merging a subset of the first dataset class comprises merging multiple subsets of the first dataset corresponding to the classes with the second dataset (Lee, [Par. 0024, lines 6-12], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized machine learning tool to produce an updated classifier 110 which is learnt from both the labeled data 100 and the unlabeled data 102.” Examiner’s note: the merged datasets are considered to be the combination of labeled data 100, unlabeled data 102, and classifier 106, and the subsets of the first dataset classes are considered to be the dataset classes of labeled data 100. Furthermore, the first dataset classes (neuron classes) of the first dataset (labeled data 100) correspond to the multiple subsets of the first dataset.).
Regarding claim 11, the claim is rejected for the same reasons as claim 3.
Regarding claim 18, the claim is rejected for the same reasons as claim 3.
Regarding claim 4, Lee teaches the method of claim 3 wherein the second dataset further comprises multiple second dataset that are merged with the multiple subsets of the multiple first dataset (Lee, [Par. 0037, lines 1-5], “In one embodiment of the invention, 3D microscopy images of neurons with unknown classes are the unlabeled data. In another embodiment of the invention, the objects are images in an image database with unknown categories.” Examiner’s note, second dataset .). Furthermore, see (Lee, [Par. 0035, lines 1-11], “In one embodiment of the invention, the data are 3D microscopy images of neurons. The labels are known neuron classes such as Purkinje, Granule, and Motor neurons are used as the label in the labeled data. The features associated with each neuron can consist of measurements such as the total number of branches, the average length of the branches, and the volume of the neuron's soma. In another embodiment of the invention, the objects are images in an image database and the labels are the categories of the images such as human, cat, dog, car, boat, airplane, computer, phone, etc.” Examiner’s note: the multiple first datasets (i.e., the known neuron classes such as Purkinje, Granule, and Motor neurons used as the labels in the labeled data) are merged with the second dataset, wherein the first dataset, which includes multiple dataset classes, corresponds to the multiple datasets.)
Regarding claim 12, the claim is rejected for the same reasons as claim 4.
Regarding claim 19, the claim is rejected for the same reasons as claim 4.
Claims 5 and 13 were cancelled. 
Regarding claim 6, Lee, as modified in view of Kozitsky, teaches the method of claim 1 wherein the model first parameters comprise model bias (Kozitsky, [Par. 0038, lines 9-13], “Parameters for this classifier include the number of decision .
Lee and Kozitsky are analogous art because they are in the same field of endeavor: using machine learning to classify data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lee’s method, further in view of Kozitsky, by having the parameters comprise the model bias. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classifying data (Kozitsky, [Par. 0006, lines 5-8], “This highly selective behavior helps the automated OCR solution reduce the number of mistakes that it makes. Given the market requirements for highly accurate OCR (99% or better)”).
Regarding claim 14, the claim is rejected for the same reasons as claim 6.
Regarding claim 20, the claim is rejected for the same reasons as claim 6.
Regarding claim 7, Lee teaches the method of claim 6 wherein the model first parameters are injected into a concat layer of the second classifier during training of the second classifier (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features. The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s 
Regarding claim 8, Lee teaches the method of claim 7 wherein the first parameters are fixed using configs (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features. The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features of objects in unlabeled data 102. Because the features of the first dataset are the same as the features of the second dataset, the rules (first parameters) are fixed during the training.).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any 
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure and is listed below.
Johnston et al. (Patent No. 10339468, hereinafter Johnston) teaches classifying and merging a subset of labeled data and a training dataset to improve the accuracy of training. 
Lin et al. (Patent No. 9830526, hereinafter Lin) teaches training data including a first training subset from a first training dataset and a second training subset from a second dataset. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571) 272-5747. The examiner can normally be reached on 7:30-5:00, M-Th. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent 


 
/E.T./ Examiner, Art Unit 2126                    

/BABOUCARR FAAL/Primary Examiner, Art Unit 2184