United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/616,655
Filing Date: 7 Jun 2017
Appellant(s): Hu et al.



__________________
Bradley A. Forrest

For Appellant

This is in response to the appeal brief filed 10/20/2021.














EXAMINER’S ANSWER

(1) Grounds of Rejection to be reviewed on Appeal
Every ground of rejection set forth in the Office action dated 05/11/2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
The following ground(s) of rejection are applicable to the appealed claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (Pub. No. US 2018/0285771, hereinafter Lee) in view of Kozitsky et al. (Pub. No. US 2016/0148076, hereinafter Kozitsky).
Regarding claim 1, Lee teaches a method comprising: obtaining a first classifier trained on a first dataset having a first dataset class (Lee, Fig. 1, [Par. 0024, lines 5-9], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized” Examiner’s note: the first dataset is considered to be the labeled data 100. Furthermore, see [Par. 0034, lines 1-4], “The labeled data 100 consists of a plurality of objects having a pairing of a label and features. The label of the data consists of a unique name or value that identifies the class the object belongs to.”),
the first classifier having a plurality of first parameters (Lee, [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the classifier 106 is considered to be the first classifier; because the classifier 106 contains rules to predict the labels, those rules are considered to be the parameters.)
[…]; 
obtaining a second dataset having a second dataset class (Lee, [Par. 0036, lines 1-2], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features.” Examiner’s note: the second dataset is considered to be the unlabeled data 102.);
loading the first parameters into a second classifier (Lee, [Par. 0024, lines 9-12], “The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized” and furthermore, see [Par. 0040, lines 1-3], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels” Examiner’s note: the rules are considered to be the first parameters, which are loaded into the second classifier (semi-supervised));
merging a subset of the first dataset class and the second dataset class into a merged dataset; and training the second classifier using the merged dataset (Lee, [Par. 0024, lines 6-12], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized machine learning tool to produce an updated classifier 110 which is learnt from both the labeled data 100 and the unlabeled data 102.” Examiner’s note: the merged class is considered to be the combination of the labeled data 100, the unlabeled data 102, and the classifier 106, wherein the prediction rules contained in the classifier 106 are considered to be the first parameters, and the subset of the first dataset is considered to be the set of first labeled data 100. Merging of […]
However, Lee does not teach the parameters comprising model weights.
Kozitsky, on the other hand, teaches parameters comprising model weights (Kozitsky, [Par. 0038, lines 9-13], “Parameters for this classifier include the number of decision trees, a cost matrix (penalty or weight applied to decision trees for making an incorrect prediction), and a weighting matrix (bias one of the two classes).”).
Lee and Kozitsky are analogous art because they are in the same field of endeavor: using machine learning to classify data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lee’s method in view of Kozitsky such that the parameters comprise model weights. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classifying data (Kozitsky, [Par. 0006, lines 5-8], “This highly selective behavior helps the automated OCR solution reduce the number of mistakes that it makes. Given the market requirements for highly accurate OCR (99% or better)”).
Claim 9 is rejected for the same reasons as claim 1.
Additionally, Kozitsky teaches a device comprising: a processor; and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising (Kozitsky, Fig. 12, [Par. 0062, lines 3-7], “The data processing system 100 includes a processor
Claim 16 is rejected for the same reasons as claim 1.
Additionally, Kozitsky teaches a machine readable storage device having instructions for execution by a processor of a machine to cause the processor to perform operations comprising (Kozitsky, [Par.0060], “These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the block or blocks.”).
Regarding claim 2, Lee teaches the method of claim 1 wherein the first parameters are fixed in the second classifier during training of the second classifier (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features of objects in the unlabeled data 102; because the features of the first dataset are the same as those of the second dataset, the rules (first parameters) are fixed during the training of the second classifier.).
Claim 10 is rejected for the same reasons as claim 2.
Claim 17 is rejected for the same reasons as claim 2.
Regarding claim 3, Lee teaches the method of claim 1 wherein the first dataset further comprises multiple first dataset classes (Lee, [Par. 0034, lines 1-2], “The labeled data 100 consists of a plurality of objects having a pairing of a label and features” Furthermore, see [Par. 0035, lines 1-3], “In one embodiment of the invention, the data are 3D microscopy images of neurons. The labels are known neuron classes such as Purkinje, Granule, and Motor neurons are used as the label in the labeled data.” Examiner’s note: the first dataset (labeled data 100) therefore has first dataset classes (neuron classes).)
and wherein merging a subset of the first dataset class comprises merging multiple subsets of the first dataset corresponding to the classes with the second dataset (Lee, [Par. 0024, lines 6-12], “The initial supervised learning module 104 executed by a computer program of a computerized machine learning tool processes the labeled data 100 to produce a classifier 106. The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 executed by a computer program of the computerized machine learning tool to produce an updated classifier 110 which is learnt from both the labeled data 100 and the unlabeled data 102.” Examiner’s note: the merged datasets are considered to be the combination of the labeled data 100, the unlabeled data 102, and the classifier 106, and the subset of the first dataset classes is considered to be the dataset classes of the first labeled data 100. Furthermore, the first dataset classes (neuron classes) of the first dataset (labeled data 100) correspond to the multiple subsets of the first dataset.)
Claim 11 is rejected for the same reasons as claim 3.
Claim 18 is rejected for the same reasons as claim 3.
Regarding claim 4, Lee teaches the method of claim 3 wherein the second dataset further comprises multiple second dataset that are merged with the multiple subsets of the multiple first dataset (Lee, [Par. 0037, lines 1-5], “In one embodiment of the invention, 3D microscopy images of neurons with unknown classes are the unlabeled data. In another embodiment of the invention, the objects are images in an image database with unknown categories.” Examiner’s note: the second dataset is the unlabeled dataset having unknown dataset classes (i.e., 3D microscopy images of neurons with unknown classes are the unlabeled data); because the multiple objects of the second dataset are classified, multiple second datasets are produced. The first dataset (labeled data 100) has first dataset classes (neuron classes), which correspond to the multiple subsets of the first dataset.) Furthermore, see Lee, [Par. 0035, lines 1-11], “In one embodiment of the invention, the data are 3D microscopy images of neurons. The labels are known neuron classes such as Purkinje, Granule, and Motor neurons are used as the label in the labeled data. The features associated with each neuron can consist of measurements such as the total number of branches, the average length of the branches, and the volume of the neuron's soma. In another embodiment of the invention, the objects are images in an image database and the labels are the categories of the images such as human, cat, dog, car, boat, airplane, computer, phone, etc.” Examiner’s note: the multiple first dataset (i.e., the labels are known neuron classes such as Purkinje, Granule, and Motor neuron are used as the
Claim 12 is rejected for the same reasons as claim 4.
Claim 19 is rejected for the same reasons as claim 4.
Claims 5 and 13 have been cancelled.
Regarding claim 6, Lee, as modified in view of Kozitsky, teaches the method of claim 1 wherein the model first parameters comprise model bias (Kozitsky, [Par. 0038, lines 9-13], “Parameters for this classifier include the number of decision trees, a cost matrix (penalty or weight applied to decision trees for making an incorrect prediction), and a weighting matrix (bias one of the two classes).”).
Lee and Kozitsky are analogous art because they are in the same field of endeavor: using machine learning to classify data.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lee’s method in view of Kozitsky such that the parameters comprise model bias. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of classifying data (Kozitsky, [Par. 0006, lines 5-8], “This highly selective behavior helps the automated OCR solution reduce the number of mistakes that it makes. Given the market requirements for highly accurate OCR (99% or better)”).
Claim 14 is rejected for the same reasons as claim 6.
Claim 20 is rejected for the same reasons as claim 6.
Regarding claim 7, Lee teaches the method of claim 6 wherein the model first parameters are injected into a concat layer of the second classifier during training of the second classifier (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features of objects in the unlabeled data 102; because the features of the first dataset are the same as those of the second dataset, the rules (first parameters) are injected into the training of the second classifier.)
Regarding claim 8, Lee teaches the method of claim 7 wherein the first parameters are fixed using configs (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features […]
NEW GROUNDS OF REJECTION
No new ground of rejection is presented.  
WITHDRAWN REJECTIONS
No rejection is being withdrawn.
(2) Response to Argument
Appellant’s Argument: 
The claim term "a subset of the first dataset" is unreasonably broadly interpreted to read on the entire labeled data 100 of Lee.
Claim 1 recites: "obtaining a first classifier trained on a first dataset" and "merging a subset of the first dataset and the second dataset into a merged class." Emphasis added. The Final Office Action asserts that Lee teaches "the merged class is considered as the combination of labeled data 100, unlabeled data 102 and classifier 106, wherein the classifier 106 containing the prediction rules are considered as the first parameters, the subset of the first dataset is considered as the set of first labeled data 100." Note that this unreasonably broad interpretation requires the entire first labeled data 100 to be included in the subset of the first dataset class. This is believed an unreasonably broad interpretation of the claim term "a subset of the first dataset."
The present application provides the following description of the term, subset: "This allows the classifier to be completely trained using only subsets of training data for already trained classes with a complete set of training data for the class to be added, birds. In one example, the complete set of training data for birds may include 1000 
A person having ordinary skill in the art would clearly recognize that a subset of the first dataset is less than the entire first dataset upon having read the specification. The specification further describes that: "In various embodiments, the model may be efficiently extended, leveraging the existing knowledge. For example, the customer may also want a classifier to recognize birds. Rather than train a classifier with significant amounts of training data from all three classes, dogs, cats, and birds, the existing knowledge regarding the dogs and cats classes may be used along with a smaller set of training data for dogs and cats combined with a complete data set for birds. This allows the classifier to be completely trained using only subsets of training data for already trained classes with a complete set of training data for the class to be added, birds." Emphasis has been added to point out that the specification specifically requires less than a full first dataset for use in training the second classifier. It would be unreasonable to interpret subset as anything other than less than a full first dataset.
On Page 4 of the Final Office Action of May 11, 2021, the Examiner states that a "person having ordinary skill in the art would clearly recognize the subset of the data set can be an entire set of the data or small set of the dataset." This reasoning ignores the view of a person having ordinary skill in the art would have, having read the application. As stated above, the application itself is very clear that a subset is smaller than the entire first dataset. Such an interpretation corresponds with what and how the inventor 

Examiner’s Response: 
The Examiner respectfully disagrees with Appellant’s argument, because a person having ordinary skill in the art would clearly recognize that a subset of a dataset can be the entire dataset or a portion of the dataset. The claim does not describe how the subset is generated. Lacking any description in the claim of this subset or of how it is generated, the subset could encompass the entire dataset. This rationale is reasonable based on the following dictionary definitions of "subset":
a. (https://www.yourdictionary.com/subset):
[Image: definition of "subset" from yourdictionary.com]

b. (https://www.dictionary.com/browse/subset):
[Image: definition of "subset" from dictionary.com]

[Image: further dictionary definition of "subset"]

d. (https://www.collinsdictionary.com/us/dictionary/english/subset):
[Image: definition of "subset" from collinsdictionary.com]
Therefore, the Examiner interprets the subset of the first dataset as the entire dataset.
Appellant points to an example in the specification, namely that “One example in the specification indicates that the subset of the first dataset includes only 10 of 1000 images that are in the first dataset.” However, this is only an example of a subset, not a definition of the term “subset,” and it does not show how the subset is selected. Therefore, it is reasonable to interpret the subset as encompassing the entire first dataset. Accordingly, the argument is not persuasive, and the rejection is maintained.
Appellant’s Argument: 
Lee does not teach or suggest: "loading the first parameters into a second classifier"
Claim 1 also recites: "loading the first parameters into a second classifier". Lee is cited as teaching this element at Paragraph [0024], lines 9-12: "The labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised 
The claim term "loading" is different than training, as later in claim 1, the second classifier is trained as a separate step from loading. Both cannot mean the same thing. The cited labeled data 100 and unlabeled data 102 of Lee are thus not parameters that are loaded into a second classifier as claimed, but instead may be used to further train classifier 106 of Lee. Claim 1 specifically recites first loading the first parameters in the second classifier and then training the second classifier using the merged dataset. The first parameters are not part of the merged dataset. 
On Page 6 of the Final Office Action of May 11, 2021, the Examiner continues to assert that processing data by classifier 106 in Lee includes rules corresponding to parameters. There is no express teaching in Lee of this assertion. Lee, Paragraph [0024], cited by the Examiner in making this assertion simply appears to label data that is used in training. In claim 1, the first classifier is already trained and has parameters that include model weights. The claimed parameters are vastly different from the labeling of training data that is performed by Lee.

Examiner’s Response: 
The Examiner respectfully disagrees with Appellant’s argument, because Lee teaches “loading the first parameters into a second classifier,” as can be seen at Lee [Par. 0024], which teaches that the labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the second classifier (semi-supervised), wherein the classifier 106 including a
Appellant’s Argument: 
The Examiner, in error, is equating training a model to loading parameters of an already trained first classifier into a second classifier, as the second classifier is trained after the parameters are loaded. 
The Final Office Action has previously interpreted "first parameters" as being taught by parameters includes that the first parameters comprise model weights. While later in the Final Office Action, Kozitsky is cited as disclosing that the parameters include model weights yet there is still no teaching that model weights are loaded into a second classifier, as Lee does not teach such a second classifier.


Kozitsky is cited as teaching "parameters comprising model weights." The Final Office Action asserts that it would have been obvious to modify Lee's method by having parameters comprising model weights. However, such a combination would result in loading both the training data (previously interpreted as parameters) and model weights into classifier 106, which was interpreted as the first classifier. There is still no teaching by the combination of the references to load parameters into a second classifier and then train the second classifier with the merged dataset as claimed. The training data cannot both be loaded in a previous element and then also used to train a second classifier. Given this incongruity in the combination, there is no likelihood of success in attempting to make the combination. 
Independent claims 9 and 16 contain elements that are similar to claim 1 and are believed rejected in error for at least the same reasons as claim 1.
Examiner’s Response: 
As set forth in the rejection, Lee teaches loading first parameters into a second classifier. However, Lee does not teach the first parameters comprising model weights. Kozitsky teaches parameters including model weights. The combined teachings of Lee and Kozitsky teach the limitation as claimed. Appellant appears to argue against the references individually, and one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). It is noted that the limitation recites “first parameters comprising model weights,” which is open-ended and does not exclude values other than weights. As set forth above, paragraph [0024] of Lee teaches that the labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 to produce an updated classifier 110. The merged data corresponds to the labeled and unlabeled data, which is used in the learning process that produces the updated classifier 110.
Appellant’s Argument: 
Claim 2 depends from claim 1 and recites that: "the first parameters are fixed in the second classifier during training of the second classifier." In the rejection of claim 2, the Final Office Action first describes unlabeled data and labeled data and their features. It is unclear what that description has to do with the claimed first parameters that include model weights. Parameter weights are the result of training models using training data. The Final Office Action continues by asserting that classifiers are data structures containing rules, and that such rules are fixed during training of a second classifier. This reasoning is very difficult to comprehend. 
Training generally modifies weights of a model. There is no teaching that the asserted rules of Lee are not modified during training. Thus, even if somehow rules were equated to parameter weights, there is no express or implied teaching that the rules are not modified during training in Lee. 
Claims 10 and 17 are similar to claim 2 and are believed rejected in error for at least the same reasons.
Examiner’s Response:

Lee teaches the method of claim 1 wherein the first parameters are fixed in the second classifier during training of the second classifier (Lee, [Par. 0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par. 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the features of objects in the unlabeled data 102.” Examiner’s note: the first parameters are considered to be the rules to predict the labels from the features of objects in the unlabeled data 102; because the features of the first dataset are the same as those of the second dataset, the rules (first parameters) are fixed during the training of the second classifier. That is, first parameters that do not need to be modified or corrected stay the same and are loaded into the second classifier.).
Additionally, Lee further teaches that the labeled data 100, the unlabeled data 102, and the classifier 106 are processed by the semi-supervised learning module 108 to produce an updated classifier 110. The semi-supervised learning module 108 is processed by loading the labeled data 100, the unlabeled data 102, and the classifier 106, and the classifier 106 is a data structure containing the prediction rules, wherein, the
Furthermore, Kozitsky teaches parameters comprising model weights (Kozitsky, [Par. 0038, lines 9-13], “Parameters for this classifier include the number of decision trees, a cost matrix (penalty or weight applied to decision trees for making an incorrect prediction), and a weighting matrix (bias one of the two classes).”). Kozitsky teaches parameters, including model weights, used to train the classifier; these parameters are therefore considered to be the first parameters, and the classifier is considered to be a first classifier, as can be seen at [Par. 0038]. Kozitsky is relied upon only to cure the specific deficiencies of Lee with respect to the respective claims.
Claims 10 and 17 are rejected for the same reasons as claim 2, as explained above.
Appellant’s Argument: 
Claim 3 depends from claim 1 and recites that: "the first dataset further comprises multiple first dataset classes and wherein merging a subset of the first dataset comprises merging multiple subsets of the first dataset corresponding to the classes with the second dataset." The language of claim 3 further helps distinguish that 
The Final Office Action asserts that Lee describes: "the merged datasets are considered as the combination of labeled data 100, unlabeled data 102 and classifier 106, the subset of the first dataset classes are considered as the data set classes of first labeled data 100. Furthermore, the first dataset (labeled data 100) having first dataset classes (neuron classes) are corresponding to the multiple subsets of the first dataset." These assertions continue the unreasonable interpretation that subsets can be the entire first dataset. 
Claims 11 and 18 are similar to claim 3 and are believed rejected in error for at least the same reasons.
Examiner’s Response: 
The Examiner respectfully disagrees. As set forth above, a subset could include all elements of a set, and nothing in claim 3 excludes this possibility. Lee teaches the method of claim 1 wherein the first dataset further comprises multiple first dataset classes; see Lee, [Par. 0034, lines 1-2], “The labeled data 100 consists of a plurality of objects having a pairing of a label and features” Furthermore, see [Par. 0035, lines 1-3], “In one embodiment of the invention, the data are 3D microscopy images of neurons. The labels are known neuron classes such as Purkinje, Granule, and Motor neurons are used as the label in the labeled data.” Examiner’s note: therefore, the first dataset
Furthermore, even though the claim recites that “the first dataset further comprises multiple first dataset classes,” this still does not establish that the subset of the first dataset is smaller than the first dataset class, because claim 3 further recites that “merging a subset of the first dataset comprises merging multiple subsets of the first dataset corresponding to the classes with the second dataset.” Claim 3 therefore merges the subset of the first dataset with the second dataset, wherein the subset of the first dataset can be the entire first dataset, because the subset of the first dataset includes multiple subsets of the first dataset. A subset made up of multiple subsets can be the entire dataset.
Furthermore, the labeled data 100 is considered to be the subset of the first dataset, and the multiple neuron classes of the labeled data 100 are considered to be the multiple subsets of the first dataset corresponding to the classes. Therefore, the labeled data 100 is correctly interpreted as a subset of the first dataset that includes multiple subsets corresponding to the classes.
Claims 11 and 18 are rejected for the same reasons as claim 3, as explained above.
Appellant’s Argument: 
Claim 6 depends from claim 1 and recites that: "the first parameters comprise model bias." The Final Office Action cites Kozitsky, Paragraph [0038], referencing Parameters corresponding to a decision tree classifier 44. There is no assertion that these parameters are from a first classifier and loaded into a second classifier as recited in claim 1. 

Examiner’s response: 
Kozitsky is relied upon only to cure the specific deficiencies of Lee with respect to the respective claims. Lee teaches loading the first parameters into the second classifier, and Kozitsky teaches that “the first parameters comprise model bias.”
Claims 14 and 20 are rejected for the same reasons as claim 6, as explained above.
Appellant’s Argument: 
Claim 7 depends from claim 6 and recites that: "the first parameters are injected into a concat layer of the second classifier during training of the second classifier." The Final Office Action asserts that Paragraph [0036] of Lee teaches such injection into a concat layer. However, such Paragraph of Lee mentions nothing of a concat layer and only mentions features associated with objects, which have nothing to do with parameters from a classifier being loaded into, much less injected into a concat layer of a second classifier as claimed.
Examiner’s response: 
Lee, [Par.0036, lines 1-6], “The unlabeled data 102 consists of a plurality of objects, each consisting of a plurality of features The unlabeled data is different from the labeled data 100 due to its absence of known labels. The features associated with each unlabeled objects are the same as features associated with the labeled data 100.” Furthermore, see [Par 0040, lines 1-4], “The classifier 106, updated classifier 110, and output classifier 130 are data structure containing rules to predict the labels from the 
For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,
Conferees: 
/E.T./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
/MENG YAO ZHE/Primary Examiner
Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.