DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2/17/2020 and 9/6/2022 were filed and considered. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s argument (pages 10-13): Applicant’s main argument, repeated throughout pages 10-13, is that the cited prior art of Dube et al (US 2021/0174191) does not teach the new claim amendment language “Clustering the output feature vectors into plurality of clusters” and “Analyzing the clusters of the output feature vectors with the class label corresponding to the second inference task to determine…”. Please read the Remarks for further detail.
Examiner response: An updated search found that Burges et al (US 2004/0260550) teaches the new claim amendment language, such as “Clustering the output feature vectors into plurality of clusters” and “Analyzing the clusters of the output feature vectors with the class label corresponding to the second inference task to determine…”, in paragraph 0110, where output feature vectors are clustered by k-means clustering. Please read the Office Action below for further detail. Examiner suggests amending the claims with the objected-to claim language to advance the application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 10, 12, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Dube et al (US 2021/0174191) in view of Burges et al (US 2004/0260550).
Claim 1:
Dube et al (US 2021/0174191) teaches:
A method comprising: 
obtaining a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e. 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering); 
labeling individual samples in an input dataset for the neural network with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database); 
applying the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector); 
corresponding to the second inference task to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (the second inference task is taught above; figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and 
outputting the coherence score for the intermediate layer (figure 4 teaches output learning rate 410, viewed as the coherence score for each layer).
Dube et al teaches all the subject matter above with clustering (XXX), but not the following which is taught by Burges et al:
Clustering the output feature vectors into plurality of clusters (0110 teaches that TDNN (neural network) output feature vectors are clustered by well-known k-means clustering);
Analyzing the clusters of the output feature vectors with the class label (0110 teaches clustering by k-means for each of the classes (class label)).
Dube et al and Burges et al are both in the field of neural network output with regard to feature vectors from data, such that the outcome of the combination is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify Dube et al by Burges et al regarding clustering of feature vectors, such that the clustering technique improves the clustering and provides output probabilities for data tagging to individuals, as disclosed by Burges et al in 0110.
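
For illustration only, and not as a characterization of Dube et al, Burges et al, or the claims, the limitation of applying a network to an input dataset to obtain output feature vectors of an intermediate layer can be sketched as follows. The function name forward_with_activations, the layer sizes, the ReLU activation and the random weights are all hypothetical and are not taken from any reference of record.

```python
import numpy as np

def forward_with_activations(x, weights, biases):
    """Run a small fully connected ReLU network and keep every layer's output."""
    activations = []
    h = x
    for w, b in zip(weights, biases):
        h = np.maximum(h @ w + b, 0.0)  # affine map followed by ReLU
        activations.append(h)
    return activations

# Hypothetical stand-in for a trained network: 8 inputs, layers of 16, 12 and 4 units.
rng = np.random.default_rng(0)
sizes = [8, 16, 12, 4]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

samples = rng.normal(size=(5, 8))          # 5 input samples
layer_outputs = forward_with_activations(samples, weights, biases)
intermediate_features = layer_outputs[1]   # one 12-dimensional feature vector per sample
```

Each row of intermediate_features is the output feature vector of the second layer for one input sample; these rows are what a later clustering step would operate on.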



Claim 7:
Burges et al further teaches:
The method of claim 1, wherein the clustering is performed using a k-means clustering technique, a mean shift clustering technique, or a density-based spatial clustering technique (0110 teaches clustering by k-means).
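
For illustration only, a k-means clustering of feature vectors of the kind referred to above can be sketched as below. This is a plain Lloyd-style iteration written for this example; it is not the procedure of Burges et al paragraph 0110. The function names, the caller-supplied initial centroids (chosen here so the result is deterministic) and the toy data are assumptions.

```python
import numpy as np

def nearest_centroid(features, centroids):
    """Index of the closest centroid (Euclidean distance) for each feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, init_centroids, n_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    features = np.asarray(features, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        assignments = nearest_centroid(features, centroids)
        updated = centroids.copy()
        for j in range(len(centroids)):
            members = features[assignments == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(features, centroids), centroids

# Toy feature vectors forming two well-separated groups (hypothetical data).
features = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                     [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
assignments, centroids = kmeans(features, init_centroids=features[[0, 4]])
```

With one initial centroid taken from each group, the first four vectors are assigned to cluster 0 and the last four to cluster 1, with centroids at (0.5, 0.5) and (10.5, 10.5).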

Claim 10:
Dube et al (US 2021/0174191) teaches:
One or more non-transitory computer-accessible storage media storing program (0091 teaches computer readable storage medium not to be construed as being transitory signals) instructions that when executed on or across one or more processors implement neural network analysis system and cause the neural network analysis system to: 
obtain a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e. 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering); 
obtain an input dataset for the neural network, wherein individual samples in the input dataset are labeled with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database); 
apply the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector);
corresponding to the second inference task to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (the second inference task is taught above; figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and 
output the coherence score for the intermediate layer (figure 4 teaches output learning rate 410, viewed as the coherence score for each layer).
Dube et al teaches all the subject matter above with clustering (XXX), but not the following which is taught by Burges et al:
Clustering the output feature vectors into plurality of clusters (0110 teaches that TDNN (neural network) output feature vectors are clustered by well-known k-means clustering);
Analyzing the clusters of the output feature vectors with the class label (0110 teaches clustering by k-means for each of the classes (class label)).
Dube et al and Burges et al are both in the field of neural network output with regard to feature vectors from data, such that the outcome of the combination is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify Dube et al by Burges et al regarding clustering of feature vectors, such that the clustering technique improves the clustering and provides output probabilities for data tagging to individuals, as disclosed by Burges et al in 0110.

Claim 12:
Burges et al further teaches:
The one or more non-transitory computer-accessible storage media of claim 10, wherein to determine the coherence score, the program instructions when executed on or across the one or more processors cause the neural network analysis system to perform a clustering of the output feature vectors using a k-means clustering technique, a mean shift clustering technique, or a density-based spatial clustering technique (0110 teaches clustering by k-means).

Claim 15:
Dube et al (US 2021/0174191) teaches:
A system (figure 2 teaches system), comprising: 
one or more hardware processors with associated memory that implement a neural network analysis system, configured to (figure 2 processor (206 and 210) with memory (206, 224, 201D)): 
obtain a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e. 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering); 
obtain an input dataset for the neural network, wherein individual samples in the input dataset are labeled with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database); 
apply the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector); 
corresponding to the second inference task to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (the second inference task is taught above; figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and 
output the coherence score for the intermediate layer (figure 4 teaches output learning rate 410, viewed as the coherence score for each layer).
Dube et al teaches all the subject matter above with clustering (XXX), but not the following which is taught by Burges et al:
Clustering the output feature vectors into plurality of clusters (0110 teaches that TDNN (neural network) output feature vectors are clustered by well-known k-means clustering);
Analyzing the clusters of the output feature vectors with the class label (0110 teaches clustering by k-means for each of the classes (class label)).
Dube et al and Burges et al are both in the field of neural network output with regard to feature vectors from data, such that the outcome of the combination is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify Dube et al by Burges et al regarding clustering of feature vectors, such that the clustering technique improves the clustering and provides output probabilities for data tagging to individuals, as disclosed by Burges et al in 0110.

Claim 17:
Burges et al further teaches:
The system of claim 15, wherein to determine the coherence score, the neural network analysis system is configured to perform a clustering of the output feature vectors using a k-means clustering technique, a mean shift clustering technique, or a density-based spatial clustering technique (0110 teaches clustering by k-means).


Claims 2, 4-6, 11, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dube et al (US 2021/0174191) and Burges et al (US 2004/0260550) in view of Zhuravlev (US 2019/0311194).
Claim 2:
Dube et al and Burges et al teach the following subject matter:
The method of claim 1, comprising: 
determining respective coherence scores for multiple ones of the plurality of layers (Dube et al: figure 4 and 0076-0080 teach that each layer L1-Ln outputs feature vectors with averages from target and source, as well as a learning rate score 410).
Dube et al and Burges et al teach all the subject matter above, but not the following, which is taught by Zhuravlev (US 2019/0311194):
selecting a particular intermediate layer from the multiple layers based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see that the layer is selected based on a comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers); 
building another machine learning model from output features of the particular intermediate layer to perform the second inference task (figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al, Burges et al and Zhuravlev are all in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al and Burges et al with the further analysis and selection of layers of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066. 

Claim 4:
Zhuravlev further teaches:
The method of claim 2, wherein building the other machine learning model comprises training another neural network, wherein the other neural network is configured to receive the output features of the particular intermediate layer to perform the second inference task (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see that the layer is selected based on a comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).

Claim 5:
Zhuravlev further teaches:
The method of claim 2, wherein building the other machine learning model comprises adding one or more additional layers to the neural network, wherein the one or more additional layers are configured to receive the output features of the particular intermediate layer to generate additional output classes (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see that the layer is selected based on a comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier. The addition of more layers would depend on the statistical data (score) when compared: if these values change for each layer, then the dynamics of the trained neural network will change with the addition/subtraction of layers).

Claim 6:
Zhuravlev further teaches:
The method of claim 2, wherein: the first inference task is a first image classification task (0071 teaches first classifier for first set of grapheme images); and the second inference task is a different image classification task to classify images to a different set of classes (0071 teaches second classifier, a different classifier, for second plurality of set of graphemes).

Claim 11:
Dube et al and Burges et al teach the following subject matter:
The one or more non-transitory computer-accessible storage media of claim 10, wherein the program instructions when executed on or across the one or more processors cause the neural network analysis system to: 
determine respective coherence scores for multiple ones of the plurality of layers (Dube et al: figure 4 and 0076-0080 teach that each layer L1-Ln outputs feature vectors with averages from target and source, as well as a learning rate score 410).
Dube et al and Burges et al teach all the subject matter above, but not the following, which is taught by Zhuravlev (US 2019/0311194):
select a particular intermediate layer from the multiple layers for building another machine learning model based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see that the layer is selected based on a comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al, Burges et al and Zhuravlev are all in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al and Burges et al with the further analysis and selection of layers of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066. 

Claim 16:
Dube et al and Burges et al teach the following subject matter:
The system of claim 15, wherein the neural network analysis system is configured to: 
determine respective coherence scores for multiple ones of the plurality of layers (Dube et al: figure 4 and 0076-0080 teach that each layer L1-Ln outputs feature vectors with averages from target and source, as well as a learning rate score 410).
Dube et al and Burges et al teach all the subject matter above, but not the following, which is taught by Zhuravlev (US 2019/0311194):
select a particular intermediate layer from the multiple layers for building another machine learning model based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see that the layer is selected based on a comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al, Burges et al and Zhuravlev are all in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al and Burges et al with the further analysis and selection of layers of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066. 

Claim 20:
Zhuravlev further teaches:
The system of claim 15, wherein: the neural network is trained to perform the first inference task is a first image classification task (0071 teaches first classifier for first set of grapheme images); and the second inference task is a different image classification task to classify images to a different set of classes (0071 teaches second classifier, a different classifier, for second plurality of set of graphemes).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Dube et al (US 2021/0174191) and Burges et al (US 2004/0260550) in view of Zhuravlev (US 2019/0311194) as applied to claim 2 above, and further in view of Goswami et al (US 2019/0238568).
Claim 3:
Dube et al and Burges et al teach the particular intermediate layer to perform the second inference task, as set forth above, and Zhuravlev teaches all the remaining subject matter above, but not the following, which is taught by Goswami et al:
The method of claim 2, wherein building the other machine learning model comprises training a support vector machine (SVM) for the output features of the (0051 teaches training an SVM on feature vectors from layers to categorize undistorted/distorted images).
Dube et al, Burges et al, Zhuravlev and Goswami et al are all in the field of image analysis, especially analysis of feature vectors from layers of a neural network, such that the combined outcome is predictable. 
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al, Burges et al and Zhuravlev by Goswami et al to include training of a support vector machine in order to detect distortions in input images, where the pattern of the intermediate representations for undistorted images is compared with that for distorted images at each layer, as disclosed by Goswami et al in 0051.

Allowable Subject Matter
Claims 8-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
Claims 13-14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
Claims 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
Examiner suggests amending the claims with the objected-to claim language to advance the application. 
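
For illustration only, the computation recited in the objected-to claims (entropy of the class labels, conditional entropy of the class labels given the assigned clusters, and their difference as a mutual information metric) can be sketched as below. The function names, the base-2 logarithm (result in bits) and the toy labels are assumptions made for this example; the claims and the specification are not quoted as requiring any of them.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y): entropy of the empirical class-label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, clusters):
    """H(Y | C): label entropy within each cluster, weighted by cluster size."""
    n = len(labels)
    by_cluster = {}
    for y, c in zip(labels, clusters):
        by_cluster.setdefault(c, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in by_cluster.values())

def mutual_information(labels, clusters):
    """I(Y; C) = H(Y) - H(Y | C)."""
    return entropy(labels) - conditional_entropy(labels, clusters)

labels = ["cat", "cat", "dog", "dog"]          # hypothetical class labels
aligned = mutual_information(labels, [0, 0, 1, 1])    # clusters match the labels
unaligned = mutual_information(labels, [0, 1, 0, 1])  # clusters ignore the labels
```

In this toy case the label entropy is 1 bit. When the clusters match the labels the conditional entropy is 0 and the metric is 1 bit; when each cluster contains one sample of each label the conditional entropy is 1 bit and the metric is 0.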

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wshah et al (US 2017/0140253) teaches MULTI-LAYER FUSION IN A CONVOLUTIONAL NEURAL NETWORK FOR IMAGE CLASSIFICATION – 0027- provided a method of constructing a convolutional neural network (CNN) for domain adaptation utilizing features extracted from multiple levels, including: selecting a CNN architecture including convolutional layers and fully connected layers corresponding to predetermined features associated with a domain; training the CNN on a source domain data set; selecting a plurality of the predetermined features from many of the convolutional layers across the trained CNN; extracting the selected predetermined features from the trained CNN; concatenating the extracted predetermined features to form a final feature vector; connecting the final feature vector to a fully connected neural network classifier; and, fine-tuning the fully connected neural network classifier from a target domain data set.
SON et al (US 2018/0032867) teach NEURAL NETWORK METHOD AND APPARATUS – 0119 - ularizes parameters of a layer to be additionally trained, based on a function to evaluate a loss of a feature vector, as discussed above. For example, the lightening apparatus 1230 may set a candidate range that minimizes the loss of the feature vector as a lightweight range, or for a corresponding layer, layer portion, or the neural network overall minimizes corresponding errors or losses or maximizes corresponding performances, and thus perform regularization as discussed above. The lightening apparatus 1230 may also quantize parameters, although not shown in FIG. 12, as discussed above.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSUNG-YIN TSAI whose telephone number is (571)270-1671. The examiner can normally be reached 7am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TSUNG YIN TSAI/Primary Examiner, Art Unit 2656