DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) filed on 2/17/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 7, 10, 12, 15 and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Dube et al (US 2021/0174191).
Claim 1:
Dube et al (US 2021/0174191) teaches:
A method comprising: 
obtaining a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e., 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering);
labeling individual samples in an input dataset for the neural network with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database);
applying the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector);
analyzing the output feature vectors to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and
outputting the coherence score for the intermediate layer (figure 4 teaches outputting learning rate 410, viewed as the coherence score for each layer).

Claim 7:
The method of claim 1, wherein determining the coherence score comprises performing a clustering of the output feature vectors of by the intermediate layer (0017 teaches parts of a neural network for classification, regression and/or clustering; 0068 teaches feature vectors with a difference calculation, such as being grouped/clustered differently into one or more. The above shows that a feature vector is output from each layer).


Claim 10:
Dube et al (US 2021/0174191) teaches:
One or more non-transitory computer-accessible storage media storing program (0091 teaches computer readable storage medium not to be construed as being transitory signals) instructions that when executed on or across one or more processors implement neural network analysis system and cause the neural network analysis system to: 
obtain a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e., 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering);
obtain an input dataset for the neural network, wherein individual samples in the input dataset are labeled with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database);
apply the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector);
analyze the output feature vectors to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and
output the coherence score for the intermediate layer (figure 4 teaches outputting learning rate 410, viewed as the coherence score for each layer).
Claim 12:
The one or more non-transitory computer-accessible storage media of claim 10, wherein to determine the coherence score, the program instructions when executed on or across the one or more processors cause the neural network analysis system to perform a clustering of the output feature vectors of by the intermediate layer (0017 teaches parts of a neural network for classification, regression and/or clustering; 0068 teaches feature vectors with a difference calculation, such as being grouped/clustered differently into one or more. The above shows that a feature vector is output from each layer).

Claim 15:
Dube et al (US 2021/0174191) teaches:
A system (figure 2 teaches system), comprising: 
one or more hardware processors with associated memory that implement a neural network analysis system, configured to (figure 2 processor (206 and 210) with memory (206, 224, 201D)): 
obtain a neural network having a plurality of layers that was trained to perform a first inference task (figures 3-4 and 0069-0073 teach a pre-trained (trained) neural network (i.e., 10 layers in 0070), where 0017 details tasks such as classification, regression and/or clustering);
obtain an input dataset for the neural network, wherein individual samples in the input dataset are labeled with class labels corresponding to a second inference task (0023-0024 teach a pre-trained neural network and repurposing the pre-trained neural network (second inference task) for labeling images (individual samples) of a target database);
apply the neural network to the input dataset to obtain output feature vectors of an intermediate layer of the plurality of layers (0069-0073, especially 0070, teach a neural network having 10 layers to obtain a t10 feature vector, which means one feature vector per layer from the plurality of layers; figure 4 teaches layers L1-N, where each layer, such as L1, outputs a feature vector);
analyze the output feature vectors to determine a coherence score of the intermediate layer, wherein the coherence score indicates a semantic coherence of features produced by the intermediate layer for performing the second inference task (figure 4 and 0076-0080 teach that the output feature vectors of each layer, L1-Ln, include target example feature vectors and source example feature vectors, where the averages are the coherence score); and
output the coherence score for the intermediate layer (figure 4 teaches outputting learning rate 410, viewed as the coherence score for each layer).

Claim 17:
The system of claim 15, wherein to determine the coherence score, the neural network analysis system is configured to perform a clustering of the output feature vectors of by the intermediate layer (0017 teaches parts of a neural network for classification, regression and/or clustering; 0068 teaches feature vectors with a difference calculation, such as being grouped/clustered differently into one or more. The above shows that a feature vector is output from each layer).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 4-6, 11, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dube et al (US 2021/0174191) in view of Zhuravlev (US 2019/0311194).

Claim 2:
Dube et al teaches the following subject matter:
The method of claim 1, comprising: 
determining respective coherence scores for multiple ones of the plurality of layers (figure 4 and 0076-0080 teach that each layer L1-Ln outputs a feature vector with averages from target and source, as well as a learning rate score 410).
Dube et al teaches all the subject matter above, but not the following which is taught by Zhuravlev (US 2019/0311194):
selecting a particular intermediate layer from the multiple layers based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see the selection of a layer based on comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers);
building another machine learning model from output features of the particular intermediate layer to perform the second inference task (figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al and Zhuravlev are both in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable.
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al with the layer selection analysis of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066.
Claim 4:
Zhuravlev further teaches
The method of claim 2, wherein building the other machine learning model comprises training another neural network, wherein the other neural network is configured to receive the output features of the particular intermediate layer to perform the second inference task (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see the selection of a layer based on comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).

Claim 5:
Zhuravlev further teaches
The method of claim 2, wherein building the other machine learning model comprises adding one or more additional layers to the neural network, wherein the one or more additional layers are configured to receive the output features of the particular intermediate layer to generate additional output classes (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see the selection of a layer based on comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier. The adding of more layers would depend on the statistical data (score) when compared, where if these values change for each layer, then the dynamics of the trained neural network will change by the addition/subtraction of layers).
Claim 6:
Zhuravlev further teaches
The method of claim 2, wherein: the first inference task is a first image classification task (0071 teaches first classifier for first set of grapheme images); and the second inference task is a different image classification task to classify images to a different set of classes (0071 teaches second classifier, a different classifier, for second plurality of set of graphemes).

Claim 11:
Dube et al teaches the following subject matter:
The one or more non-transitory computer-accessible storage media of claim 10, wherein the program instructions when executed on or across the one or more processors cause the neural network analysis system to: 
determine respective coherence scores for multiple ones of the plurality of layers (figure 4 and 0076-0080 teach that each layer L1-Ln outputs a feature vector with averages from target and source, as well as a learning rate score 410).
Dube et al teaches all the subject matter above, but not the following which is taught by Zhuravlev (US 2019/0311194):
select a particular intermediate layer from the multiple layers for building another machine learning model based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see the selection of a layer based on comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al and Zhuravlev are both in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable.
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al with the layer selection analysis of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066.

Claim 16:
Dube et al teaches the following subject matter:
The system of claim 15, wherein the neural network analysis system is configured to: 
determine respective coherence scores for multiple ones of the plurality of layers (figure 4 and 0076-0080 teach that each layer L1-Ln outputs a feature vector with averages from target and source, as well as a learning rate score 410).
Dube et al teaches all the subject matter above, but not the following which is taught by Zhuravlev (US 2019/0311194):
select a particular intermediate layer from the multiple layers for building another machine learning model based at least in part on their respective coherence scores (0030, 0048, 0068, and claims 8 and 17 teach selecting a layer from a plurality of layers of a CNN where the output is that of the initial classifier. One of ordinary skill in the art can see the selection of a layer based on comparison to the initial output classifier. Figure 2 teaches convolutional neural networks (CNNs), where each CNN has layers; figure 3 parts 340-350 and figure 4 parts 430-440 teach training a classifier based on the statistical data (score) compared to the initial output classifier).
Dube et al and Zhuravlev are both in the field of image analysis, especially analysis of the feature vector output of each layer of a neural network for evaluation, such that the combined outcome is predictable.
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al with the layer selection analysis of Zhuravlev in order to train a network such that the trained classifiers may be used as a grapheme classifier suitable for character recognition of images including text in any language and can be trained automatically, as disclosed by Zhuravlev in 0066.

Claim 20:
Zhuravlev further teaches
The system of claim 15, wherein: the neural network is trained to perform the first inference task is a first image classification task (0071 teaches first classifier for first set of grapheme images); and the second inference task is a different image classification task to classify images to a different set of classes (0071 teaches second classifier, a different classifier, for second plurality of set of graphemes).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Dube et al (US 2021/0174191) in view of Zhuravlev (US 2019/0311194) as applied to claim 2 above, and further in view of Goswami et al (US 2019/0238568).
Claim 3:
Dube et al teaches the particular intermediate layer to perform the second inference task, as discussed above, and Zhuravlev teaches all the subject matter above, but not the following, which is taught by Goswami et al:
The method of claim 2, wherein building the other machine learning model comprises training a support vector machine (SVM) for the output features of the (0051 teaches training an SVM on feature vectors from layers to categorize images as undistorted/distorted).
Dube et al, Zhuravlev and Goswami et al are all in the field of image analysis, especially analysis of feature vectors from layers of a neural network, such that the combined outcome is predictable.
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date to modify the neural network of Dube et al and Zhuravlev with Goswami et al to include training of a support vector machine in order to detect distortions in input images, where the pattern of the intermediate representations for undistorted images is compared with that for distorted images at each layer, as disclosed by Goswami et al in 0051.

Allowable Subject Matter
Claims 8-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
Claims 13-14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
Claims 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. At the time of examination, the examiner was unable to find prior art teaching determining an entropy of a first distribution of the class labels over the samples in the input dataset; determining a conditional entropy of a second distribution of the class labels over the individual samples in the input dataset given respective clusters assigned to the individual samples' output feature vectors; and subtracting the conditional entropy from the entropy to determine a mutual information metric for the first and second distributions.
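For illustration only, the three recited steps (entropy of the class labels, conditional entropy of the class labels given the assigned clusters, and their difference) can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the function names, the base-2 logarithm, and the toy labels and cluster assignments are assumptions introduced here, and the clustering of the output feature vectors is assumed to have been performed already.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, clusters):
    """H(labels | clusters): label entropy within each cluster, weighted by cluster size."""
    by_cluster = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        by_cluster[cluster].append(label)
    n = len(labels)
    return sum(len(group) / n * entropy(group) for group in by_cluster.values())


def mutual_information(labels, clusters):
    """Entropy minus conditional entropy, as in the recited subtraction step."""
    return entropy(labels) - conditional_entropy(labels, clusters)


# Hypothetical toy data: four samples, two classes, two cluster assignments.
labels = ["cat", "cat", "dog", "dog"]
aligned_clusters = [0, 0, 1, 1]    # clusters match the class labels exactly
unrelated_clusters = [0, 1, 0, 1]  # clusters carry no information about the labels

print(mutual_information(labels, aligned_clusters))
print(mutual_information(labels, unrelated_clusters))
```

Under these assumptions, cluster assignments that line up with the class labels yield a metric equal to the label entropy, while assignments unrelated to the labels yield a metric of zero. The sketch does not handle an empty input.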

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wshah et al (US 2017/0140253) teaches MULTI-LAYER FUSION IN A CONVOLUTIONAL NEURAL NETWORK FOR IMAGE CLASSIFICATION – 0027- provided a method of constructing a convolutional neural network (CNN) for domain adaptation utilizing features extracted from multiple levels, including: selecting a CNN architecture including convolutional layers and fully connected layers corresponding to predetermined features associated with a domain; training the CNN on a source domain data set; selecting a plurality of the predetermined features from many of the convolutional layers across the trained CNN; extracting the selected predetermined features from the trained CNN; concatenating the extracted predetermined features to form a final feature vector; connecting the final feature vector to a fully connected neural network classifier; and, fine-tuning the fully connected neural network classifier from a target domain data set.
SON et al (US 2018/0032867) teach NEURAL NETWORK METHOD AND APPARATUS – 0119 - ularizes parameters of a layer to be additionally trained, based on a function to evaluate a loss of a feature vector, as discussed above. For example, the lightening apparatus 1230 may set a candidate range that minimizes the loss of the feature vector as a lightweight range, or for a corresponding layer, layer portion, or the neural network overall minimizes corresponding errors or losses or maximizes corresponding performances, and thus perform regularization as discussed above. The lightening apparatus 1230 may also quantize parameters, although not shown in FIG. 12, as discussed above.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSUNG-YIN TSAI whose telephone number is (571)270-1671. The examiner can normally be reached 7am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TSUNG YIN TSAI/Primary Examiner, Art Unit 2656