DETAILED ACTION
This action is in response to the communications filed on 07/01/2022, in which claims 1, 3, 9, 10, 11, and 18 are amended, claim 12 is canceled, and claims 19-21 are added; claims 1-11 and 13-21 are therefore pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement filed on 08/23/2019 fails to comply with 37 CFR 1.98(a)(1), which requires the following: (1) a list of all patents, publications, applications, or other information submitted for consideration by the Office; (2) U.S. patents and U.S. patent application publications listed in a section separately from citations of other documents (There are two IDS submitted on 08/23/2019. 16/397990 is listed in the NPL section of one of the IDS. Patent applications must be listed in the patent application section); (3) the application number of the application in which the information disclosure statement is being submitted on each page of the list; (4) a column that provides a blank space next to each document to be considered, for the examiner’s initials; and (5) a heading that clearly indicates that the list is an information disclosure statement.  The information disclosure statement has been placed in the application file, but the information referred to therein has not been considered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 9 and 18 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 9 and 18 recite the limitation “the query.” There is insufficient antecedent basis for this limitation in the claim. The limitation appears to refer to the “query” recited in the previous version of claim 1, which has since been amended out (struck through), leaving claims 9 and 18 inconsistent and indefinite. For examination purposes, the examiner has interpreted “the query” to mean any query.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 8-9, 10, 17-18 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Yan ("Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition") in view of Montenegro ("A framework for interactive training of automatic image analysis models based on learned image representations, active learning and visualization techniques").

In regard to claims 1, 10 and 21, Yan teaches: A system for selecting candidates for labeling and use in training a convolutional neural network (CNN), comprising: a memory to store instructions; a set of one or more hardware processors configured to execute the instructions to cause the system to perform operations including: (Yan, p. 1339 "we trained our models on a 64-bit desktop with i7–2600 (3.4 GHz) CPU, 16GB RAM and NVIDIA GTX-660 3GB GPU.")
processing an annotated medical image to generate a plurality of input candidates for using in fine-tuning a pre-trained CNN; (Yan, p. 1332, abstract "The same situation stays in medical images [medical images] as well. 'Bodypart identity' of a transversal slice… In this work, we design a multistage deep learning framework for image classification and apply it on bodypart recognition."; p. 1339 "As shown in Fig. 1, transversal slices of CT scans are categorized into 12 body sections (classes). "; see e.g. Fig. 1 Human body with 12 parts is an annotated medical image and images of those 12 sections are generated input candidates.; p. 1336, section B. Learning Stage I: Multi-Instance CNN Pre-Train "Given a training set T={Xm,m=1,…,M} with corresponding labels lm [e.g. each image is labeled / annotated]."; p. 1332 "In the boosting stage, the pre-learned CNN is further boosted [fine-tuning a pre-trained CNN]... for image classification.")
wherein the processing includes performing a data augmentation operation (Yan, p. 1338 right col. "… we augment data [e.g. a data augmentation operation] using label-preserving transformations [25], [47]. Specifically we simply apply up to 10% (relate to image size) random translation to increase training data samples."; also see the citations from the following steps. Yan does different kinds of data augmentation - anything creating more data to train the ML model - adding more images by translating them and also using local patches with labels as training samples.)
by (i) cropping the annotated medical image into multiple parts, each cropped part having a label based on the annotated medical image, (Yan, p. 1340 left col. "... cropping operation extracts 50×50 local patches [e.g. multiple parts, cropped parts] from each image with 10-pixel step size."; p. 1336, section B. Learning Stage I: Multi-Instance CNN Pre-Train "Given a training set T={Xm,m=1,…,M} with corresponding labels lm... These local patches [e.g. multiple parts] become the basic training samples of the CNN and their labels are inherited from the original images [having a label based on the annotated medical image], i.e., all xmn∈L(Xm) share the same label lm"; cropping is extracting smaller patches/images from a big one.)
(ii) translating each cropped part into a group of multiple identically labeled patches sharing a same label, and (Yan, p. 1338 right col. "Specifically we simply apply up to 10% (relate to image size) random translation to increase training data samples."; p. 1336, section B. Learning Stage I: Multi-Instance CNN Pre-Train "Each training image, Xm, is divided into a set of local patches defined as L(X_m)={xmn,n=1,…,N}. [each of X_1..X_M is a group] These local patches become the basic training samples of the CNN and their labels are inherited from the original images, i.e., all xmn∈L(Xm) share the same label lm [sharing a same label]"; images are translated and so are their local patches, which inherit labels from the image, i.e. translating each cropped part of the image and form a group of patches sharing a same label.)
(iii) naming each respective group of multiple identically labeled patches as one of the plurality of input candidates for use in training the CNN; (Yan, p. 1337 see Fig. 4 Bags of local patches, (x_1, l_1) ... (x_M, l_M) [e.g. naming groups of patches] are provided to CNN, p. 1336 "These local patches become the basic training samples of the CNN [for use in training CNN] and their labels are inherited from the original images, i.e., all xmn∈L(Xm) share the same label lm [identically labeled patches]"; e.g. all the patches (x_11, x_12... x_1N) in X_1 group are named/labeled with l_1 for use in training CNN. Naming means designate these patches/images to be part of the training set.)
providing the plurality of input candidates having the plurality of identically labeled patches to a pre-trained CNN; (Yan, p. 1336, section B. Learning Stage I: Multi-Instance CNN Pre-Train "Given a training set T={Xm,m=1,…,M} with corresponding labels lm. Each training image, Xm, is divided into a set of local patches defined as L(Xm)={xmn,n=1,…,N}. These local patches become the basic training samples of the CNN [a pre-trained CNN] and their labels are inherited from the original images, i.e., all xmn∈L(Xm) share the same label lm [identically labelled patches]"; Section I. Introduction, In the pre-train stage... as long as one local patch (instance) is correctly labeled, the class of corresponding slice (bag) is considered to be correct.) (Training images are input candidates, which include a set of local patches, and they are identically labelled.)
executing the pre-trained CNN to determine a plurality of probabilities for each of the plurality of input candidates, wherein each of the plurality of probabilities define a likelihood that a unique patch amongst the plurality of identically labeled patches of the respective input candidate corresponds to a label; (Yan, p. 1336 "Given a training set T={Xm,m=1,…,M}… where P(lm|xmn;W) is the probability that the local patch xmn is correctly classified as lm using CNN coefficients W...") (For each respective image in T, determining P(lm|xmn;W), which is the probability/likelihood for each unique patch in the pre-train stage/CNN [executing the pre-trained CNN].)
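The cropping and translation operations cited from Yan above can be illustrated with a minimal sketch. It assumes a 2-D grayscale image held in a NumPy array; the function names `crop_patches` and `random_translate` are hypothetical and are not taken from Yan or from the claims. Only the 50×50 patch size, the 10-pixel step, the 10% translation bound, and the label inheritance come from the quoted passages.

```python
import numpy as np

def crop_patches(image, label, patch=50, step=10):
    """Slide a patch x patch window over a 2-D image.

    Every extracted patch inherits the label of the source image,
    mirroring "their labels are inherited from the original images".
    """
    h, w = image.shape
    out = []
    for top in range(0, h - patch + 1, step):
        for left in range(0, w - patch + 1, step):
            out.append((image[top:top + patch, left:left + patch], label))
    return out

def random_translate(image, max_frac=0.10, rng=None):
    """Shift the image by up to max_frac of its size per axis (zero fill is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    dy = int(rng.integers(-int(max_frac * h), int(max_frac * h) + 1))
    dx = int(rng.integers(-int(max_frac * w), int(max_frac * w) + 1))
    shifted = np.zeros_like(image)
    # Copy the part of the image that stays inside the frame after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted
```

For a 100×100 image this yields 6 window positions per axis, so 36 patches that all carry the same label; the translated image can be passed through `crop_patches` again to obtain further identically labeled patches.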


Yan does not teach, but Montenegro teaches: identifying a subset of candidates amongst the plurality of input candidates based on the determined probabilities, wherein the subset does not include all of the plurality of input candidates; 

(Montenegro, see section 3.3.2 EGL for Image Selection in Convolutional Neural Networks, Algorithm 3.2 line 8: Tµ = Tµ ∪ Imax; see line 5: the determined probability of the patch; see section 3.2.1 Expected Gradient Length, equation 3-2 "In other words, to select that instances that would impact the greatest change to the current model as if we knew their labels:... Where c is the total number of labels or classes, the Expected Gradient Length algorithm (EGL) works by sorting the Φ values") (Tau_u is a subset of input candidates. Training image set Tau is the input candidates. Because Tau_u is selected from Tau with max sigma values, Tau_u is a subset of Tau and Tau_u does not include all input candidates, and the selection process is based on the Φ EGL values [the determined probabilities].)
labeling the subset of candidates to produce labeled candidates; and (Montenegro, section 3.2 Active Learning Model For CNN "An active learner may pose queries, usually in the form of unlabelled data instances to be labelled by an oracle (e.g., a human annotator)"; section 3.2.1 "and then adding them to the training dataset by asking an oracle to give us the ground truth label of those samples.") (The concept of an active learner can be applied to the subset of candidates, i.e., querying a human annotator/an oracle/an external source to give a new label to the subset of candidates selected above.)

fine-tuning the pre-trained CNN using the labeled candidates. (Montenegro, see section 3.3.2 EGL for Image Selection in Convolutional Neural Networks, Algorithm 3.2 line 10, re-training/update/fine-tuning the model M using the newly selected images.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data selection for re-training the CNN of Yan to include the EGL algorithm of Montenegro for selecting images for re-training the model. Doing so would significantly reduce training time while still obtaining good performance. (Montenegro, p. 3 "In this work, we introduce the expected gradient length algorithm into the training of deep convolutional neural networks for exudate classification in eye fundus images. Our proposed method was able to significantly reduce training time obtaining a really good performance.")
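The Expected Gradient Length selection cited from Montenegro can be sketched as follows. This is an illustration under a stated assumption, not Montenegro's implementation: a linear softmax classifier stands in for the CNN so that the gradient norm has a closed form, and the names `expected_gradient_length` and `select_top_k` are hypothetical. The sketch shows only the two ideas relied on above: each candidate receives a Φ score weighted by the predicted class probabilities, and the k highest-scoring candidates form a subset that does not include all candidates.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_gradient_length(X, W):
    """Phi score per sample for a linear softmax model with weights W (classes x features)."""
    P = softmax(X @ W.T)                   # (n, c) predicted class probabilities
    n, c = P.shape
    x_norm = np.linalg.norm(X, axis=1)
    phi = np.zeros(n)
    for k in range(c):
        onehot = np.zeros(c)
        onehot[k] = 1.0
        # If the label were k, the cross-entropy gradient w.r.t. W is outer(P - onehot, x);
        # its Frobenius norm factorizes as ||P - onehot|| * ||x||.
        grad_norm = np.linalg.norm(P - onehot, axis=1) * x_norm
        phi += P[:, k] * grad_norm         # expectation over the c possible labels
    return phi

def select_top_k(phi, k):
    """Indices of the k highest Phi values (a strict subset whenever k < len(phi))."""
    return np.argsort(phi)[::-1][:k]
```

With a real CNN the gradient norm would have to be obtained by backpropagation for each hypothesized label; the ranking and top-k selection step would be unchanged.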

Claims 10 and 21 recite substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claims 10 and 21. In addition, Yan teaches: (claim 21) Non-transitory computer readable storage media having instructions stored thereupon that, when executed by a processor and a memory of a system, the instructions cause the system to perform operations comprising (Yan, p. 1339 "we trained our models on a 64-bit desktop with i7–2600 (3.4 GHz) CPU, 16GB RAM and NVIDIA GTX-660 3GB GPU.")

In regard to claims 8 and 17, reference is made to the rejection of claims 1 and 10 respectively, and further, Yan teaches: The system of claim 1, wherein the subset of candidates are misclassified candidates. (Yan, p. 1336, Section C. Learning Stage II: CNN Boosting "In the second stage of our learning framework, the main task is to boost the pre-trained CNN using selected local patches... 1) local patches where the pre-trained CNN has higher responses on wrong classes")
(selecting instances with higher responses on wrong classes for re-training, i.e., treating those data as misclassified candidates and feeding them to the CNN for re-training)
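One way to read "higher responses on wrong classes" is that a patch is selected when the class with the highest predicted response differs from its inherited label. The sketch below encodes that reading only; the function name `misclassified_indices` is hypothetical, and Yan's full selection criteria for the boosting stage are not reproduced here.

```python
import numpy as np

def misclassified_indices(probs, labels):
    """Indices of patches whose highest response falls on a class other than the label.

    probs  : (n_patches, n_classes) class responses from the pre-trained CNN
    labels : (n_patches,) integer labels inherited from the source image
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    return np.flatnonzero(probs.argmax(axis=1) != labels)
```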

In regard to claims 9 and 18, reference is made to the rejection of claims 1 and 10 respectively, and further, Yan does not teach, but Montenegro teaches: The system of claim 1, wherein the at least one hardware processor is also configured to form a set of labeled candidates from previously labeled candidates and (Montenegro, see section 3.3.2 EGL for Image Selection in Convolutional Neural Networks, Algorithm 3.2 line 8: Tµ = Tµ ∪ Imax; see 3.3.1 "This process must be done over all the possible labels for each sample. Once we have computed the Φ values for all the samples, we sort them and select the k samples with higher EGL values."; see 3.3.2 "we compute the interestingness of an image by patchifying the image with a given stride and then densely computing Φ, then sorting the images by their top EGL values and finally adding the labels and patches that belongs to the more interesting image to the training set for further")(Tau_u is a set of labelled candidates. Training image set Tau is previously labelled candidates. Tau_u is formed/selected from Tau.)
the labeled candidates produced in response to the query. (Montenegro, section 3.2 "An active learner may pose queries, usually in the form of unlabelled data instances to be labelled by an oracle (e.g., a human annotator)"; section 3.2.1 "and then adding them to the training dataset by asking an oracle to give us the ground truth label of those samples.") (The concept of an active learner can be applied to the subset of candidates, i.e., querying a human annotator/an oracle/an external source to give a new label to the subset of candidates selected above.)
The rationale for combining the teachings of Yan and Montenegro is the same as set forth in the rejection of claims 1 and 10 respectively.

Claims 2-3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Yan in view of Montenegro in further view of Hou ("Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification").

In regard to claim 2, reference is made to the rejection of claim 1, and further, Yan and Montenegro do not teach, but Hou teaches: The system of claim 1, wherein the at least one hardware processor is also configured to determine an average of the plurality of probabilities for each of the plurality of candidates. (Hou, see section 3. Discriminative patch selection "Therefore, to obtain a more robust P(Hi,j | X), we apply the following two steps: First, we train two CNNs on two different scales in parallel. P(yi | xi,j ; θ) is the averaged prediction of the two CNNs. Second, we simply denoise the probability map P(yi | xi,j ; θ) of each image with a Gaussian kernel to compute P(Hi,j | X).")(P(Hi,j | X) is determined by averaging probabilities P(yi | xi,j ; θ) of two CNNs or weighted average of P(yi | xi,j ; θ) using Gaussian kernel. P(yi | xi,j ; θ) is the plurality of probabilities. P(Hi,j | X) is an average.) 

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data selection for re-training CNN of the combination of Yan and Montenegro to include spatial smoothing (averaging probabilities) and image-level threshold of Hou. Doing so would ensure at least 1 − P1 or P1 percent of patches are selected for retraining. (Hou, see section 3 "There are two advantages of our method. First, by using the image-level threshold, there are at least 1 − P1 percent of patches that are considered discriminative for each image")

In regard to claim 3, reference is made to the rejection of claim 2, and further, Yan and Montenegro do not teach, but Hou teaches: The system of claim 2, wherein the at least one hardware processor is also configured to select a top percentage of the plurality of identically labeled patches for a candidate when the average is greater than a threshold. (Hou, see section 3 "Patches xi,j that have P(Hi,j | X) larger than a threshold Ti,j are considered discriminative and are selected to continue training the CNN… We obtain the threshold Ti,j for P(Hi,j | X) as follows: We note Si as the set of P(Hi,j | X) values for all xi,j of the i-th image... We introduce the image-level threshold Hi as the P1-th percentile of Si... where P1 and P2 are predefined. The threshold Ti,j is defined as the minimum value between Hi and Ri"; see Fig. 2 "The pixel intensities are the predicted probabilities (output of CNN) that the corresponding patches have the same label as the image.")(selecting P1 or 1-P1 percentage of Si, which is the set of P(Hi,j | X) values corresponding to patches that have same image-level label, when P(Hi,j | X) larger than a threshold Ti,j)

The rationale for combining the teachings of Yan, Montenegro and Hou is the same as set forth in the rejection of claim 2.


In regard to claim 11, reference is made to the rejection of claim 10, and further, Yan and Montenegro do not teach, but Hou teaches: The method of claim 10, further comprising determining an average of the plurality of probabilities for each of the plurality of candidates; and (Hou, see section 3. Discriminative patch selection "Therefore, to obtain a more robust P(Hi,j | X), we apply the following two steps: First, we train two CNNs on two different scales in parallel. P(yi | xi,j ; θ) is the averaged prediction of the two CNNs. Second, we simply denoise the probability map P(yi | xi,j ; θ) of each image with a Gaussian kernel to compute P(Hi,j | X).")(P(Hi,j | X) is determined by averaging probabilities P(yi | xi,j ; θ) of two CNNs or weighted average of P(yi | xi,j ; θ) using Gaussian kernel. P(yi | xi,j ; θ) is the plurality of probabilities. P(Hi,j | X) is an average.) 
selecting a top percentage of the plurality of identically labeled patches for a candidate when the average is greater than a threshold. (Hou, see section 3 "Patches xi,j that have P(Hi,j | X) larger than a threshold Ti,j are considered discriminative and are selected to continue training the CNN… We obtain the threshold Ti,j for P(Hi,j | X) as follows: We note Si as the set of P(Hi,j | X) values for all xi,j of the i-th image... We introduce the image-level threshold Hi as the P1-th percentile of Si... where P1 and P2 are predefined. The threshold Ti,j is defined as the minimum value between Hi and Ri"; see Fig. 2 "The pixel intensities are the predicted probabilities (output of CNN) that the corresponding patches have the same label as the image.")(selecting P1 or 1-P1 percentage of Si, which is the set of P(Hi,j | X) values corresponding to patches that have same image-level label, when P(Hi,j | X) larger than a threshold Ti,j)

The rationale for combining the teachings of Yan, Montenegro and Hou is the same as set forth in the rejection of claim 2.
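The averaging and thresholding steps quoted from Hou can be sketched as below. Several points are assumptions: the Gaussian denoising step is omitted, the class-level threshold R_i is passed in as a given number because its definition is not quoted in this action, and the function name `discriminative_mask` is hypothetical. The sketch shows the averaged prediction of two CNNs, the image-level threshold H_i as the P1-th percentile of the per-image values, and T = min(H_i, R_i).

```python
import numpy as np

def discriminative_mask(p_scale_a, p_scale_b, p1, class_threshold):
    """Average two per-patch probability maps and keep patches above T = min(H_i, R_i).

    p_scale_a, p_scale_b : per-patch probabilities from two CNNs (same shape)
    p1                   : percentile (0-100) defining the image-level threshold H_i
    class_threshold      : R_i, supplied by the caller (assumed given)
    """
    avg = (np.asarray(p_scale_a, dtype=float) + np.asarray(p_scale_b, dtype=float)) / 2.0
    image_threshold = np.percentile(avg, p1)        # H_i
    t = min(image_threshold, class_threshold)       # T_{i,j}
    return avg, avg > t
```

Because T never exceeds the P1-th percentile of the image's own values, roughly the top (100 − P1) percent of that image's patches pass the test, which is the property the quoted passage attributes to the image-level threshold.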

Claims 4-5 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Yan in view of Montenegro in further view of Maggiori ("Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification").
In regard to claims 4 and 13, reference is made to the rejection of claims 1 and 10 respectively, and further, Yan and Montenegro do not teach, but Maggiori teaches: The system of claim 1, wherein the at least one hardware processor is also configured to determine an entropy of each of the subset of candidates based on the plurality of probabilities of the corresponding candidate. (Maggiori, p. 647 "The loss function L quantifies the misclassification by comparing the target label vectors y(i) and the predicted label vectors y^(i) , for n training samples i=1…n . In this paper, we use the common cross-entropy loss, defined as…”) 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data selection for re-training the CNN of the combination of Yan and Montenegro to include the cross-entropy of Maggiori. Doing so would provide fast convergence rates when training neural networks and would be numerically stable. (Maggiori, p. 647 "The cross-entropy loss has fast convergence rates when training neural networks (compared with, for instance, the Euclidean distance between y and ˆy) and is numerically stable when coupled with softmax normalization")


In regard to claims 5 and 14, reference is made to the rejection of claims 4 and 13 respectively, and further, Yan and Montenegro do not teach, but Maggiori teaches: The system of claim 4, wherein the at least one hardware processor is configured to determine the entropy Et of the corresponding candidate i using the following equation 


[Equation image media_image4.png not reproduced in this text]
where: m is the number of patches for the corresponding candidate; |Y| is the number of possible labels; and pi j,k is the probability that patch j candidate i corresponds to label k. (Maggiori, p. 647 "The loss function L quantifies the misclassification by comparing the target label vectors y(i) and the predicted label vectors y^(i) , for n training samples i=1…n . In this paper, we use the common cross-entropy loss, defined as

(Maggiori, "n training samples i=1…n"; "a set L of possible classes"; yki is the probability.)

The rationale for combining the teachings of Yan, Montenegro and Maggiori is the same as set forth in the rejection of claims 4 and 13 respectively.
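The equation recited in claims 5 and 14 appears in this action only as an image, so its exact form is not available here. The sketch below is therefore an assumption: it computes ordinary Shannon entropy averaged over the m patches of one candidate, using the symbols defined in the claim (m patches, |Y| labels, and the probability that patch j corresponds to label k). The function name `candidate_entropy` is hypothetical, and the normalization by m may differ from the claimed equation.

```python
import numpy as np

def candidate_entropy(patch_probs, eps=1e-12):
    """Mean Shannon entropy over the patches of one candidate.

    patch_probs : (m, |Y|) array; row j holds the label probabilities of patch j.
    eps         : clipping floor so that log(0) is never evaluated.
    """
    p = np.clip(np.asarray(patch_probs, dtype=float), eps, 1.0)
    per_patch = -(p * np.log(p)).sum(axis=1)   # entropy of each patch
    return float(per_patch.mean())             # average over the m patches
```

A candidate whose patches are all predicted with near certainty scores close to zero, while a candidate whose patches are predicted uniformly over two labels scores log 2.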

Claims 6-7 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Yan in view of Montenegro in further view of Chakraborty ("Active Batch Selection via Convex Relaxations with Guaranteed Solution Bounds").
In regard to claims 6 and 15, reference is made to the rejection of claims 1 and 10 respectively, and further, Yan and Montenegro do not teach, but Chakraborty teaches: The system of claim 1, wherein the at least one hardware processor is also configured to determine a diversity of each of the subset of candidates based on the plurality of probabilities of the corresponding candidate. (Chakraborty, p. 1947, SECTION 3 "We quantify the quality of a batch of selected samples based on their informativeness and diversity") 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data selection for re-training CNN of the combination of Yan and Montenegro to include informativeness and diversity of selected samples of Chakraborty. Doing so would select samples such that each point furnishes valuable information and having minimal redundancy among them. (Chakraborty, "that is, we would like to select a batch of samples such that each point furnishes valuable information and the selected samples have minimal redundancy among them.")


In regard to claims 7 and 16, reference is made to the rejection of claims 6 and 15 respectively, and further, Yan and Montenegro do not teach, but Chakraborty teaches: The system of claim 6, wherein the at least one hardware processor is configured to determine the diversity d, of the corresponding candidate i using the following equation:


[Equation image media_image6.png not reproduced in this text]
m is the number of patches for the corresponding candidate; |Y| is the number of possible labels; pi j,k is the probability that patch j candidate i corresponds to label k; and pi l,k is the probability that patch l candidate i corresponds to label k. (Chakraborty, p. 1947, SECTION 3 The Proposed Batch Mode Active Learning Formulation "Let Y denote the set of possible classes in the problem."; "In addition to c, a divergence matrix R∈R|Ut|×|Ut| is also defined whose (i,j)th entry is a measure of redundancy between unlabeled points xi and xj (higher the value of Rij, lower the redundancy). The divergence measure between two points is an estimate of the amount of information overlap between the points, which is captured by the symmetric Kullback Leibler divergence. Let pi and pj denote the vectors of posterior probabilities of two points xi and xj in the unlabeled pool with respect to all the classes. Then, the (i,j)th entry in matrix R is equal to the symmetric KL divergence between the two vectors of probability values [29]:... Specifically, we define a binary vector m with |Ut| entries (m∈{0,1}|Ut|×1) where each entry mi denotes whether the corresponding unlabeled point xi")

(mT*D*m when the m are binary vectors will produce the required sum, i.e. summing the appropriate (selected) entries.)
The rationale for combining the teachings of Yan, Montenegro and Chakraborty is the same as set forth in the rejection of claims 6 and 15 respectively.
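The symmetric Kullback-Leibler divergence quoted from Chakraborty, and the observation that a quadratic form with a binary vector sums the selected entries, can be sketched as follows. The function names `symmetric_kl_matrix` and `batch_score` are hypothetical. The construction of the matrix D is not quoted in this action, so the usage below simply passes the divergence matrix R in its place; that substitution is an assumption made for illustration.

```python
import numpy as np

def symmetric_kl_matrix(P, eps=1e-12):
    """R[i, j] = sum_k (p_ik - p_jk) * log(p_ik / p_jk).

    This equals KL(p_i || p_j) + KL(p_j || p_i) for probability rows p_i, p_j.
    P is an (n_points, n_classes) array of posterior probabilities.
    """
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0)
    diff = P[:, None, :] - P[None, :, :]
    log_ratio = np.log(P[:, None, :]) - np.log(P[None, :, :])
    return (diff * log_ratio).sum(axis=2)

def batch_score(D, m):
    """Quadratic form m^T D m; for a binary m it sums D over the selected rows and columns."""
    m = np.asarray(m, dtype=float)
    return float(m @ D @ m)
```

With m = [1, 0, 1], the score reduces to D[0,0] + D[0,2] + D[2,0] + D[2,2], which for the symmetric, zero-diagonal R is twice the divergence between points 0 and 2.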

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Yan in view of Montenegro in further view of Grinsven ("Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images").

In regard to claim 19, reference is made to the rejection of claim 1, and further, Yan and Montenegro do not teach, but Grinsven teaches: The system of claim 1: wherein each one of the plurality of input candidates named constitutes a single Annotation Unit (AU); and wherein the method further comprises labeling the entirety of each single AU as either informative or non-informative. (Grinsven, p. 1274 left col. "In this paper, we propose an innovative sampling heuristic to identify informative training samples in a common medical image classification task, namely abnormality detection. The proposed heuristic will dynamically increase the probability of misclassified normal samples to be selected in each training iteration."; p. 1275 left col. "A dynamic CNN training strategy is presented where informative normal samples [informative] are dynamically selected at each training epoch from a large pool of medical images."; p. 1273 abstract, "Weights are assigned to the training samples and informative samples are more likely to be included in the next CNN training iteration."; also see p. 1276 left col. section D. Selective Sampling; a medical image as a single unit is identified as informative, and others not selected are non-informative samples.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Yan and Montenegro to include the method to identify informative samples of Grinsven. Doing so would help to increase the efficiency of the CNN learning process and to reduce the training time. (Grinsven, p. 1273 right col. "An approach to identify informative normal samples will help to increase the efficiency of the CNN learning process and to reduce the training time.")
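The weighted selection described in the Grinsven citations can be pictured with a small sketch. The passages quoted above state only that weights are assigned to training samples and that misclassified samples become more likely to be selected; the specific rule used here (a fixed `boost` factor for misclassified samples) and the function names are hypothetical and are not Grinsven's update rule.

```python
import numpy as np

def sampling_probabilities(misclassified, boost=4.0):
    """Turn a boolean 'misclassified' flag per sample into selection probabilities.

    boost is an assumed constant: misclassified samples get weight `boost`, others weight 1.
    """
    w = np.where(np.asarray(misclassified, dtype=bool), boost, 1.0)
    return w / w.sum()

def draw_batch(probs, size, rng):
    """Draw `size` distinct sample indices according to the selection probabilities."""
    return rng.choice(len(probs), size=size, replace=False, p=probs)
```

Recomputing the flags after each training epoch and redrawing the batch would give the dynamic behavior the quoted passages describe, under the assumptions stated above.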

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Yan in view of Montenegro in further view of Oliveira ("A Data Augmentation Methodology to Improve Age Estimation Using Convolutional Neural Networks").

In regard to claim 20, reference is made to the rejection of claim 1, and further, Yan and Montenegro do not teach, but Oliveira teaches: The system of claim 1, wherein translating each cropped part into the group of multiple identically labeled patches comprises at least one of: (i) enlarging each cropped part into a larger patch when compared with an original size for the cropped part and re-cropping the larger patch back to the original size to yield a new patch as part of performing the data augmentation operation; (ii) increasing a pixel size of each cropped part to form an increased pixel size patch when compared with an original size for the cropped part and re-cropping the increased pixel size patch to a centered pre-defined quantity of pixels as a new patch after performing the data augmentation operation; (iii) applying a configurable percentage of a resized bounding box to each cropped part in vertical and horizontal directions to yield a new patch as part of performing the data augmentation operation; (Oliveira, p. 92 right col. "This process is illustrated in Figure 7, in which the top left coordinate of the face (Point A), becomes the Point B. 
The size of the bounding box increases from Sa to Sb while keeping the same center and enlarging the face region in 14%.") (iv) applying a rotation based data augmentation operation at an identified center of a polyp location within one of the cropped parts to yield a new patch as part of performing the data augmentation operation; (v) applying a rotation-plus-scale based data augmentation operation by extracting multiple different physical sizes from one of the cropped parts and by further rotating longitudinal and cross-sectional vessel planes around a vessel axis to yield a new patch as part of performing the data augmentation operation; and (vi) rotating one of the cropped parts eight times by mirroring and flipping the respective cropped part into mirrored and flipped translations to yield new patches as part of performing the data augmentation operation. (Oliveira, p. 92 right col. "When the face is detected near the limits of the image and the increased region exceeds the image boundaries, the original face is mirrored in eight directions and a face crop is applied, as it can be seen in Figure 8."; p.90 "This article proposes a methodology to perform data augmentation in the context of age estimation from face images.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Yan and Montenegro to include the image cropping method of Oliveira. Doing so would compensate for the difference between the size of the actual input image and the input size of a CNN model, e.g., GoogLeNet CNN. (Oliveira, p. 92 right col. "The crop applied by GoogLeNet CNN is 224×224 pixels and the input image received by Caffe Framework is 256×256 pixels, this is 14% bigger than crop size. So, a face region expansion has been performed in order to compensate for this difference.")
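The 14% figure in the Oliveira quotation follows from 256 / 224 ≈ 1.143. The sketch below works through that arithmetic and one possible reading of "mirrored in eight directions": reflecting the image outward on all four sides, which also fills the four corners. The function names and the use of NumPy's symmetric padding are assumptions for illustration, not Oliveira's implementation.

```python
import numpy as np

def enlarge_box(x, y, size, scale=256 / 224):
    """Grow a square box (top-left x, y; side length size) about its center by `scale`."""
    new_size = size * scale
    shift = (new_size - size) / 2.0   # half of the growth goes to each side
    return x - shift, y - shift, new_size

def mirror_pad(image, pad):
    """Reflect a 2-D image outward by `pad` pixels on every side.

    The padded result holds mirrored copies in the 4 side and 4 corner regions,
    so a crop that extends past the original border stays filled.
    """
    return np.pad(image, pad, mode="symmetric")
```

For a 224-pixel box at (100, 100), the enlarged box has side 256 and top-left corner (84, 84), so the center is unchanged.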
Response to Arguments
Section III IDS submission (see p. 15 middle): “Applicants will review and re-submit a new IDS…”
Examiner answers: There are two IDS submitted on 08/23/2019. 16/397990 is listed in the NPL section of one of the IDS. Patent applications must be listed in the patent application section.

Applicant's amendments with respect to the claim objections have been fully considered and are sufficient to overcome the objections. The objections to the claims have been withdrawn.
Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered but they are moot:
Applicant argues: (see p. 21 middle): “… neither Yan nor Montenegro disclose: processing an annotated medical image… wherein the processing includes performing a data augmentation operation by (i) cropping… (ii) translating… (iii) naming…” 
Examiner answers: the arguments do not apply to the new citation (Yan) being used in the current rejection. Yan teaches an annotated medical image in Fig. 1 and the abstract. Specifically, the transversal slices of CT scans with 12 classes are annotated medical images. Further, Yan performs several kinds of data augmentation (anything that creates more data to train the ML model): adding more images by translating them, and using local patches with labels as training samples. Further, (i) cropping is extracting smaller patches/images from a larger one, (ii) the images are translated and so are their local patches, which inherit labels from the image, and (iii) naming means designating these patches/images as part of the training set. See details in the 103 section.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.C./Examiner, Art Unit 2122                 

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122