DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 12 is objected to because of the following informalities: line 4 recites, “a feature 6Application No. 16/367,955 Reply to Office Action of October 8, 2020 extracted from the Currently Amended data, a feature computed from the Currently Amended data”. The term “Currently Amended” should be cancelled. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

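The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
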
Claim 4 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 4 recites, “the positive characteristic and/or the negative characteristic does not form part of the set of pixel or voxel intensities for the array of pixel or voxel positions” (emphasis added). It appears that the specification does not provide any support for this limitation. Paragraph [0062] (as published) discloses, “each set of training data 60 comprises a respective set of medical imaging data, for example a set of pixel or voxel intensities for an array of pixel or voxel positions”, and paragraph [0189] discloses, “For example, operations may be performed on data comprising sets of pixel or voxel positions and associated intensities”. However, neither of those paragraphs explicitly discloses, “the positive characteristic and/or the negative characteristic does not form part of the set of pixel or voxel intensities for the array of pixel or voxel positions”. Therefore, the above claimed feature raises a new matter issue.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-3, 5, 8-13, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aizawa et al. (US 20190362233, hereinafter “Aizawa”), and further in view of Kuroda et al. (US 20180307946, hereinafter “Kuroda”).
Regarding claim 1, Aizawa discloses, 
A system comprising processing circuitry configured to perform training of a model for predicting from input data at least one predicted output, wherein to perform training of the model (A neural network trained on similar triplets generates similarity rankings, e.g., as part of the operation of a recommendation engine. The trained neural network (i.e., the layer configuration and specific parameters generated during the training process 400 of FIG. 4) is used to generate a feature embedding of any data point(s) that is passed through the trained neural network, Para. [0075]-[0082] and also, see Fig. 1), the processing circuitry is configured to: 
“receive a plurality of training data sets (FIG. 1 illustrates a process 100 for training a neural network for a similarity ranking engine or recommendation engine using training data, such as images, in the form of triplets. The process 100 begins with collecting data (110) for forming into triplets. This data may include images, waveform representations of audio clips, bag-of-words representations of text, or any other quantity that can be expressed numerically, Para. [0044])”; 
“receive from a user a selection of a first characteristic from a set of characteristic as a positive characteristic, wherein the user considers values for the first characteristic to be relevant to prediction of the at least one predicted output (The reference data point and positive data point(s) may be classified beforehand as data corresponding to similar items. These items may be considered similar because they are visually similar, i.e., they look similar. For instance, the reference data point may represent a particular article of clothing, such as a black dress, and the positive data point may represent a similar (but not identical) article of clothing, such as another black dress, Paras. [0046]-[0047])”; 
“receive from the user a selection of a second characteristic of from a set of characteristic as a negative characteristic, wherein the user considers values for the second characteristic to be less relevant or irrelevant to prediction of the at least one predicted output (The reference data point and the negative data point(s) may or may not have been classified beforehand as data corresponding to a dissimilar item(s). For instance, the negative data point(s) for a given reference point may be selected by randomly sampling from the set of all data points that are not positive data points for the given reference data point, Paras. [0048]-[0049])”.
However, Aizawa does not explicitly disclose, “perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic.”
In a similar field of endeavor, Kuroda discloses, “perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic (The evaluation acquisition unit 32 acquires any one of a "positive evaluation" indicating that the content of the input data coincides with the label…. A case that the evaluation of a label is the positive evaluation means that the content of the input data belongs to a category represented by the label, Paras. [0090]-[0094]) and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic (The evaluation acquisition unit 32 acquires a "negative evaluation" indicating that the content of the input data does not coincide with the label…. A case that the evaluation of a label is the negative evaluation means that the content of the input data does not belong to a category represented by the label, Paras. [0090]-[0094]).”

Regarding claim 2, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the model comprises a neural network (neural network 440 and 540 in Fig. 4 and Fig. 5, respectively).”
Regarding claim 3, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), however Aizawa does not disclose, “wherein each of the plurality of training data sets comprises respective image data and the training of the model further comprises performing supervision of the model using the image data such that the model is trained to use the image data in prediction.”
In a similar field of endeavor, Kuroda discloses, “wherein each of the plurality of training data sets comprises respective image data (A training system 100 (see FIG. 4) according to an embodiment is a system which trains a parameter of a recognition unit 11 (see FIG. 1) recognizing content of recognition target data. The recognition target data is data to be recognized by a computer, and examples of the recognition target data include image data, Paras. [0073]-[0075]) and the training of the model further comprises performing supervision of the model using the image data such that the model is trained to use the image data in prediction (The artificial neurons for output externally output the recognition score. The number of artificial neurons for output to be prepared is the same as the number of labels. In other words, the recognition score is output for each label in the neural network. In the example in FIG. 2, three artificial neurons are prepared correspondingly to three labels "dog", "person", and "flower". The artificial neurons for output a recognition score B1 corresponding to the label of "dog", a recognition score B2 corresponding to the label of "person", and a recognition score B3 corresponding to the label of "flower", Paras. [0076]-[0080]).”
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Aizawa by specifically providing wherein each of the plurality of training data sets comprises respective image data and the training of the model further comprises performing supervision of the model using the image data such that the model is trained to use the image data in prediction, as taught by Kuroda for the purpose of providing a technique to prevent training based on incorrect evaluation from being performed.
Regarding claim 5, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), however Aizawa does not disclose, “wherein the processing circuitry is further configured to receive a target data set and to process the target data set using the trained model to predict said at least one predicted output for the target data set.”
In a similar field of endeavor, Kuroda discloses, “wherein the processing circuitry is further configured to receive a target data set and to process the target data set using the trained model to predict said at least one predicted output for the target data set (A training system 100 (see FIG. 4) according to an embodiment is a system which trains a parameter of a recognition unit 11 (see FIG. 1) recognizing content of recognition target data. The recognition target data is data to be recognized by a computer, and examples of the recognition target data include image data, Paras. [0073]-[0075]).”
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Aizawa by specifically providing wherein the processing circuitry is further configured to receive a target data set and to process the target data set using the trained model to predict said at least one predicted output for the target data set, as taught by Kuroda for the purpose of providing a technique to prevent training based on incorrect evaluation from being performed.
Regarding claim 8, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the supervision of the model using the negative characteristic is performed using gradient reversal (Optimizing the objective function, for example, using stochastic gradient descent, results in a model that can project the item into vector space in which projections of images of similar objects are nearby and projections of images of dissimilar objects are far apart, Para. [0029]).”
Regarding claim 9, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the positive (the neural network is trained using triplets that each include an anchor image of an item, a positive image of a similar item, and a negative image of a dissimilar item ("similar triplets") instead of triplets that each include an anchor image of an item, a positive image of the same item, and a negative image of a different item ("same triplets"), Para. [0026]).”
Regarding claim 10, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the positive characteristic comprises at least one of a feature extracted from the training data set, a feature computed from the training data set, a manually defined feature, a feature extracted from the input data, a feature computed from the input data (The process 100 begins with collecting data (110) for forming into triplets. This data may include images, waveform representations of audio clips, bag-of-words representations of text, or any other quantity that can be expressed numerically. The data can be collected using any of a variety of methods, including keyword-searching a collaboratively filtered (image) database or simply reviewing a set of images, Para. [0044]).”
Regarding claim 11, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the negative characteristic comprises at least one of: scanner manufacturer, acquiring institution, image modality, enumerated protocol variant, image scale, intensity, acquisition direction, presence of image artifacts, a data type that is unrelated to a (The reference data point and the negative data point(s) may or may not have been classified beforehand as data corresponding to a dissimilar item(s). For instance, the negative data point(s) for a given reference point may be selected by randomly sampling from the set of all data points that are not positive data points for the given reference data point, Paras. [0046]-[0048]).”
Regarding claim 12, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the negative characteristic comprises at least one of a feature extracted or computed from a training data set, a manually defined feature, a feature extracted from the Currently Amended data, a feature computed from the Currently Amended data. (The process 100 begins with collecting data (110) for forming into triplets. This data may include images, waveform representations of audio clips, bag-of-words representations of text, or any other quantity that can be expressed numerically. The data can be collected using any of a variety of methods, including keyword-searching a collaboratively filtered (image) database or simply reviewing a set of images, Para. [0044]).”
Regarding claim 13, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the at least one predicted output comprises at least one of: a classification, a regression, a detection, a score, a segmentation (first embedding 504a (output vector) representing features of the first image 502a and a second embedding 504b (output vector) representing features of the second image 502b. Comparing these first embedding 504a and the second embedding 504b using a visual similarity metric 506 generates a score 508 representing the distance in feature space between the embeddings, Para. [0081]).”
Regarding claim 18, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), further Aizawa discloses, “wherein the input data and training data sets comprise at least one of: document data, text data, genomic data, time series data, laboratory data, vital signs data (Before the process 500 begins, a plurality of triplets is first generated by classifying reference images of clothing items as three positive images of similar clothing items and randomly selecting three negative images from the set of all images in the dataset, Paras. [0080]-[0082]).”
Regarding claim 19, Aizawa discloses, 
A training method to train a model to predict from input data at least one predicted output, (A neural network trained on similar triplets generates similarity rankings, e.g., as part of the operation of a recommendation engine. The trained neural network (i.e., the layer configuration and specific parameters generated during the training process 400 of FIG. 4) is used to generate a feature embedding of any data point(s) that is passed through the trained neural network, Para. [0075]-[0082] and also, see Fig. 1), the training method comprising: 
“receiving a plurality of training data sets (FIG. 1 illustrates a process 100 for training a neural network for a similarity ranking engine or recommendation engine using training data, such as images, in the form of triplets. The process 100 begins with collecting data (110) for forming into triplets. This data may include images, waveform representations of audio clips, bag-of-words representations of text, or any other quantity that can be expressed numerically, Para. [0044])”; 
“receive from a user a selection of a first characteristic from a set of characteristic as a positive characteristic, wherein the user considers values for the first characteristic to be relevant to prediction of the at least one predicted output (The reference data point and positive data point(s) may be classified beforehand as data corresponding to similar items. These items may be considered similar because they are visually similar, i.e., they look similar. For instance, the reference data point may represent a particular article of clothing, such as a black dress, and the positive data point may represent a similar (but not identical) article of clothing, such as another black dress, Paras. [0046]-[0047])”; 
“receive from the user a selection of a second characteristic of from a set of characteristic as a negative characteristic, wherein the user considers values for the second characteristic to be less relevant or irrelevant to prediction of the at least one predicted output (The reference data point and the negative data point(s) may or may not have been classified beforehand as data corresponding to a dissimilar item(s). For instance, the negative data point(s) for a given reference point may be selected by randomly sampling from the set of all data points that are not positive data points for the given reference data point, Paras. [0048]-[0049])”.
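However, Aizawa does not explicitly disclose, “training the model, the training of the model comprising: perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic.”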
In a similar field of endeavor, Kuroda discloses, “training the model, the training of the model comprising: perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic (The evaluation acquisition unit 32 acquires any one of a "positive evaluation" indicating that the content of the input data coincides with the label…. A case that the evaluation of a label is the positive evaluation means that the content of the input data belongs to a category represented by the label, Paras. [0090]-[0094]) and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic (The evaluation acquisition unit 32 acquires a "negative evaluation" indicating that the content of the input data does not coincide with the label…. A case that the evaluation of a label is the negative evaluation means that the content of the input data does not belong to a category represented by the label, Paras. [0090]-[0094]).”
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Aizawa by specifically providing training the model, the training of the model comprising: perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic
Regarding claim 20, Aizawa discloses, 
A system comprising processing circuitry (A neural network trained on similar triplets generates similarity rankings, e.g., as part of the operation of a recommendation engine. The trained neural network (i.e., the layer configuration and specific parameters generated during the training process 400 of FIG. 4) is used to generate a feature embedding of any data point(s) that is passed through the trained neural network, Para. [0075]-[0082] and also, see Fig. 1) is configured to: 
“receive a target data set (the trained neural network 540 receives a first image 502a showing a first clothing item and a second image 502b showing a second clothing item, Fig. 5)” and 
“process the target data set using a trained model to predict at least one predicted output for the target data set, wherein to train the model (It produces a first embedding 504a (output vector) representing features of the first image 502a and a second embedding 504b (output vector) representing features of the second image 502b. Comparing these first embedding 504a and the second embedding 504b using a visual similarity metric 506 generates a score 508 representing the distance in feature space between the embeddings, Fig. 5)”, the processing circuitry is configured to:
“receive a plurality of training data sets (FIG. 1 illustrates a process 100 for training a neural network for a similarity ranking engine or recommendation engine using training data, such as images, in the form of triplets. The process 100 begins with collecting data (110) for forming into triplets. This data may include images, waveform representations of audio clips, bag-of-words representations of text, or any other quantity that can be expressed numerically, Para. [0044])”; 
“receive from a user a selection of a first characteristic from a set of characteristic as a positive characteristic, wherein the user considers values for the first characteristic to be relevant to prediction of the at least one predicted output (The reference data point and positive data point(s) may be classified beforehand as data corresponding to similar items. These items may be considered similar because they are visually similar, i.e., they look similar. For instance, the reference data point may represent a particular article of clothing, such as a black dress, and the positive data point may represent a similar (but not identical) article of clothing, such as another black dress, Paras. [0046]-[0047])”; 
“receive from the user a selection of a second characteristic of from a set of characteristic as a negative characteristic, wherein the user considers values for the second characteristic to be less relevant or irrelevant to prediction of the at least one predicted output (The reference data point and the negative data point(s) may or may not have been classified beforehand as data corresponding to a dissimilar item(s). For instance, the negative data point(s) for a given reference point may be selected by randomly sampling from the set of all data points that are not positive data points for the given reference data point, Paras. [0048]-[0049])”.
However, Aizawa does not explicitly disclose, “perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic.”
In a similar field of endeavor, Kuroda discloses, “perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic (The evaluation acquisition unit 32 acquires any one of a "positive evaluation" indicating that the content of the input data coincides with the label…. A case that the evaluation of a label is the positive evaluation means that the content of the input data belongs to a category represented by the label, Paras. [0090]-[0094]) and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic (The evaluation acquisition unit 32 acquires a "negative evaluation" indicating that the content of the input data does not coincide with the label…. A case that the evaluation of a label is the negative evaluation means that the content of the input data does not belong to a category represented by the label, Paras. [0090]-[0094]).”

Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aizawa, in view of Kuroda and further in view of Rephaeli et al. (US 20190065905, hereinafter “Rephaeli”).
Regarding claim 4, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), however the combination of Aizawa and Kuroda does not disclose, “wherein the input data includes training data sets comprising a respective set of pixel or voxel intensities for an array of pixel or voxel positions, and the positive characteristic and/or the negative characteristic does not form part of the set of pixel or voxel intensities for the array of pixel or voxel positions”.
In a similar field of endeavor, Rephaeli discloses, “wherein the input data includes training data sets comprising a respective set of pixel or voxel intensities for an array of pixel or voxel positions, and the positive characteristic and/or the negative characteristic does not form part of the set of pixel or voxel intensities for the array of pixel or voxel positions (FIG. 4 shows a training algorithm using a convolutional neural network that may identify the hidden variables. A raw speckle image 402 is divided into regions of interest 404. Each region of interest may be a fixed size around a pixel. The region of interest may be used as a training image. For each training image, a flow-rate value and the depth of the blood vessel may be assigned based on a ground truth flow map 406. The ground truth flow map may be obtained because the flows are known or measured in the sample associated with the raw speckle image. The training data, including intensities of pixels, may be fed to a neural network, Paras. [0038] and [0062]).”
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Aizawa and Kuroda by specifically providing perform positive supervision of the model using the positive characteristic such that the training of the model to predict the at least one predicted output is sensitive to values for the first characteristic and perform negative supervision of the model using the negative characteristic such that the training of the model to predict the at least one predicted output is insensitive to values for the second characteristic, as taught by Rephaeli for the purpose of providing a technique using the trained machine learning model and the test image data set to generate output data for the test image data set.

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aizawa, in view of Kuroda and further in view of Laxman et al. (US 20080177680, hereinafter “Laxman”).
Regarding claim 6, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1), however the combination of Aizawa and Kuroda does not disclose, “wherein a first subset of the training data sets comprises training data sets comprising values for the positive characteristic, and a second subset of the training data sets comprises training data sets comprising values for the negative characteristic.”
In a similar field of endeavor, Laxman discloses, “wherein a first subset of the training data sets comprises training data sets comprising values for the positive characteristic, and a second subset of the training data sets comprises training data sets comprising values for the negative characteristic (training data divider can divide a set of training data into subsets. As an example, the training data divider can divide the training data into a positive subset containing training data classified as positive and a negative subset containing training data classified as negative, Para. [0031] and Fig. 2).”
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Aizawa and Kuroda by specifically providing wherein a first subset of the training data sets comprises training data sets comprising values for the positive characteristic, and a second subset of the training data sets comprises training data sets comprising values for the negative characteristic, as taught by Laxman for the purpose of effectively and efficiently reducing the errors in the training data.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aizawa, in view of Kuroda, and further in view of Townsend et al. (US 20170249534, hereinafter “Towns”).
Regarding claim 7, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see, claim 1), however the combination of Aizawa and Kuroda does not explicitly disclose, “wherein the training of the model comprises training the model to predict the at least one predicted output from the image data, such that the trained model is configured to predict the predicted output a target data set in the absence of values for the positive characteristic and negative characteristic.”
In a similar field of endeavor, Towns discloses, “wherein the training of the model comprises training the model to predict the at least one predicted output from the image data, such that the trained model is configured to predict the predicted output a target data set in the absence of values for the positive characteristic and negative characteristic (the classifier training unit 20 is operable to: (a) determine, in respect of each generated image or a sub-set of the generated images, the degree of similarity between that image and each of the other generated images, using the numerical vectors for the generated images; and/or (b) derive a trend prediction model from the numerical vectors and trends for each stored data set, or for a sub-set of the generated images, using a deep learning method, Paras. [0065]-[0068]).”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Aizawa and Kuroda by specifically providing wherein the training of the model comprises training the .

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aizawa, in view of Kuroda, and further in view of Al-Haimi et al. (US 20190286620, hereinafter “Alhaimi”).
Regarding claim 14, the combination of Aizawa and Kuroda discloses everything claimed as applied above (see claim 1); however, the combination of Aizawa and Kuroda does not explicitly disclose, “wherein the processing circuitry is further configured to artificially generate further data sets, and wherein the training of the model uses the plurality of training data sets and the generated further data sets.”
In a similar field of endeavor, Alhaimi discloses, “wherein the processing circuitry is further configured to artificially generate further data sets, and wherein the training of the model uses the plurality of training data sets and the generated further data sets (the system may train one or more artificial neural networks associated with the transformation classifiers. For example, the system may train a multi-class artificial neural network classifier based on a training set of examples matching complex-type source fields to corresponding target field, Paras. [0063]-[0067]).”
Therefore, it would have been obvious to one of ordinary skill in art before the effective filing date of the claimed invention to modify the combination of Aizawa and .

Allowable Subject Matter
Claim(s) 15-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  the closest prior art does not teach or suggest, “wherein the processing circuitry is configured to artificially generate the further data sets by augmenting at least some of the plurality of training data sets, the augmenting of each training data set comprising adjusting at least one augmentation parameter of the training data sets and wherein the training of the model comprises performing supervision of the model using the at least one augmentation parameter such that the model is trained to discount values for the at least one augmentation in the prediction of the at least one predicted output.”

Relevant reference(s)
US 20180240551: The invention is related to hierarchical machine learning models to identify an anatomical structure of interest and perform diagnostic procedures.
US 20160093048: The invention is related to apparatuses and methods for learning a similarity metric using deep learning based techniques for multimodal medical images. A novel similarity metric for multi-modal images is provided using the corresponding states of pairs of image patches to generate a classification setting for each pair. The classification settings are used to train a deep neural network via supervised learning.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GOLAM SOROWAR whose telephone number is (571)270-3761.  The examiner can normally be reached on Mon-Fri: 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached on (571) 272-7904.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GOLAM SOROWAR/           Primary Examiner, Art Unit 2641