DETAILED ACTION
[1]	Remarks
I.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
II.	Claims 1-21 are pending and have been examined. Claims 1-9 and 11-21 are rejected and claim 10 is objected to. Explanations are provided below.
III.	Inventor and assignee searches were performed, and it was determined that no double patenting rejection is necessary.
IV.	Patent eligibility (updated in 2019) is shown by the following: Claims 1-21 pass the patent eligibility test because no limitation or combination of limitations amounts to an abstract idea. In addition, the following limitations, alone or in combination: “training the NLP ML model using an input of the text based reports of the training dataset and a ground truth comprising the outcome of the at least one visual finding generated by the visual ML model in response to an input of the images corresponding to the text based reports of the training dataset; training the visual ML model using an input of the images of the training dataset and a ground truth comprising the outcome of the at least one NLP category generated by the NLP ML model in response to an input of the text based reports corresponding to the images of the training dataset” effect a transformation or reduction of a particular article to a different state or thing, add specific limitations other than what is well-understood, routine and conventional in the field, or add unconventional steps that confine the claim to a particular useful application, and provide improvements to the technical field of deep learning. These limitations recite additional elements that integrate the judicial exception into a practical application and amount to significantly more.
V.	There is no PCT application associated with the current application.

[2]	Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):                                                                                                          
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.  Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that 35 U.S.C. 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
Claims 1-21 are not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, for the following reasons: the limitations are modified by sufficient structure or material for performing the claimed function; they are method claims with no association to generic placeholders; or they are computer-readable medium (CRM) claims. Upon examination of the specification and claims, the examiner has determined, under the best understanding of the scope of the claims, that no rejection under 35 U.S.C. 112(a) or 112(b) is necessary because sufficient support is provided in the written description and drawings of the invention.

[3]	Grounds of Rejection
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a) A person shall be entitled to a patent unless –
(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention; or
(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

or
(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on sale in this country, more than one year prior to the date of application for patent in the United States.

Claims 1-7, 9, 11-13 and 15-21 are rejected under 35 U.S.C. 102(b) as being anticipated by Zhang “TandemNet: Distilling Knowledge from Medical Images Using Diagnostic Reports as Optional Semantic References.” 

Regarding claim 1, Zhang discloses a computer implemented method for training a visual machine learning (ML) model component and a natural language processing (NLP) ML model component, comprising: 
providing a training dataset including, for each of a plurality of sample individuals, a medical image and a corresponding text based report (see figure 1 below, both inputs are based on visual “image” and text “report”); 
providing the NLP ML model component for generating an outcome of at least one NLP category in response to an input of a target text based report (see figure 1, bottom section, which is responsible for processing words and outputting the meaning of the words); 

[Image: media_image1.png]

providing the visual ML model component for generating an outcome of at least one visual finding in response to an input of a target image (see top of figure 1, showing two ResNets Convolutional Neural Networks in series); and 

[Image: media_image2.png]

concurrently training the NLP ML model component and the visual ML model component using the training dataset (see figure 1 below, also see “Introduction” last paragraph), by:

[Image: media_image3.png]

[Image: media_image4.png]

training the NLP ML model using an input of the text based reports of the training dataset and a ground truth comprising the outcome of the at least one visual finding generated by the visual ML model in response to an input of the images corresponding to the text based reports of the training dataset (see figure 6, where the pathologist's annotations are in black and the automatic results of TandemNet, which accurately describe the semantic concepts, are in green; the outputs here are generated from both visual and text inputs in the model of figure 1, “Prediction”); 

[Image: media_image5.png]

training the visual ML model using an input of the images of the training dataset and a ground truth comprising the outcome of the at least one NLP category generated by the NLP ML model in response to an input of the text based reports corresponding to the images of the training dataset (see figure 5, from left to right: test images (the bottom shows disease labels), pathologist's annotations, visual attention w/o text, visual attention, and corresponding text attention, in the model of figure 1, “Prediction”).

[Image: media_image6.png]
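For orientation only, the cross-supervision recited in the last two limitations of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation and not Zhang's TandemNet: the class TinyClassifier, the function cross_train, the one-dimensional toy features, and the nonzero starting weight given to the NLP model are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyClassifier:
    """Hypothetical one-layer logistic model standing in for either ML model."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def step(self, x, target):
        # One gradient step on binary cross entropy against the given target.
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def cross_train(visual, nlp, pairs, epochs=50):
    """pairs: list of (image_features, report_features) for the same individual."""
    for _ in range(epochs):
        for img, txt in pairs:
            # Outcomes of each model for the paired inputs.
            visual_out = 1.0 if visual.predict(img) >= 0.5 else 0.0
            nlp_out = 1.0 if nlp.predict(txt) >= 0.5 else 0.0
            nlp.step(txt, visual_out)  # NLP model: ground truth is the visual finding
            visual.step(img, nlp_out)  # visual model: ground truth is the NLP category


# Toy data (assumed): one positive and one negative individual.
pairs = [([1.0], [1.0]), ([-1.0], [-1.0])]
visual = TinyClassifier(1)
nlp = TinyClassifier(1)
nlp.w = [1.0]  # assumed starting point so the NLP model is initially informative
cross_train(visual, nlp, pairs)
```

In this toy run the visual model starts uninformative and acquires its decision boundary only from the NLP model's outcomes, which is the sense in which each model's output serves as the other's ground truth.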

Regarding claim 2, Zhang discloses the method of claim 1, wherein the NLP ML model is trained using a supervised approach with the input of the text based reports and the ground truth outcome of the visual ML model (see figure 6 below, the text printed in black is the ground truth), and 

[Image: media_image5.png]

concurrently the visual ML model is trained using a supervised approach with the input of the images and the ground truth outcome of the NLP ML model (see figure 5 below, the ground truth image is shown as binary mask):

[Image: media_image6.png]

Regarding claim 3, Zhang discloses the method of claim 1, wherein the concurrently training is performed iteratively (see Experiment section, Implementation details: one epoch has N iterations, depending on the batch size):

[Image: media_image7.png]

Regarding claim 4, Zhang discloses the method of claim 1, further comprising: prior to the concurrently training, weakly labelling a subset of the text based reports of the training dataset with a weak label indicative of presence or absence of the at least one NLP category in respective target based reports (see figure 3, where the average text attention values per feature type for each disease are read as labels, and low grade is read as weakly labeled); and 

[Image: media_image8.png]

wherein the concurrently training is performed using the training dataset with weak labels of the text based reports (see figure 1, both visual and text data are trained at the same time).

Regarding claim 5, Zhang discloses the method of claim 4, wherein weakly labelling comprises weakly labelling about 5-20% of the text based reports of the training dataset with the weak label (see Experiment Section below):

[Image: media_image9.png]

Regarding claim 6, Zhang discloses the method of claim 4, wherein weakly labelling comprises automatically weakly labelling the subset of the text based reports using a simple set of rules (see last paragraph of Introduction):

[Image: media_image10.png]

Regarding claim 7, Zhang discloses the method of claim 1, wherein the at least one NLP category outcome of the NLP ML model and the at least one visual finding outcome of the visual ML model are from a common set and of a same format (see figure 5 and figure 6, both use the same type and format of image):

[Image: media_image11.png]

Regarding claim 9, Zhang discloses the method of claim 1, wherein the at least one NLP category outcome of the NLP ML model component is an indication of a visual finding depicted in an image corresponding to a text based report inputted into the NLP ML model component (see figure 6 below, where the NLP model reports: the nuclei are severely pleomorphic, the nuclei are crowded to a moderate degree, polarity of nuclei is negligibly lost. mitosis is rare throughout the tissue. the nuclei have inconspicuous nucleoli conclusion high grade):

[Image: media_image12.png]

and the at least one visual finding outcome of the visual ML model component is an indication of the visual finding depicted in the image corresponding to the text based report inputted into the NLP ML model component (see figure 5 below, the visual result reports: pictured nuclei are severely pleomorphic moderate crowding of the nuclei can be seen polarity along the basement membrane is completely lost, mitosis appears to be rare the nucleoli of nuclei are inconspicuous).

Regarding claim 11, Zhang discloses the method of claim 1, wherein concurrently training comprises concurrently training the NLP ML model component and the visual ML model component using a combined visual and NLP consensus loss function (see equation 2, which is the combined output from V and S, the visual and text components):

[Image: media_image13.png]

Regarding claim 12, Zhang discloses the method of claim 1, wherein the combined visual and NLP consensus loss function comprises a cross model consensus loss function that encourages high consensus between the NLP ML model and the visual ML model (see equation 4, which shows a loss function combining the visual and text data outputs shown in figure 1; the V and S components are concatenated):

[Image: media_image14.png]
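As a point of reference for the claim language, a loss term that "encourages high consensus" between two models can take many forms. The sketch below shows one hypothetical form, the mean squared disagreement between the two models' per-finding probabilities. The function name consensus_loss and the choice of squared difference are assumptions made for illustration; nothing here is asserted to be the form recited in the specification or the form of Zhang's equation 4.

```python
def consensus_loss(p_visual, p_nlp):
    """Mean squared disagreement between per-finding probabilities.

    p_visual, p_nlp: equal-length lists of probabilities in [0, 1],
    one entry per finding in the common set (hypothetical inputs).
    """
    if len(p_visual) != len(p_nlp) or not p_visual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((v - n) ** 2 for v, n in zip(p_visual, p_nlp)) / len(p_visual)


# The term is zero when the two models agree and grows as they diverge.
agree = consensus_loss([0.9, 0.1], [0.9, 0.1])
disagree = consensus_loss([1.0, 0.0], [0.0, 1.0])
```

Minimizing such a term during concurrent training pushes the two outputs toward each other, which is the behavior the claim language describes.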

Regarding claim 13, Zhang discloses the method of claim 11, wherein concurrently training further comprises training the NLP ML model component using an NLP loss function that is computed for the training of the NLP ML model and excludes data obtained from the training of the visual ML model component (see figure 1, the top portion, ResNets, takes images as input, bottom portion takes text / words as input, both are mutually exclusive).

Regarding claim 15, Zhang discloses the method of claim 13, wherein the NLP loss function penalizes the NLP ML model for errors made during an initial inaccurate labeling of a subset of text based reports of the training dataset made prior to the concurrently training (see “Prediction Module”):

[Image: media_image15.png]

Regarding claim 16, Zhang discloses the method of claim 1, wherein the visual ML model component is implemented as a neural network (see figure 1, CNN is read as the visual ML model). 

Regarding claim 17, Zhang discloses the method of claim 1, wherein the NLP ML model component is implemented as a neural network (see figure 1, LSTM is read as the text ML model).

Regarding claim 18, Zhang discloses the method of claim 1, wherein a target text report is inputted into an NLP processing path comprising the NLP ML model component that generates the NLP category (see figure 1, the bottom portion is read as NLP model and takes words as input), and a target image corresponding to the target text report is inputted in a visual processing path comprising the visual ML model component that generates the at least one visual finding (see figure 1, top portion of the model takes images as input), 

[Image: media_image16.png]

wherein the NLP processing path and the visual processing path are concurrently executed during the concurrent training (see figure 1, the ResNets and LSTM are executed in parallel).

Regarding claims 19 and 21, see the rationale and rejection for claim 1. The algorithms are implemented on a computer.
Regarding claim 20, Zhang discloses the method of claim 19, further comprising: receiving a text based report corresponding to the medical image of the subject inputted into the visual ML model (see figure 1, the bottom portion takes report with words as input, which corresponds to image); inputting the text based report into a NLP ML model (see figure 1, the bottom portion takes report with words as input); 
[Image: media_image17.png]

obtaining as an outcome of the NLP ML model, at least one NLP category indicative of at least one visual finding of the medical image described in the text based report, wherein the NLP ML model comprises the NLP ML model component that is concurrently trained with the visual ML model component on the training dataset (see figure 1, the NLP model and CNN models are trained in parallel); and generating an alert when the at least one NLP category outcome of the NLP ML model does not match the at least one visual finding outcome of the visual ML model (see figure 2 below, where the confusion matrix indicates the errors in classifications):

[Image: media_image18.png]

Claim Rejections - 35 USC § 103
1.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

2.	Claims 8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Brestel (US 20190340753).

Regarding claim 8, Zhang discloses all the limitations of claim 1 but is silent on the limitation wherein each of the at least one NLP category outcome of the NLP ML model and the at least one visual finding outcome of the visual ML model is a binary classification indicative of positive or negative finding found in the image and corresponding text based report. Brestel discloses the limitation wherein each of the at least one NLP category outcome of the NLP ML model and the at least one visual finding outcome of the visual ML model is a binary classification indicative of positive or negative finding found in the image and corresponding text based report (see paragraph 144, based radiology reports of the sample individuals, some reports are excluded from the training dataset, for each respective text based radiology report included in the sub-set, each one of the sentences of the respective text based radiology report is mapped to one of: one of the indications of visual finding types, denoting a positive finding from the findings supported by the model, a negative finding, and neutral data, the multi-label neural network may be trained according to the fully covered training dataset and associated anatomical images).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to include binary classification in the method of Zhang in order to determine whether to treat a disease when a positive finding is detected, thereby improving diagnosis.

Regarding claim 14, Brestel discloses the method of claim 13, wherein the NLP ML model comprises a binary classifier and the NLP loss function comprises a standard binary cross entropy loss (see paragraph 149, the multi-label neural network is trained using a categorical cross-entropy loss function). See the motivation for claim 8.
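
For reference, the relationship between the two loss names used above can be shown numerically. The sketch below computes the standard binary cross entropy and a categorical cross entropy, and checks that the two coincide when the categorical loss is applied to exactly two mutually exclusive classes. The function names are illustrative, and no assertion is made here about how Brestel's multi-label network applies its loss.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def categorical_cross_entropy(probs, onehot, eps=1e-12):
    """Loss for a probability vector and a one-hot label vector."""
    return -sum(t * math.log(max(q, eps)) for q, t in zip(probs, onehot))


# Two-class case: classes are (negative, positive) with probabilities (1 - p, p).
p, y = 0.8, 1.0
binary = binary_cross_entropy(p, y)
two_class = categorical_cross_entropy([1.0 - p, p], [1.0 - y, y])
```

With two classes the categorical expression reduces term by term to the binary one, so the two values match.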

[4]	Claim Objection
Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

With regard to claim 10, the examiner cannot find any applicable prior art providing teachings for the following limitation(s): “the method of claim 1, further comprising: computing a correlation value indicative of a correlation between the at least one NLP category outcome of the NLP ML model and the at least one visual finding outcome of the visual ML model for an input of an image and corresponding text based report; and in response to the correlation value being below a threshold indicative of dis-correlation between the at least one NLP category outcome of the NLP ML model and the at least one visual finding outcome of the visual ML model, storing the image and corresponding text based report in a user-training dataset; and providing the user-training dataset for presentation on a display” in combination with the rest of the limitations of claim 1.
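
To make the scope of the quoted limitation concrete, the thresholding step can be sketched as below. The claim does not specify how the correlation value is computed, so the agreement score used here (one minus the mean absolute difference between the two outcome vectors) is purely an assumption, as are the names agreement and collect_discordant, the case identifiers, and the default threshold.

```python
def agreement(nlp_probs, visual_probs):
    """Hypothetical correlation value in [0, 1]; 1.0 means identical outcomes."""
    diffs = [abs(a - b) for a, b in zip(nlp_probs, visual_probs)]
    return 1.0 - sum(diffs) / len(diffs)


def collect_discordant(cases, threshold=0.5):
    """Return the ids of cases whose two outcomes fall below the threshold.

    cases: iterable of (case_id, nlp_probs, visual_probs) tuples. The returned
    list stands in for the "user-training dataset" that would be displayed.
    """
    return [cid for cid, n, v in cases if agreement(n, v) < threshold]


cases = [
    ("case_a", [0.9, 0.1], [0.8, 0.2]),  # models largely agree
    ("case_b", [0.9, 0.1], [0.1, 0.9]),  # models disagree
]
user_training_dataset = collect_discordant(cases)
```

Only the discordant pair is retained, which corresponds to storing an image and report when the correlation value falls below the threshold.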

Yeatman (US 20110106740) discloses that a neural network based classifier is trained (18) to identify unknown objects based on the input latent class characteristics of the known objects. After the neural network is trained, sample data corresponding to characteristics of an unknown object is received and input to the trained network (20). Upon receiving the sample data, the neural network calculates and provides the likelihood that the unknown object is a member of each known class of objects (22) based on the correlation between said latent class characteristics of each of the known objects and the characteristics of the unknown object (see paragraph 37).



CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEX LIEW (duty station is located in New York City) whose telephone number is (571)272-8623 (FAX 571-273-8623), cell (917)763-1192 or email alexa.liew@uspto.gov. Please note the examiner cannot reply through email unless an internet communication authorization is provided by the applicant. The examiner can be reached anytime. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at 571-272-7332.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALEX KOK S LIEW/
Primary Examiner, Art Unit 2668
Telephone: 571-272-8623
Date: 7/6/22