DETAILED ACTION
The applicant’s request for continued examination regarding application number 15/855,015, filed December 27, 2017, has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114 
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on March 30, 2022 has been entered. 

Response to Amendments
The amendment filed March 30, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 15/855,015, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks, Examiner acknowledges Claims 1-20 remain pending in the application. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 15/855,015, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant's arguments for Claims 1-20 under 35 U.S.C. 103 as being unpatentable over Ou et al., Multi-class pattern classification using neural networks, 2006 [hereafter referred to as Ou] in view of Tafazoli et al., CA2972183A1, published 06/22/2017 [hereafter referred to as Tafazoli], in further view of Islam et al., Abnormality Detection and Localization in Chest X-Rays using Deep Convolutional Neural Networks, September 27, 2017 [hereafter referred to as Islam], Applicant’s arguments with respect to the above claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner’s analysis of the claims with respect to new art references and the corresponding claim mappings are provided in the sections indicated below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 9-12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., A Multi-view Deep Convolutional Neural Networks for Lung Nodule Segmentation, 2017 EMBC July 11-15, 2017, published September 14, 2017 [hereafter referred to as Wang] in view of Dong et al., Learning to Read Chest X-Ray Images from 16000+ Examples Using CNN, 2017 IEEE/ACM CHASE July 17-19, 2017, published August 17, 2017 [hereafter referred to as Dong].
Regarding previously presented Claim 1,
 Wang teaches
(Previously Presented) A deep neural network system, comprising: 
a memory that stores computer executable components (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a computer system containing instructions stored in memory. Wang teaches training a multi-view convolutional neural network (MV-CNN) using the CAFFE Toolkit to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels. A person having ordinary skill in the art would understand that the training of the MV-CNN using the CAFFE Toolkit would require a computer system that contains a processor and memory, with the memory containing executable instructions to process the medical images and to train the MV-CNN (Wang p.1752 col.2 1st-2nd paragraphs: “… we propose a multi-view convolutional neural networks (MV-CNN) [11] to distinguish nodule voxels from background voxels in CT imaging. Our model has learned nodule-sensitive features from 0.34 million voxel patches automatically and revealed appealing segmentation results for various type of lung nodules … 1) The proposed MV-CNN can segment lung nodules in CT images … 2) We propose a multi-scale patch strategy as the input of the MV-CNN to capture both detailed features and nodule shape information; 3) The MV-CNN integrates three branches that can learn deep features from three orthogonal image views …”; p.1753 Figure 1; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; and p.1754 col.2 1st paragraph: “… After the model training was completed through CAFFE Toolkit [18], we reported segmentation results on the testing set.”).); 
a processor that executes computer executable components stored in the memory (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a computer system containing a processor that executes the instructions stored in memory. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) using the CAFFE Toolkit to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels. A person having ordinary skill in the art would understand that the training of the MV-CNN using the CAFFE Toolkit would require a computer system that contains a processor and memory, with the processor executing the instructions stored in memory to process the medical images and to train the MV-CNN (Wang p.1752 col.2 1st -2nd paragraphs; p.1753 Figure 1; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; and p.1754 col.2 1st paragraph).), 
wherein the computer executable components comprise: 
a neural network training component that trains a neural network based on a data set to form a first neural network of a binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component training a first neural network having a binary neural network architecture. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a first neural network (Wang p.1752 col.2 1st -2nd paragraphs: “… we propose a multi-view convolutional neural networks (MV-CNN) [11] to distinguish nodule voxels from background voxels in CT imaging. Our model has learned nodule-sensitive features from 0.34 million voxel patches automatically and revealed appealing segmentation results for various type of lung nodules …”; pp.1752-1753 Section II 1st-2nd paragraphs: “… Given a voxel in CT image, we extract three multi-scale patches centered on this voxel as the input to the CNN model and predict if this voxel belongs to the nodule. … The proposed MV-CNN incorporates three branches that process voxel patches from axial, coronal and sagittal view CT images respectively. 
The three branches share the same structure that consists of six convolutional layers (C1 to C6), two max-pooling layers (Max pooling 1,2), and one fully connected layer (F7). The six convolutional layers in each CNN branch are divided into three blocks, where each block shares the exact same structure … At the end of the CNN model, the three branches are merged through a fully connected layer (F8) to outcome the voxel label …”; p.1753 Figure 1 including caption: “… This network contains three branches aiming at capturing features from axial, coronal, and sagittal image views … The bottom figure shows the feature maps of the C1 layer on three branches, indicated that the learned filters can capture different characteristics of nodules from input CT image (e.g., edge or solid part of a nodule).”; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph: “We used the public … (LIDC-IDRI) [16] for experimental evaluation. All the nodules in this dataset are annotated … We train the MV-CNN on the training set …”; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… a neural network duplication component that trains a copy of the first neural network based on the data set to form a second neural network of the binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training a second neural network having a binary neural network architecture, with the copying broadly reciting that the first and second neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a second neural network (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… wherein the neural network duplication component trains a copy of the second neural network based on the data set to form an Mth neural network of the binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training an Mth neural network having a binary neural network architecture, with the copying broadly reciting that the second and Mth neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing an Mth neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… wherein the neural network duplication component generates probability data by aggregating at least two inference models (Examiner’s note: As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, and where the combined output from the three branches at the output layer is fed into a binary softmax function (representing an output layer) that generates probability distributions over the class labels (such that the output represents probability data aggregated by at least two inference models) (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; and p.1753 col.1 3rd paragraph-col.2 2nd paragraph: “… In the case of the output layer (F8) … the activation values are fed into a binary softmax function that are converted into probability distributions over the class labels … The goal of network training is to maximize the probability of the correct class … the loss function is defined as: L(W) <see equation (2)> where $\hat{y}_n$ represents the predicted probability from MV-CNN …”).) … 
… a visualization component to generate a multi-dimensional visualization based on a classification or a localization of the one or more diseases in an anatomical region represented in the medical imaging data (Examiner’s note: Wang teaches generating visualizations for six representative nodules from the LIDC-IDRI dataset, where these nodule visualizations represent nodules in lung tissues, and these visualizations are extracted from the CT image dataset (where this CT image dataset contains 2-dimensional medical images of patients’ lungs), and each identification of a representative nodule type represents a presence/detection of a nodule type corresponding to various multiple diseases in a patient’s lungs (such as L5: calcific nodules representing calcinosis; and L6: ground-glass opacity nodules representing opacity of a lung) (Wang p.1754 col.2-p.1755 col.1 Section IV.B. Visualization: “The segmentation results are visualized to allow the comparison of different approaches. We demonstrate six representative nodules from the LIDC-IDRI testing set (Fig. 3). … the proposed MV-CNN remains robust when segmenting such nodules … For cavitary (L4) and calcific (L5) nodules … the MV-CNN is able to reserve the complete nodule shape … ground-glass opacity (GGO) nodules (L6) … the proposed method performs reasonably well in capturing the nodule shape with GGO.”; and p.1755 Figure 3).) …
While Wang teaches training an MV-CNN containing multiple convolutional neural networks to perform classification of nodules from CT medical images to extract different solid and edge features from different views, Wang does not explicitly teach
… determine whether a first class exists …
… determine whether a second class exists …
… determine whether an Mth class exists …
… the probability data representing a probability of two or more diseases being located in medical imaging data, the two or more diseases comprising at least two of the following: a lung disease, a heart disease, a bone disease, a cancer, tuberculosis, cardiomegaly, hypoinflation of a lung, opacity of a lung, hyperdistension, a spine degenerative disease, or calcinosis …
Dong teaches
… determine whether a first class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a first class (Dong p.52 col.1 2nd paragraph: “… we first automatically analyze the natural language report and extract the final diagnosis as disease labels. We allow multiple disease labels for each image. Then we train separate CNN models … 2) does this image has disease label X? and 3) what are all the disease labels of the image? …”; p.52 col.2 Figure 1 and p.52 Section III: “… Convolution layers … consists of several filters (aka kernels) that we want to learn during the training [phase]. … The pooling layer … performs a non-linear down-sampling operation (Figure 1(b)) … After several convolution and pooling layers, the network is ended by one or more fully-connected layers … The loss layer is used to train the neural network … softmax loss function is used for classification problem, and sigmoid cross entropy loss is used for predicting some independent probabilities …”; p.54 col.2 Task 2: Multi-class classification on images with single disease labels: “… As each image may indicate multiple diseases, we want the CNN model to predict the probability for each disease label … we modify the network, making the last layer of the CNN models to contain 10 neurons. 
Each neuron produces a probability distribution for a single disease label …”; p.54 col.2 Task 3: Classifying images with multiple disease labels: “There are 3,879 images with multiple disease labels. We find that the models trained on single disease cases are still useful in this case. We use the model to compute the probability of all labels, sort the probability in descending order and use all labels with probability above a threshold.”; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection: “In multi-disease detection cases, we use the images with at least two disease labels to test the models and see if the model can correctly predict all the disease labels. … The model generates four labels with non-zero probability, as the figure shows …”; and p.56 Figure 7).) …
… determine whether a second class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a second class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… determine whether an Mth class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing an Mth class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… the probability data representing a probability of two or more diseases being located in medical imaging data, the two or more diseases comprising at least two of the following: a lung disease, a heart disease, a bone disease, a cancer, tuberculosis, cardiomegaly, hypoinflation of a lung, opacity of a lung, hyperdistension, a spine degenerative disease, or calcinosis (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using medical imaging data containing various multiple diseases to generate corresponding probability data for the multiple disease classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, where the identified multiple diseases and corresponding disease probabilities for a specified chest x-ray represent different diseases related to the heart (e.g., aortosclerosis in Dong p.56 Figure 7) and lungs (e.g., increased lung marking in Dong p.56 Figure 7) (Dong p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) … 
Both Wang and Dong are analogous art since they both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the MV-CNN network that captures different pattern features as taught in Wang and enhance it with techniques including identifying multiple disease labels from the dataset and using a loss layer to determine probability distributions for these multiple disease labels as taught in Dong as a way to generate probability data to identify probabilities for multiple diseases in medical images. The motivation to combine is taught in Dong, as a way to analyze and improve disease diagnosis on datasets that may have low contrast or fewer details, thus helping to improve the prediction accuracy for identified diseases using these datasets (Dong p.51 col.1 4th paragraph: “… Reviewing chest X-rays heavily depends on the experience of radiologists since the image has no spatial information and the overlap of different body parts may hide diseased tissues. Also, many images are difficult to read when the lesions are in low contrast or overlap with large pulmonary vessels. …”; and p.52 col.1 3rd paragraph: “… Our preliminary results shows that … when we generate the top three most likely disease labels, we can predict the right label with over 97% accuracy. In the multi-disease detection task, we achieve a mean average precision of 0.829.”; and pp.55-56 Section V.C. Task 2: Single Disease Classification and Table II, and Section V.D. Task 3: Multiple Disease Detection and Figure 8).
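For illustration only, and not as a characterization of either reference's actual implementation, the two probability computations discussed in the above mapping can be sketched as follows. The sketch assumes two merged activation values for the binary softmax quoted from Wang, and a set of independent per-label scores for the sort-and-threshold step quoted from Dong (Task 3). All function names, label names, score values, and the threshold value are hypothetical and do not appear in Wang or Dong.

```python
import math


def binary_softmax(a_background, a_nodule):
    """Convert two activation values into a probability distribution
    over two class labels (the probabilities sum to 1)."""
    m = max(a_background, a_nodule)  # subtract the max for numerical stability
    e0 = math.exp(a_background - m)
    e1 = math.exp(a_nodule - m)
    total = e0 + e1
    return e0 / total, e1 / total


def sigmoid(x):
    """Map one raw score to an independent probability in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_labels(label_probs, threshold):
    """Sort labels by probability in descending order and keep every
    label whose probability is above the threshold."""
    ranked = sorted(label_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, p) for label, p in ranked if p > threshold]


if __name__ == "__main__":
    # Hypothetical activation values for a single voxel.
    p_background, p_nodule = binary_softmax(0.3, 1.7)
    print("background:", round(p_background, 3), "nodule:", round(p_nodule, 3))

    # Hypothetical raw scores for three placeholder disease labels.
    scores = {"label_A": 2.0, "label_B": -1.5, "label_C": 0.4}
    probs = {label: sigmoid(s) for label, s in scores.items()}
    print(select_labels(probs, threshold=0.5))
```

The softmax output is a single distribution over two mutually dependent labels, whereas the per-label sigmoid values are independent of one another, which is why more than one label can exceed the threshold for the same image.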
Regarding original Claim 2, 
 
Wang in view of Dong teaches
 (Original) The deep neural network system of claim 1, wherein the first neural network generates mutually exclusive outputs (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0025], this limitation broadly recites training a neural network to generate mutually exclusive outputs, where these mutually exclusive outputs are related to a feature detector of the first neural network. Wang teaches training each convolutional neural network branch on different CT image views, and capturing different features/pattern characteristics (edges, solid parts) of nodules from those image views, where the learned kernel filters applied in each convolutional layer (representing feature detectors) capture the different pattern characteristics (edges, solid parts) of nodules, and hence the process of propagating these different CT image views in each of these convolutional neural network branches (one of which represents a first neural network) through their respective convolutional and pooling layers to capture different pattern characteristics as features corresponds to the generation of image view-specific features/pattern characteristics (edges, solid parts) representing mutually exclusive outputs (Wang p.1752 col.2 Section II. Method 2nd paragraph: “… The proposed MV-CNN incorporates three branches that process voxel patches from axial, coronal and sagittal view CT images respectively. The three branches share the same structure that consists of six convolutional layers (C1 to C6) … The six convolutional layers … are divided into three blocks, where each block shares the exact same structure including two convolutional layers of kernel size 3x3. 
Between each block, max pooling operation … is applied for feature selection.”; p.1753 Figure 1 including caption: “… This network contains three branches aiming at capturing features from axial, coronal, and sagittal image views … The bottom figure shows the feature maps of the C1 layer on three branches, indicated that the learned filters can capture different characteristics of nodules from input CT image (e.g., edge or solid part of a nodule).”).).
Regarding original Claim 3, 
 
Wang in view of Dong teaches
(Original) The deep neural network system of claim 1, wherein the second neural network generates mutually exclusive outputs (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0027], this limitation broadly recites training a neural network to generate mutually exclusive outputs, where these mutually exclusive outputs are related to a feature detector of the second neural network. As indicated earlier, Wang teaches training each convolutional neural network branch on different CT image views, and capturing different features/pattern characteristics (edges, solid parts) of nodules from those image views, where the learned kernel filters applied in each convolutional layer (representing feature detectors) capture the different pattern characteristics (edges, solid parts) of nodules, and hence the process of propagating these different CT image views in each of these convolutional neural network branches (one of which represents a second neural network) through their respective convolutional and pooling layers to capture different pattern characteristics as features corresponds to the generation of image view-specific features/pattern characteristics (edges, solid parts) representing mutually exclusive outputs (Wang p.1752 col.2 Section II. Method 2nd paragraph; p.1753 Figure 1 caption: “… This network contains three branches aiming at capturing features from axial, coronal, and sagittal image views … The bottom figure shows the feature maps of the C1 layer on three branches, indicated that the learned filters can capture different characteristics of nodules from input CT image (e.g., edge or solid part of a nodule).”).).
Regarding original Claim 4, 
 
Wang in view of Dong teaches
(Original) The deep neural network system of claim 1, wherein the Mth neural network generates mutually exclusive outputs (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0029], this limitation broadly recites training a neural network to generate mutually exclusive outputs, where these mutually exclusive outputs are related to a feature detector of the Mth neural network. As indicated earlier, Wang teaches training each convolutional neural network branch on different CT image views, and capturing different features/pattern characteristics (edges, solid parts) of nodules from those image views, where the learned kernel filters applied in each convolutional layer (representing feature detectors) capture the different pattern characteristics (edges, solid parts) of nodules, and hence the process of propagating these different CT image views in each of these convolutional neural network branches (one of which represents an Mth neural network) through their respective convolutional and pooling layers to capture different pattern characteristics as features corresponds to the generation of image view-specific features/pattern characteristics (edges, solid parts) representing mutually exclusive outputs (Wang p.1752 col.2 Section II. Method 2nd paragraph; p.1753 Figure 1 caption: “… This network contains three branches aiming at capturing features from axial, coronal, and sagittal image views … The bottom figure shows the feature maps of the C1 layer on three branches, indicated that the learned filters can capture different characteristics of nodules from input CT image (e.g., edge or solid part of a nodule).”).).
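For illustration only, the feature-detector operation referenced in the mappings of Claims 2-4 (a 3x3 kernel applied in a convolutional layer, followed by max pooling for feature selection) can be sketched as follows. This is a minimal sketch under stated assumptions and not Wang's implementation: the kernel weights, the input patch, the 2x2 pooling window, and the function names are all hypothetical, and the kernel is hand-written here rather than learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and return the
    feature map (cross-correlation, as commonly used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest response in each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in cols
        ]
        for r in rows
    ]


if __name__ == "__main__":
    # Hypothetical 6x6 patch: dark left half, bright right half.
    patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
    # Hypothetical 3x3 kernel that responds to vertical edges.
    edge_kernel = [[-1, 0, 1] for _ in range(3)]

    feature_map = conv2d_valid(patch, edge_kernel)
    print(feature_map)
    print(max_pool2d(feature_map))
```

A kernel with different weights applied to the same patch would produce a different feature map, which is the sense in which each filter acts as a detector for one pattern characteristic.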
Regarding previously presented Claim 9, 
Wang teaches
 (Previously Presented) A method, comprising 
using a processor operatively coupled to memory to execute computer executable components (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a computer system containing a processor that executes the instructions stored in memory. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) using the CAFFE Toolkit to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels. A person having ordinary skill in the art would understand that the training of the MV-CNN using the CAFFE Toolkit would require a computer system that contains a processor and memory, with the processor coupled to memory in order to execute the instructions stored in memory to process the medical images and to train the MV-CNN (Wang p.1752 col.2 1st -2nd paragraphs; p.1753 Figure 1; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; and p.1754 col.2 1st paragraph).) to perform the following acts: 
training a neural network based on an image data set to generate a first neural network of a binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component training a first neural network having a binary neural network architecture. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a first neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph: “We used the public … (LIDC-IDRI) [16] for experimental evaluation. All the nodules in this dataset are annotated … We train the MV-CNN on the training set …”; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… training a copy of the first neural network based on the image data set to generate a second neural network of the binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training a second neural network having a binary neural network architecture, with the copying broadly reciting that the first and second neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a second neural network (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… training a copy of the second neural network based on the image data set to form an Mth neural network of the binary neural network architecture (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training an Mth neural network having a binary neural network architecture, with the copying broadly reciting that the second and Mth neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing an Mth neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… wherein M is an integer greater than or equal to three (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites that the number of neural networks is greater than or equal to three. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, and hence the number of convolutional neural networks is three (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1).) …
… generating a neural network architecture comprising the first neural network, the second neural network and the Mth neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a neural network architecture containing a minimum of three neural networks, each having the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images, where the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks with the same convolutional neural network structure (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) … 
… wherein the neural network architecture generates probability data by aggregating at least two inference models (Examiner’s note: As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, and where the combined output from the three branches at the output layer is fed into a binary softmax function (representing an output layer) that generates probability distributions over the class labels (such that the output represents probability data aggregated by at least two inference models) (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; and p.1753 col.1 3rd paragraph – col.2 2nd paragraph: “… In the case of the output layer (F8) … the activation values are fed into a binary softmax function that are converted into probability distributions over the class labels … The goal of network training is to maximize the probability of the correct class … the loss function is defined as: L(W) <see equation (2)> where $\hat{y}_n$ represents the predicted probability from MV-CNN …”).) …
… the image data set representing at least one lung (Examiner’s note: As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) using the CAFFE Toolkit to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset, where the LIDC-IDRI dataset contains medical images of patients’ lungs (Wang p.1754 col.1 Section III.A. Dataset 1st paragraph).) …
… generating a multi-dimensional visualization based on a classification or a localization of the one or more diseases in an anatomical region represented in the image data set (Examiner’s note: As indicated earlier, Wang teaches generating visualizations for six representative nodules from the LIDC-IDRI dataset, where these nodule visualizations represent nodules in lung tissues, and these visualizations are extracted from the CT image dataset (where this CT image dataset contains 2-dimensional medical images of patients’ lungs), and each identification of a representative nodule type represents a presence/detection of a nodule type corresponding to various multiple diseases in a patient’s lungs (such as L5: calcific nodules representing calcinosis; and L6: ground-glass opacity nodules representing opacity of a lung) (Wang p.1754 col.2-p.1755 col.1 Section IV.B. Visualization: “The segmentation results are visualized to allow the comparison of different approaches. We demonstrate six representative nodules from the LIDC-IDRI testing set (Fig. 3). … the proposed MV-CNN remains robust when segmenting such nodules … For cavitary (L4) and calcific (L5) nodules … the MV-CNN is able to reserve the complete nodule shape … ground-glass opacity (GGO) nodules (L6) … the proposed method performs reasonably well in capturing the nodule shape with GGO.”; and p.1755 Figure 3).).
While Wang teaches training an MV-CNN containing multiple convolutional neural networks to perform classification of nodules from CT medical images, Wang does not explicitly teach
… determine whether a first class exists …
… determine whether a second class exists …
… determine whether an Mth class exists …
… the probability data representing a probability of two or more diseases being located in the image data set … 
Dong teaches
… determine whether a first class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a first class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… determine whether a second class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a second class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… determine whether an Mth class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing an Mth class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… the probability data representing a probability of two or more diseases being located in the image data set (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using medical imaging data containing multiple diseases to generate corresponding probability data for the identified multiple disease classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, where the identified multiple diseases and corresponding disease probabilities for a specified chest x-ray represent different diseases related to the heart (e.g., aortosclerosis in Dong p.56 Figure 7) and lungs (e.g., increased lung marking in Dong p.56 Figure 7) (Dong p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection: “… In multi-disease detection cases, we use the images with at least two disease labels to test the models and see if the model can correctly predict all the disease labels. … The model generates four labels with non-zero probability, as the figure shows …”; and p.56 Figure 7).) …
Wang and Dong are analogous art because both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the MV-CNN network that captures different pattern features as taught in Wang and enhance it with techniques including identifying multiple disease labels from the dataset and using a loss layer to determine probability distributions for these multiple disease labels as taught in Dong as a way to generate probability data to identify probabilities for multiple diseases in medical images. The motivation to combine is taught in Dong, as provided in the prior art claim mapping of Claim 1 recited above.
Regarding original Claim 10,
Claim 10 recites the method of claim 9, further comprising claim limitations that are similar in scope to the corresponding claim limitations in Claim 2, and hence is rejected under a similar rationale provided by Wang in view of Dong as indicated in Claim 2, in view of the rejection of Claim 9.
Regarding original Claim 11,
Claim 11 recites the method of claim 9, further comprising claim limitations that are similar in scope to the corresponding claim limitations in Claim 3, and hence is rejected under a similar rationale provided by Wang in view of Dong as indicated in Claim 3, in view of the rejection of Claim 9.
Regarding original Claim 12,
Claim 12 recites the method of claim 9, further comprising claim limitations that are similar in scope to the corresponding claim limitations in Claim 4, and hence is rejected under a similar rationale provided by Wang in view of Dong as indicated in Claim 4, in view of the rejection of Claim 9.
Regarding previously presented Claim 15, 
Wang teaches
(Previously Presented) A non-transitory computer readable storage device comprising instructions that, in response to execution, cause a system comprising a processor to perform operations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a computer system containing a processor that executes the instructions stored in memory (representing a non-transitory computer readable storage medium). As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) using the CAFFE Toolkit to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels. A person having ordinary skill in the art would understand that the training of the MV-CNN using the CAFFE Toolkit would require a computer system that contains a processor and memory (representing a non-transitory computer readable storage medium such as RAM or ROM), with the processor executing the instructions stored in this memory to process the medical images and to train the MV-CNN (Wang p.1752 col.2 1st -2nd paragraphs; p.1753 Figure 1; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; and p.1754 col.2 1st paragraph).), comprising: 
training a neural network based on an image data set to generate a first neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component training a first neural network having a binary neural network architecture. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a first neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph: “We used the public … (LIDC-IDRI) [16] for experimental evaluation. All the nodules in this dataset are annotated … We train the MV-CNN on the training set …”; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… training a copy of the first neural network based on the image data set to generate a second neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training a second neural network having a binary neural network architecture, with the copying broadly reciting that the first and second neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing a second neural network (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) … 
… training a copy of the second neural network based on the image data set to form an Mth neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation recites a computer-based component copying and training an Mth neural network having a binary neural network architecture, with the copying broadly reciting that the second and Mth neural networks have the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels, where the classification of the nodule vs. non-nodule voxels at its output layer represents a binary output, and hence the MV-CNN generating this output represents a network having a binary neural network architecture. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure but focuses on a different view of the CT image to capture different characteristics (edges, solid parts) of the nodules found in the CT image, with one of these convolutional neural network branches representing an Mth neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1 including caption; p.1753 col.1 3rd paragraph-col.2 2nd paragraph; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… wherein M is an integer greater than or equal to three (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites that the number of neural networks is greater than or equal to three. As indicated earlier, Wang teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, and hence the number of convolutional neural networks is three (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1).) …
… generating a neural network architecture that includes the first neural network, the second neural network and the Mth neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a neural network architecture containing a minimum of three neural networks, each having the same structure. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images, where the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks with the same convolutional neural network structure (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
… wherein the neural network architecture generates probability data by aggregating at least two inference models (Examiner’s note: As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, and where the combined output from the three branches at the output layer is fed into a binary softmax function (representing an output layer) that generates probability distributions over the class labels (such that the output represents probability data aggregated by at least two inference models) (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; and p.1753 col.1 3rd paragraph – col.2 2nd paragraph: “… In the case of the output layer (F8) … the activation values are fed into a binary softmax function that are converted into probability distributions over the class labels … The goal of network training is to maximize the probability of the correct class … the loss function is defined as: L(W) <see equation (2)> where $\hat{y}_n$ represents the predicted probability from MV-CNN …”).) …
While Wang teaches training an MV-CNN containing multiple convolutional neural networks to perform classification of nodules from CT medical images, Wang does not explicitly teach
… determine whether a first class exists …
… determine whether a second class exists …
… determine whether an Mth class exists …
… the probability data representing a probability of two or more diseases being located in the image data set, the two or more diseases comprising at least two of the following: a lung disease, a heart disease, a bone disease, a cancer, tuberculosis, cardiomegaly, hypoinflation of a lung, opacity of a lung, hyperdistension, a spine degenerative disease, or calcinosis.
Dong teaches
… determine whether a first class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a first class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… determine whether a second class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a second class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… determine whether an Mth class exists (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using the neural network to determine one or more classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by including a loss layer containing a plurality of neurons, with each neuron producing a probability distribution for an identified disease label (based on a final diagnosis), and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing an Mth class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).) …
… the probability data representing a probability of two or more diseases being located in the image data set, the two or more diseases comprising at least two of the following: a lung disease, a heart disease, a bone disease, a cancer, tuberculosis, cardiomegaly, hypoinflation of a lung, opacity of a lung, hyperdistension, a spine degenerative disease, or calcinosis (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using medical imaging data containing multiple diseases to generate corresponding probability data for the identified multiple disease classes/labels. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, where the identified multiple diseases and corresponding disease probabilities for a specified chest x-ray represent different diseases related to the heart (e.g., aortosclerosis in Dong p.56 Figure 7) and lungs (e.g., increased lung marking in Dong p.56 Figure 7) (Dong p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection: “… In multi-disease detection cases, we use the images with at least two disease labels to test the models and see if the model can correctly predict all the disease labels. … The model generates four labels with non-zero probability, as the figure shows …”; and p.56 Figure 7).).
Both Wang and Dong are analogous art since they both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the MV-CNN network that captures different pattern features as taught in Wang and enhance it with techniques including identifying multiple disease labels from the dataset and using a loss layer to determine probability distributions for these multiple disease labels as taught in Dong as a way to generate probability data to identify probabilities for multiple diseases in medical images. The motivation to combine is taught in Dong, as provided in the prior art claim mapping of Claim 1 recited above.
Regarding previously presented Claim 16,
Claim 16 recites the non-transitory computer readable storage device of claim 15, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 2, and hence is rejected under similar rationale provided by Wang in view of Dong as indicated in Claim 2, in view of rejections from Claim 15.
Regarding previously presented Claim 17,
Claim 17 recites the non-transitory computer readable storage device of claim 15, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 3, and hence is rejected under similar rationale provided by Wang in view of Dong as indicated in Claim 3, in view of rejections from Claim 15.
Regarding previously presented Claim 18,
Claim 18 recites the non-transitory computer readable storage device of claim 15, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 4, and hence is rejected under similar rationale provided by Wang in view of Dong as indicated in Claim 4, in view of rejections from Claim 15.
Regarding previously presented Claim 19, 
Wang in view of Dong teaches
(Previously Presented) The non-transitory computer readable storage device of claim 15, wherein the probability data comprises a first probability indicating a likelihood of a negative prognosis for the one or more diseases and a second probability indicating a likelihood of a positive prognosis for the one or more diseases (Examiner’s note: Under its broadest reasonable interpretation, the term “probability indicating a likelihood of a negative prognosis for the one or more diseases” broadly recites the smallest probability for a presence of the disease among the list of multiple disease probabilities, while the term “probability indicating a likelihood of a positive prognosis for the one or more diseases” broadly recites the highest probability for a presence of the disease among the list of multiple disease probabilities. As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, where the identified multiple diseases and corresponding disease probabilities for a specified chest x-ray represent different diseases related to the heart and lungs. For the example shown in Dong p.56 Figure 7, Dong teaches that the probability for “increased lung marking” (corresponding to a lung disease) is 78.5%, which is the highest probability among the list of top 4 diseases identified in the chest x-ray (hence representing a probability indicating a likelihood of a positive prognosis for the one or more diseases), while the probability for “increased heart shadow” (corresponding to a heart disease) is 0.2%, which is the smallest probability among the list of top 4 diseases identified in the chest x-ray (hence representing a probability indicating a likelihood of a negative prognosis for the one or more diseases) (Dong p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7).).
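For illustration only, the interpretation applied above (selecting the highest and the smallest probability from a list of per-disease probabilities) can be sketched as follows. This is a minimal sketch and not code from Dong or from the application: the function name highest_and_lowest is hypothetical, only the 78.5% and 0.2% values come from the Figure 7 example cited above, and the two remaining entries are placeholder labels with assumed values.

```python
from typing import Dict, Tuple

LabelProb = Tuple[str, float]


def highest_and_lowest(probabilities: Dict[str, float]) -> Tuple[LabelProb, LabelProb]:
    """Return the (label, probability) pairs with the highest and lowest probability."""
    highest = max(probabilities.items(), key=lambda item: item[1])
    lowest = min(probabilities.items(), key=lambda item: item[1])
    return highest, lowest


# Two values are taken from the Figure 7 example discussed above;
# the two "hypothetical" entries are placeholders, not values from Dong.
example = {
    "increased lung marking": 0.785,
    "increased heart shadow": 0.002,
    "hypothetical label C": 0.10,
    "hypothetical label D": 0.05,
}

second_probability, first_probability = highest_and_lowest(example)
```

Under this sketch, second_probability corresponds to the entry read as the positive-prognosis likelihood and first_probability to the entry read as the negative-prognosis likelihood.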
Regarding previously presented Claim 20, 
Wang in view of Dong teaches
(Previously Presented) The non-transitory computer readable storage device of claim 15, wherein the operations further comprise 
generating a multi-dimensional visualization based on a classification or a localization of the one or more diseases in an anatomical region represented in the image data set (Examiner’s note: As indicated earlier, Wang teaches generating visualizations for six representative nodules from the LIDC-IDRI dataset, where these nodule visualizations represent nodules in lung tissues, and these visualizations are extracted from the CT image dataset (where this CT image dataset contains 2-dimensional medical images of patients’ lungs), and each identification of a representative nodule type represents a presence/detection of a nodule type corresponding to various multiple diseases in a patient’s lungs (such as L5: calcific nodules representing calcinosis; and L6: ground-glass opacity nodules representing opacity of a lung) (Wang p.1754 col.2-p.1755 col.1 Section IV.B. Visualization; and p.1755 Figure 3).), 
wherein the multi-dimensional visualization comprises one or more visual characteristics comprising a color, a size, a hue, a shading, or a combination thereof (Examiner’s note: As indicated earlier, Wang teaches generating visualizations for six representative nodules from the LIDC-IDRI dataset, where these nodule visualizations represent nodules in lung tissues, where the size and the shadings of the nodules are shown in the visualizations in Wang p.1755 Figure 3 (Wang p.1754 col.2-p.1755 col.1 Section IV.B. Visualization; and p.1755 Figure 3).).
Claims 5-6 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over 
Wang et al., A Multi-view Deep Convolutional Neural Networks for Lung Nodule Segmentation, 2017 EMBC July 11-15, 2017, published September 14 2017 [hereafter referred as Wang] in view of Dong et al., Learning to Read Chest X-Ray Images from 16000+ Examples Using CNN, 2017 IEEE/ACM CHASE July 17-19 2017, published August 17 2017 [hereafter referred as Dong] as applied to Claims 1 and 9; in further view of Shen et al., Multi-scale Convolutional Neural Networks for Lung Nodule Classification, 24th International Conference IPMI 2015 June 28-July 3 2015 [hereafter referred as Shen].  
Regarding original Claim 5, 
 
Wang in view of Dong as applied to Claim 1 teaches
(Original) The deep neural network system of claim 1, 
wherein training of the neural network with respect to the first class (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a first neural network. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images from the LIDC-IDRI dataset to distinguish different types of nodule and background (non-nodule) voxels. Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents a first neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process). As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by updating the last layer (loss layer, representing an output layer) with a layer of neurons, with each neuron producing a probability distribution for a disease label, and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a first class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7). 
Hence, the combination of the Wang and Dong references teaches the training of a first neural network with respect to a first class.) and
training of the copy of the neural network with respect to the second class (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a second neural network. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents a second neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process). As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by updating the last layer (loss layer, representing an output layer) with a layer of neurons, with each neuron producing a probability distribution for a disease label, and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a second class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7). 
Hence, the combination of the Wang and Dong references teaches the training of a second neural network with respect to a second class.) …
While Wang in view of Dong teaches merging the output of the three convolutional neural network branches (Wang p.1753 Figure 1), Wang in view of Dong does not explicitly teach
… training … are performed in a concatenating manner.
Shen teaches
… training … are performed in a concatenating manner (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a first neural network and second neural network (which is a copy of the first neural network) in a concatenating manner. Shen teaches training a multi-scale convolutional neural network architecture (MCNN) involving three parallel CNNs (that share the same parameters) that receive different scaled image patches of lung nodule patches to produce a concatenated output from the three CNN, where this concatenated output is a final discriminative feature vector that is fed to a final classifier to generate the final output, and where this training is based on the training objective function from deeply-supervised nets (DSN), where “companion objectives” representing the feature vectors from each CNN are concatenated, such that this training process according to this combined objective function using the objective function (and corresponding feature vectors) from the three CNNs to produce a final discriminative feature vector represents training a first neural network and a second neural network in a concatenating manner (Shen pp.592-593 Section 2.2 Multi-scale Nodule Representation: “… In the proposed MCNN architecture, three CNN that take nodule patches from different scales (as shown in Fig.3) as inputs are assembled in parallel. … In order to reduce the parameters of the MCNN, we follow the setting in [6] to share parameters among all the CNN. The resulting output of our MCNN is the concatenation of the three CNN outputs, forming the final discriminative feature vector, which will be directly fed to the final classifier without any feature reduction. … Unlike the traditional objective function in CNN, DSN introduced “companion objectives” [12] into the final objective function to alleviate the vanishing gradients problem so the training process can be fast and stable. 
The entire objective function is thus represented as $F(W)=P(W)+Q(W)$ … In our work, $P(W)=\mathrm{LOSS}(W, w^{(out)})$ is the overall hinge loss function for the concatenated feature layer, and $Q(W)=\sum_{m=1}^{M} \alpha_m \, \mathrm{loss}(W, w^{(m)})$ is the sum of the companion hinge loss functions from all CNN. $\alpha_m$ is the coefficient for the mth CNN. W denotes the combination of the weights from all of the CNN, while $w^{(m)}$ and $w^{(out)}$ are the weights of the feature layer of the mth CNN and the weights of the final concatenated feature layer respectively.”).).
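For illustration only, the combined objective quoted from Shen above, in which an overall loss on the concatenated feature layer is added to a weighted sum of per-CNN companion losses, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Shen's implementation: the helper names (hinge_loss, linear_scores, combined_objective), the linear scoring of features, and the particular multi-class hinge form are assumptions chosen for brevity, since the quoted passage does not specify them.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row of weights per class


def hinge_loss(scores: Sequence[float], label: int, margin: float = 1.0) -> float:
    """Multi-class hinge loss (assumed form): sum of margin violations over wrong classes."""
    correct = scores[label]
    return sum(max(0.0, margin + s - correct)
               for j, s in enumerate(scores) if j != label)


def linear_scores(weights: Matrix, features: Sequence[float]) -> List[float]:
    """Class scores as dot products between each weight row and the feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def combined_objective(branch_features: List[List[float]], label: int,
                       w_branch: List[Matrix], w_out: Matrix,
                       alphas: Sequence[float]) -> float:
    """F(W) = P(W) + Q(W) for a single training example."""
    # Q(W): companion hinge losses, one per CNN branch, weighted by alpha_m.
    q = sum(a * hinge_loss(linear_scores(w_m, f_m), label)
            for a, w_m, f_m in zip(alphas, w_branch, branch_features))
    # P(W): hinge loss on the concatenation of all branch feature vectors.
    concatenated = [x for features in branch_features for x in features]
    p = hinge_loss(linear_scores(w_out, concatenated), label)
    return p + q
```

In this sketch each entry of branch_features stands in for the feature vector produced by one of the parallel CNNs, so that the concatenated vector plays the role of the final discriminative feature vector described in the quoted passage.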
Both Wang in view of Dong and Shen are analogous art since they both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the training of the MV-CNN taught in Wang in view of Dong and enhance it to apply the training objective function of the MCNN taught in Shen as a way to concatenate the features from each CNN into a final discriminative feature vector. The motivation to combine is taught in Shen, as this training objective function is shown to produce feature vectors with reduced redundant information from the original images, as well as alleviating the vanishing gradients problem, resulting in the generation of an optimized CNN model that is computationally fast and stable (Shen pp.592-593 Section 2.2 Multi-scale Nodule Representation: “… In the proposed MCNN architecture, three CNN that take nodule patches from different scales (as shown in Fig.3) as inputs are assembled in parallel. … Unlike the traditional objective function in CNN, DSN introduced “companion objectives” [12] into the final objective function to alleviate the vanishing gradients problem so the training process can be fast and stable. The entire objective function is thus represented as $F(W)=P(W)+Q(W)$ … In our work, $P(W)=\mathrm{LOSS}(W, w^{(out)})$ is the overall hinge loss function for the concatenated feature layer, and $Q(W)=\sum_{m=1}^{M} \alpha_m \, \mathrm{loss}(W, w^{(m)})$ is the sum of the companion hinge loss functions from all CNN. $\alpha_m$ is the coefficient for the mth CNN. … In this way, F(W) keeps each network optimized and also makes the assembly sensible. Figure 4 shows the concatenated features projected into a 2-D subspace. It shows that the proposed MCNN model is able to remove the redundant information in the original images and extract discriminative features.”).
Regarding original Claim 6, 
 
Wang in view of Dong as applied to Claim 1 teaches
(Original) The deep neural network system of claim 1, 
wherein training of the copy of the first neural network with respect to the second class (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a second neural network. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents a second neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process). As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by updating the last layer (loss layer, representing an output layer) with a layer of neurons, with each neuron producing a probability distribution for a disease label, and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing a second class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7). 
Hence, the combination of the Wang and Dong references teaches the training of a second neural network with respect to a second class.) and
training of the copy of the second neural network with respect to the Mth class (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training an Mth neural network. As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents an Mth neural network (Wang p.1752 col.2 1st-2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process). As indicated earlier, Dong teaches training a CNN-based model performing multi-class classification on multiple diseases, by updating the last layer (loss layer, representing an output layer) with a layer of neurons, with each neuron producing a probability distribution for a disease label, and training the same model with the same last layer using a lung image dataset containing multiple diseases, resulting in the CNN-based model outputting a list of probabilities for multiple diseases representing different labels/classes, with one of these different labels/classes and its associated probability representing an Mth class (Dong p.52 col.1 2nd paragraph; p.52 col.2 Figure 1 and p.52 Section III; p.54 col.2 Task 2: Multi-class classification on images with single disease labels; p.54 col.2 Task 3: Classifying images with multiple disease labels; p.54 Table I; pp.55-56 Section V.D. Task 3: Multiple Disease Detection; and p.56 Figure 7). 
Hence, the combination of the Wang and Dong references teaches the training of an Mth neural network with respect to an Mth class.) …
While Wang in view of Dong teaches merging the output of the three convolutional neural network branches (Wang p.1753 Figure 1), Wang in view of Dong does not explicitly teach
… training … are performed in a concatenating manner.
Shen teaches
… training … are performed in a concatenating manner (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a second neural network (which is a copy of the first neural network) and a Mth neural network (which is a copy of the second neural network) in a concatenating manner. As indicated earlier, Shen teaches training a multi-scale convolutional neural network architecture (MCNN) involving three parallel CNNs (that share the same parameters) that receive different scaled image patches of lung nodule patches to produce a concatenated output from the three CNN, where this concatenated output is a final discriminative feature vector that is fed to a final classifier to generate the final output, and where this training is based on the training objective function from deeply-supervised nets (DSN), where “companion objectives” representing the feature vectors from each CNN are concatenated, such that this training process according to this combined objective function using the objective function (and corresponding feature vectors) from the three CNNs to produce a final discriminative feature vector represents training a second neural network and a Mth neural network in a concatenating manner (Shen pp.592-593 Section 2.2 Multi-scale Nodule Representation).).
Both Wang in view of Dong and Shen are analogous art since they both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the training of the MV-CNN taught in Wang in view of Dong and enhance it to apply the training objective function of the MCNN taught in Shen as a way to concatenate the features from each CNN into a final discriminative feature vector. The motivation to combine is taught in Shen, as provided in the prior art claim mapping of Claim 5 recited above.
Regarding original Claim 13, 
Wang in view of Dong as applied to Claim 9 teaches
(Original) The method of claim 9, 
wherein the training the copy of the first neural network comprises training the copy of the first neural network with respect to the training of the neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a first neural network and training a second neural network (which is a copy of the first neural network). As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents a first neural network and another branch represents a second neural network (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
While Wang in view of Dong teaches merging the output of the three convolutional neural network branches (Wang p.1753 Figure 1), Wang in view of Dong does not explicitly teach
… training … in a concatenating manner.
Shen teaches
… training … in a concatenating manner (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a first neural network and training a second neural network (which is a copy of the first neural network) in a concatenating manner. As indicated earlier, Shen teaches training a multi-scale convolutional neural network architecture (MCNN) involving three parallel CNNs (that share the same parameters) that receive different scaled image patches of lung nodule patches to produce a concatenated output from the three CNN, where this concatenated output is a final discriminative feature vector that is fed to a final classifier to generate the final output, and where this training is based on the training objective function from deeply-supervised nets (DSN), where “companion objectives” representing the feature vectors from each CNN are concatenated, such that this training process according to this combined objective function using the objective function (and corresponding feature vectors) from the three CNNs to produce a final discriminative feature vector represents training a first neural network and a second neural network in a concatenating manner (Shen pp.592-593 Section 2.2 Multi-scale Nodule Representation).).
Both Wang in view of Dong and Shen are analogous art since they both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the training of the MV-CNN taught in Wang in view of Dong and enhance it to apply the training objective function of the MCNN taught in Shen as a way to concatenate the features from each CNN into a final discriminative feature vector. The motivation to combine is taught in Shen, as provided in the prior art claim mapping of Claim 5 recited above.
Regarding original Claim 14, 
Wang in view of Dong as applied to Claim 9 teaches
(Original) The method of claim 9, 
wherein the training the copy of the second neural network comprises training the copy of the second neural network with respect to the training of the copy of the first neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a second neural network (which is a copy of the first neural network) and training a Mth neural network (which is a copy of the second neural network). As indicated earlier, Wang teaches training a multi-view convolutional neural network (MV-CNN) to perform classification on computed tomography (CT) images for distinguishing different types of nodule and background (non-nodule) voxels. As indicated earlier, Wang further teaches the multi-view convolutional neural network (MV-CNN) contains three branches of convolutional neural networks, where each branch has the same convolutional neural network structure, where one of the branches represents a second neural network and another branch represents a Mth neural network (Wang p.1752 col.2 1st -2nd paragraphs; pp.1752-1753 Section II 1st-2nd paragraphs; p.1753 Figure 1; p.1754 col.1 Section III.A. Dataset 1st paragraph; and p.1754 col.1-col.2 Section III.C. Model Training process).) …
While Wang in view of Dong teaches merging the output of the three convolutional neural network branches (Wang p.1753 Figure 1), Wang in view of Dong does not explicitly teach
… training … in a concatenating manner.
Shen teaches
… training … in a concatenating manner (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites training a second neural network (which is a copy of the first neural network) and training an Mth neural network (which is a copy of the second neural network) in a concatenating manner. As indicated earlier, Shen teaches training a multi-scale convolutional neural network architecture (MCNN) involving three parallel CNNs (that share the same parameters) that receive different scaled image patches of lung nodule patches to produce a concatenated output from the three CNNs, where this concatenated output is a final discriminative feature vector that is fed to a final classifier to generate the final output, and where this training is based on the training objective function from deeply-supervised nets (DSN), where “companion objectives” representing the feature vectors from each CNN are concatenated, such that this training process according to this combined objective function using the objective function (and corresponding feature vectors) from the three CNNs to produce a final discriminative feature vector represents training a second neural network and an Mth neural network in a concatenating manner (Shen pp.592-593 Section 2.2 Multi-scale Nodule Representation).).
Wang in view of Dong and Shen are analogous art since both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the training of the MV-CNN taught in Wang in view of Dong and enhance it to apply the training objective function of the MCNN taught in Shen as a way to concatenate the features from each CNN into a final discriminative feature vector. The motivation to combine is taught in Shen, as provided in the prior art claim mapping of Claim 5 recited above.
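For illustration only, and not as part of the claim mapping, the following minimal sketch shows what concatenating feature vectors from parallel branches that share the same parameters can look like. Every name here (conv1d_valid, branch_features, multi_scale_features) is hypothetical, and the one-dimensional convolution with ReLU and max/mean pooling is an assumed toy stand-in for a CNN branch rather than the architecture or training objective described in Shen.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal without padding ("valid" mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def branch_features(patch: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Toy branch: convolution, ReLU, then max and mean pooling (2 features)."""
    activations = [max(0.0, v) for v in conv1d_valid(patch, kernel)]
    return [max(activations), sum(activations) / len(activations)]


def multi_scale_features(
    patches: Sequence[Sequence[float]], kernel: Sequence[float]
) -> List[float]:
    """Apply the SAME kernel (shared parameters) to each patch and
    concatenate the per-branch feature vectors into one vector."""
    features: List[float] = []
    for patch in patches:
        features.extend(branch_features(patch, kernel))
    return features


# Three patches of different lengths stand in for three input scales.
patches = [[1, 2, 3, 4], [0, 1, 0, 1, 0], [2, 2, 2]]
shared_kernel = [1.0, -1.0]
feature_vector = multi_scale_features(patches, shared_kernel)
```

In this sketch the concatenated vector has two entries per branch, so three branches yield a six-element vector that a downstream classifier could consume.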
Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., A Multi-view Deep Convolutional Neural Networks for Lung Nodule Segmentation, 2017 EMBC July 11-15, 2017, published September 14 2017 [hereafter referred as Wang] in view of Dong et al., Learning to Read Chest X-Ray Images from 16000+ Examples Using CNN, 2017 IEEE/ACM CHASE July 17-19 2017, published August 17 2017 [hereafter referred as Dong] as applied to Claim 1; in further view of Ronneberger et al., U-net: Convolutional Networks for Biomedical Image Segmentation, May 18 2015 [hereafter referred as Ronneberger].
Regarding original Claim 7, 
Wang in view of Dong as applied to Claim 1 teaches
(Original) The deep neural network system of claim 1, 
where the first neural network performs a plurality of … sequential and/or parallel downsampling … of the data set associated with convolutional layers of the first neural network (Examiner’s note: Dong teaches a plurality of pooling layers in a convolutional neural network, where the pooling layers in the convolutional neural network are arranged in a series fashion and perform down-sampling operations on the feature maps (Dong p.52 Figure 1; p.53 Figure 2 including caption: “… The network consists of 5 convolution layers, 3 pooling layers …”; and p.53 col.1 2nd paragraph: “… The pooling layer is another important concept of CNN, and it performs a non-linear down-sampling operations (Figure 1(b)). Max-pooling the most commonly used pooling operation …”). Wang also teaches two max-pooling layers in each convolutional neural network branch (where one of these branches represents a first neural network). A person having ordinary skill in the art would understand that these max-pooling layers arranged in a series fashion represent sequential layers that perform max-pooling operations described in the Dong reference (and hence represent a plurality of down-sampling operations) (Wang p.1752 col.2 Section II. 2nd paragraph; and p.1753 Figure 1 including caption: “… Each CNN branch includes six convolutional layers (C1 to C6), two max-pooling layers …”).) …
However, Wang in view of Dong does not teach
… a plurality of sequential and/or parallel … upsampling of the data set associated with convolutional layers …
Ronneberger teaches
… a plurality of sequential and/or parallel … upsampling of the data set associated with convolutional layers (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a convolutional neural network structure containing a plurality of downsampling and upsampling layers arranged in a series or a parallel fashion. Ronneberger teaches a plurality of max-pool 2x2 layers and up-conv 2x2 layers, where these layers are arranged within a convolutional neural network in a series fashion, with the plurality of max-pool layers representing down-sampling operations, and the plurality of up-conv layers representing up-sampling operations (Ronneberger p.2 Figure 1; p.2 3rd paragraph – p.3 2nd paragraph; and p.4 Section 2 Network Architecture: “The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions … each followed by … a 2x2 max pooling operation with stride 2 for downsampling … Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that halves the number of feature channels …”).) …
Wang in view of Dong and Ronneberger are analogous art since both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take each of the three convolutional neural network branches containing max-pooling layers taught in Wang in view of Dong and enhance them by applying corresponding expansive paths containing corresponding up-convolutional layers taught in Ronneberger as a way to propagate context information to higher resolution layers to improve output localization/classification of classes assigned to each pixel region in an image. The motivation to combine is taught in Ronneberger, since the expansive path containing the corresponding up-convolutional layers also contains a large number of feature channels, which allow the network to propagate context information to higher resolution layers, thus making the prediction of the pixels in border regions of an image more accurate, and thus improving the localization prediction/classification of an image using the convolutional neural network structure as well as improving the computational speed for prediction/classification (Ronneberger p.1 Abstract: “… The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method … Moreover, the network is fast …”; p.2 3rd paragraph – p.3 2nd paragraph: “… In this paper, we build upon a more elegant architecture, the so-called “fully convolutional network” [9]. We modify and extend this architecture such that it works with very few training images and yields more precise segmentations; see Figure 1. … In order to localize, high resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output based on this information. … in the upsampling part we have also a large number of feature channels, which allow the network to propagate context information to higher resolution layers. … This strategy allows the seamless segmentation of arbitrarily large images by an overlap-tile strategy (see Figure 2). To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images …”; and pp.6-8 Section 4 Experiments and Section 5 Conclusion).
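For illustration only, and not as part of the claim mapping, the following minimal sketch shows sequential down-sampling by 2x2 max pooling with stride 2, followed by sequential up-sampling back to the original resolution. The function names (max_pool_2x2, upsample_2x) are hypothetical, and nearest-neighbour up-sampling is an assumed simplification: the up-convolution described in Ronneberger additionally applies a learned 2x2 convolution after up-sampling, which is omitted here.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_2x2(fm: FeatureMap) -> FeatureMap:
    """Down-sample by taking the maximum of each non-overlapping 2x2 block."""
    return [
        [
            max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
            for c in range(0, len(fm[0]) - 1, 2)
        ]
        for r in range(0, len(fm) - 1, 2)
    ]


def upsample_2x(fm: FeatureMap) -> FeatureMap:
    """Up-sample by repeating every value twice along both axes
    (nearest-neighbour; no learned weights are involved)."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Contracting direction: two sequential down-sampling steps (4x4 -> 2x2 -> 1x1).
pooled_once = max_pool_2x2(feature_map)
pooled_twice = max_pool_2x2(pooled_once)

# Expansive direction: two sequential up-sampling steps (1x1 -> 2x2 -> 4x4).
restored = upsample_2x(upsample_2x(pooled_twice))
```

The restored map has the original 4x4 shape but not the original values, which is one way to see why high resolution features from the contracting path are combined with the up-sampled output in the quoted passage.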
Regarding original Claim 8, 
Wang in view of Dong as applied to Claim 1 teaches
(Original) The deep neural network system of claim 1, 
where the second neural network performs a plurality of sequential and/or parallel … downsampling … of the data set associated with convolutional layers of the second neural network (Examiner’s note: Dong teaches a plurality of pooling layers in a convolutional neural network, where the pooling layers in the convolutional neural network are arranged in a series fashion and perform down-sampling operations on the feature maps (Dong p.52 Figure 1; p.53 Figure 2 including caption; and p.53 col.1 2nd paragraph). Wang also teaches two max-pooling layers in each convolutional neural network branch (where one of these branches represents a second neural network). A person having ordinary skill in the art would understand that these max-pooling layers arranged in a series fashion represent sequential layers that perform max-pooling operations described in the Dong reference (and hence represent a plurality of down-sampling operations) (Wang p.1752 col.2 Section II. 2nd paragraph; and p.1753 Figure 1 including caption).) …
However, Wang in view of Dong does not teach
… a plurality of sequential and/or parallel … upsampling of the data set associated with convolutional layers …
Ronneberger teaches
… a plurality of sequential and/or parallel … upsampling of the data set associated with convolutional layers (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a convolutional neural network structure containing a plurality of downsampling and upsampling layers arranged in a series or a parallel fashion. Ronneberger teaches a plurality of max-pool 2x2 layers and up-conv 2x2 layers, where these layers are arranged within a convolutional neural network in a series fashion, with the plurality of max-pool layers representing down-sampling operations, and the plurality of up-conv layers representing up-sampling operations (Ronneberger p.2 Figure 1; p.2 3rd paragraph – p.3 2nd paragraph; and p.4 Section 2 Network Architecture).) …
Wang in view of Dong and Ronneberger are analogous art since both teach training convolutional neural networks with medical images.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take each of the three convolutional neural network branches containing max-pooling layers taught in Wang in view of Dong and enhance them by applying corresponding expansive paths containing corresponding up-convolutional layers taught in Ronneberger as a way to propagate context information to higher resolution layers to improve output localization/classification of classes assigned to each pixel region in an image. The motivation to combine is taught in Ronneberger, as provided in the prior art claim mapping of Claim 7 recited above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Christodoulidis et al., Multi-source Transfer Learning with Convolutional Neural Networks for Lung Pattern Analysis, arXiv:1612.02589v1, December 8 2016, where Christodoulidis teaches an ensemble of CNNs trained via transfer learning on a target domain of interstitial lung disease CT images, and where the outputs from each fine-tuned CNN are aggregated through averaging (Christodoulidis p.5 Figure 2; and pp.5-6 Section III.C. Multi-source Transfer Learning).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121      



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121