DETAILED ACTION
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claims 1-10 are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the limitations of independent claim 1 recite “a training sample extracting module, a deep layer feature extracting network, a feature embedding network, and a spatial-temporal recurrent attention target detection module”, wherein “module” and “network” are each a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure {such as a computer, or a processor} as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Fuchs (US 20190295252 A1) in view of Min (“Deep learning in bioinformatics”, Briefings in Bioinformatics, 18(5), 2017, 851–869), and further in view of BenTaieb (“Predicting Cancer with a Recurrent Visual Attention Model for Histopathology Images”, Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pp. 129-137).
Re Claim 1, Fuchs discloses A CT lymph node detection system based on spatial-temporal recurrent attention mechanism (see Fuchs: e.g., -- Computer vision algorithms may be used to recognize and detect various features on digital images.--, in [0002]-[0003], and, -- an attention mechanism to generate a slide-level embedding, which was shown to be efficient and useful, especially in data-deprived domains.--, in [0111] ]-[0014], and [0116]; and, -- MIL approach to state-of-the-art fully supervised learning for breast metastasis detection in lymph nodes… Analyzing the mismatches between the predictions on Aperio slides and their matching Philips slides, revealed a perceived difference in brightness, contrast and sharpness that could affect the prediction performance.  In practice, an effective solution to reduce the generalization error even further could be training on a mixed dataset, or fine-tuning the model on data from the new scanner…. The same model, trained under full supervision on CAMELYON16, was applied to the MSK test set of the axillary lymph nodes dataset and resulted in an AUC of 0.727, constituting a 20% drop compared to its performance on the CAMELYON16 test set (as seen on FIG. 24(b), right panel). --, in [0136]-[0138]; and, -- Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the inference model 3212 may be a convolutional neural network (CNN) and a deep convolutional network (DCN), among others, with the set of transform layers.--, in [0170]; and, -- the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.  For example, the aggregation model 3214 may be the recurrent neural network--, in [0195]),
comprising a training sample extracting module (see Fuchs: e.g., --training models for classifying biomedical images.  An image classifier executing on one or more processors may generate a plurality of tiles from each biomedical image of a plurality of biomedical images.--, in [0003]; --the system may include a model trainer executable on the one or more processors.  The model trainer may generate a plurality of tiles from each biomedical image of the plurality of biomedical images.--, in [0014], [0017], and, -- 50 tiles were sampled from each test slide, in addition to its top-ranked tile, and extracted the final feature embedding before the classification layer….Other top-ranked tiles in negative slides contain edges and inked regions.  The model trained only with the weak MIL assumption was still able to extract features that embed visually.--, in [0090]), 
a deep layer feature extracting network (see Fuchs: e.g., --establishing the inference system may include initializing the inference system comprising a convolutional neural network.  The convolutional neural network may have one or more parameters.--, in [0007]-[0010]; and, --a Deep Multiple Instance Learning (MIL) framework where only the whole slide class is needed to train a convolutional neural network capable of classifying digital slides on a large scale.--, in [0067]-[0068], and [0077]; and, --extracted the final feature embedding before the classification layer, ….Other top-ranked tiles in negative slides contain edges and inked regions.  The model trained only with the weak MIL assumption was still able to extract features that embed visually.--, in [0087]-[0090], and [0093]; and,-- (ii) The "double-sum" model has two parallel feature extractor, one for the 20.times. image and one for the 5.times.  image.  The features are then added element-wise and fed to a classifier.  (iii) The "double-cat" model is very similar to the "double-sum" model but the features coming from the two streams are concatenated instead of added.--, in [0106]),
a feature embedding network (see Fuchs: e.g., --an attention mechanism to generate a slide-level embedding, which was shown to be efficient and useful, especially in data-deprived domains. …the quality of staining varied from the standardized H&E staining protocols used on slides from formalin-fixed, paraffin-embedded tissue.  The dataset also included patients treated with neoadjuvant chemotherapy, which may be diagnostically challenging in routine pathology practice (i.e. small volume of metastatic tumor, therapy-related change in tumor morphology) and are known to lead to high false negative rates. --, in [0116]-[0118], and, -- We have described models trained with the weak supervisory signal coming from the MIL assumption.  These models rely on a representation that is rich enough to obtain high slide classification accuracy on a held-out test set.  The representation learned can be inspected by visualizing a projection of the feature space in two dimensions using dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE). Hundred tiles were sampled from each test slide of the prostate dataset, in addition to its top-ranked tile, and extracted the final feature embedding before the classification layer.  --, in [0124]-[0125]),
and a recurrent attention target detection module (see Fuchs: e.g., --establishing the aggregation system may include initializing the aggregation system comprising a recurrent neural network.  The recurrent neural network may have one or more parameters.  Each parameter of the one or more parameters may be set to a random value.  In some embodiments, the image classifier may maintain the aggregation system responsive to determining that a second classification result from the aggregation system for a second subset of tiles from a second biomedical image matches a second label for the second biomedical image.--, in [0009]-[001], and in [0116]-[0118], [0127], [0154], [0195] and [0213]);
Fuchs, however, does not explicitly disclose a spatial-temporal recurrent attention target detection module.
Min teaches a spatial-temporal recurrent attention target detection module (see Min: e.g., -- CNNs are architectures that have succeeded particularly in image recognition and consist of convolution layers, non-linear layers and pooling layers. RNNs are designed to utilize sequential information of input data with cyclic connections among building blocks like perceptrons, long short-term memory units (LSTMs) [36, 37] or gated recurrent units (GRUs) [19]. In addition, many other emergent deep learning architectures have been suggested, such as deep spatio-temporal neural networks (DST-NNs) [38], multi-dimensional recurrent neural networks (MD-RNNs).--, in pages 852-853, and, -- In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, --, in pages 859-860, and, -- o interpretation through visualization, attention mechanisms designed to focus explicitly on salient points and the mathematical rationale behind deep learning are being studied --, in pages 862-863);
Fuchs and Min are combinable as they are in the same field of endeavor: using deep learning neural networks and modules to extract temporal and spatial features from medical images to detect particular cells such as lymph nodes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Fuchs’s system using Min’s teachings by extending Fuchs’s recurrent attention target detection module into a spatial-temporal recurrent attention target detection module in order to classify and detect lymph nodes from medical images (see Min: e.g., in pages 852-853, and in pages 859-863);
Fuchs as modified by Min further discloses wherein a detection process includes the following steps:
Step 1, extracting a slice image block sequence (see Fuchs: e.g., -- Understanding what features the model uses to classify a tile is an important bottle-neck of current clinical applications of deep learning.  One can gain insight by visualizing a projection of the feature space in two dimensions using dimensionality reduction techniques such as PCA.  50 tiles were sampled from each test slide, in addition to its top-ranked tile… The model trained only with the weak MIL assumption was still able to extract features that embed visually.--, in [0090], [0106]-[0108], [0124]-[0126], and [0138]);
Fuchs as modified by Min, however, does not explicitly disclose marking position coordinate information for the obtained lymph node dcm-format file and a corresponding lymph node by use of the training sample extracting module, and extracting a CT slice image block sequence I_i (i = 1, 2, ..., L), I_i ∈ R^{W×H}, with the CT slice image blocks being of length L, width W and height H for each lymph node by using a pydicom module in python.
BenTaieb teaches marking position coordinate information for the obtained lymph node dcm-format file and a corresponding lymph node by use of the training sample extracting module, and extracting a CT slice image block sequence I_i (i = 1, 2, ..., L), I_i ∈ R^{W×H}, with the CT slice image blocks being of length L, width W and height H for each lymph node by using a pydicom module in python (see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse x_p from a tissue slide and is a modification of the read mechanism introduced in [8]. Given an input tissue slide X ∈ R^{H×W×3} of size H × W--, in pages 130-133);
Fuchs (as modified by Min) and BenTaieb are combinable as they are in the same field of endeavor: using deep learning neural networks and modules to extract temporal and spatial features from medical images to detect particular cells such as lymph nodes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the system of Fuchs (as modified by Min) using BenTaieb’s teachings by including, in Fuchs’s extracting of a slice image block sequence, marking position coordinate information for the obtained lymph node dcm-format file and a corresponding lymph node by use of the training sample extracting module, and extracting a CT slice image block sequence I_i (i = 1, 2, ..., L), I_i ∈ R^{W×H}, with the CT slice image blocks being of length L, width W and height H for each lymph node by using a pydicom module in python, in order to identify a set of locations and sequentially predict a class label {e.g., presence or absence of metastatic lymph nodes} (see BenTaieb: e.g., in pages 852-853, and in pages 131-133);
Fuchs as modified by Min and BenTaieb further discloses, at step 2, extracting a high-level spatial feature map sequence corresponding to the CT slice image block sequence of each lymph node by using a VGG-16 model pre-trained by a natural image according to the deep layer feature extracting network and denoting the high-level spatial feature map sequence as {X_0, ..., X_L} (see Fuchs: e.g., -- Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the inference model 3212 may be a convolutional neural network (CNN) and a deep convolutional network (DCN), among others, with the set of transform layers.--, in [0170]; and, -- the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.  For example, the aggregation model 3214 may be the recurrent neural network--, in [0195]; and, --At least five training runs were completed for each condition.  Minimum balanced error on the validation set for each run was used to decide the best condition in each experiment.  Briefly, ResNet34 achieved the best results over other architectures tested (AlexNet, VGG11, VGG16, ResNet18, ResNet101, DenseNet201); using a class-weighted loss led to better performance overall, and weights were adopted in the range of 0.8-0.95 in subsequent experiments--, in [0121], and also see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse x_p from a tissue slide and is a modification of the read mechanism introduced in [8]. Given an input tissue slide X ∈ R^{H×W×3} of size H × W--, in pages 130-133);
at step 3, constructing the feature embedding network to perform dimension reduction for the input high-level feature map sequence and outputting a feature map Ai (see Fuchs: e.g., --These models rely on a representation that is rich enough to obtain high slide classification accuracy on a held-out test set.  The representation learned can be inspected by visualizing a projection of the feature space in two dimensions using dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE).  Hundred tiles were sampled from each test slide of the prostate dataset, in addition to its top-ranked tile, and extracted the final feature embedding before the classification layer.--, in [0124]; and, -- Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the inference model 3212 may be a convolutional neural network (CNN) and a deep convolutional network (DCN), among others, with the set of transform layers.--, in [0170]; and, -- the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.  For example, the aggregation model 3214 may be the recurrent neural network--, in [0195]); 
at step 4, constructing a spatial-temporal recurrent attention frame, and performing a spatial attention mechanism based on a recurrent neural network and the Gaussian Kernel Function to obtain a spatial attention result gS(t) (see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse x_p from a tissue slide and is a modification of the read mechanism introduced in [8]. Given an input tissue slide X ∈ R^{H×W×3} of size H × W, we apply two grids (one for each axis of the image) of two-dimensional Gaussian filters, where each filter response corresponds to a pixel in the resulting glimpse x_p ∈ R^{h×w×3} of size h × w. The attention mechanism is represented by parameters l = {μ_w, μ_h, σ_w^2, σ_h^2, δ_w, δ_h} that describe the centers of the Gaussians (i.e. the grid center coordinates), their variances (i.e. amount of blurring to apply), and strides between the Gaussian centers (i.e. the scale of the glimpse). … the Gaussian grid matrices applied on each axis of the original image X. To integrate the entire context of a given tissue slide, we initialize the first location parameters l_0 such that the resulting glimpse x_0 corresponds to a coarse representation of the tissue slide (i.e. lowest magnification) re-sized to the desired glimpse size h × w--, in pages 130-133);
at step 5, performing a temporal attention mechanism for the spatial attention result gS(t) obtained at step 4 to obtain a spatial-temporal attention feature g^(t) (see Fuchs: e.g., -- the aggregation model 3214 may have internal state memory, and may exhibit temporally or sequentially dynamic behavior.  In this manner, information may be integrated across the selected tiles 3238 from the inference model 3212 to determine the classification result for the overall biomedical image 3232.--, in [0195]-[0196]; also see Min: e.g., --DST-NNs …. The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers--, in pages 856-857);
at step 6, predicting a lymph node positive score yt of the recurrent attention iteration step by using the recurrent neural network constructed at step 4 in combination with the spatial-temporal attention feature g^(t) obtained at step 5 (see Fuchs: e.g., --To overcome this bottleneck, a dataset including 44,732 whole slides from 15,187 patients was gathered across three different cancer types.  Proposed is a novel deep-learning system under the multiple instance learning (MIL) assumption, where only the overall slide diagnosis is necessary for training, thus avoiding all the expensive pixel-wise annotations that are usually part of supervised learning.  The proposed method works at scale and requires no dataset curation at any stage.  This framework was evaluated on prostate cancer, basal cell carcinoma (BCC) and breast cancer metastases to axillary lymph nodes.  It is demonstrated that classification performance with area under the curve (AUC) above 0.98 for all cancer types.--, in [0109]-[0114]; and, -- the aggregation model 3214 may have internal state memory, and may exhibit temporally or sequentially dynamic behavior.  In this manner, information may be integrated across the selected tiles 3238 from the inference model 3212 to determine the classification result for the overall biomedical image 3232.--. In [0195]-[0196], and, --The MIL assumption in the context of WSI classification states that for negative slides, all its tiles are of negative class; for positive slides, there must exist one or more positive tiles, sometimes also referred to as discriminant tiles.  The MIL assumption can be applied to deep learning as follows: given a model that predicts the probability of being class positive for a small tile, a full inference pass through the dataset is performed.  Within each slide, the tiles are ranked according to their probability of being positive.  The top most probable tiles for each slide are then used for training the model (FIG. 19).  
The top-ranking tiles from positive slides should have a probability of being positive close to 1.  Conversely, top-ranking tiles from negative slides should have a probability of being positive close to 0.  Hence, the model can be trained on the top-ranking tiles using a standard cross-entropy loss by assigning the slide level target to its respective tile.  At prediction time, the MIL assumption determines that if one positive tile is found, the slide is predicted positive.  An in-depth description is given in the Methods section.--, in [0120]; also see BenTaieb: e.g., --Comparing the attended areas to the ground truth masks of metastatic tissues (columns 3 and 2 respectively) shows that the attention mechanism is able to identify discriminative patterns and solely focus on those regions. The last column in Fig. 2 shows glimpses with the highest prediction score for each WSI class and demonstrates that the system learns patterns from diﬀerent scales. The last row in Fig. 2 shows a failure example on a challenging case of micro-metastases. In this case, the model was correctly able to identify discriminative patterns (the yellow overlay on images of column 3 shows the attention areas used to predict the slide label) but unable to predict the correct slide level class.--, in page 136; also see: --We tested the performance of the system using diﬀerent numbers of glimpses (i.e., 1, 3 or 5 glimpses per tile). On average, after background removal, we obtain ∼14 tiles per tissue slide. Thus, the ﬁnal performance results reported in Table 1 correspond to an aggregation of 14 (case of 1 glimpse per tile) to 70 glimpses. In contrast, all other automatic systems were trained with thousands of patches. We obtained best results using 3 glimpses (i.e., 85% AUC vs 68%and 83% for 1 and 5 glimpses when training with Lc only). We also observed that using 1 glimpse (i.e., 14 attention patches per slide) resulted in a 4% drop in AUC only. 
Note that this is most likely speciﬁc to this particular dataset in which macro-metastatic tissues contain large amounts of abnormality and are thus easily discriminated from benign tissues. However, this also shows the utility of identifying discriminative locations when training prediction systems. We also tested the impact of the diﬀerent loss terms in Eq. (1). In general, the patch-level loss Lp resulted in improving the attention on positive cases which is reﬂected by the improved recall scores (i.e., from 64% to 78% with 3 glimpses). Finally, adding the attention regularization terms La and Ll primarily helped facilitate convergence (i.e. reduced the convergence time by ∼15%) and improved the ﬁnal AUC, precision and recall….Comparing the attended areas to the ground truth masks of metastatic tissues (columns 3 and 2 respectively) shows that the attention mechanism is able to identify discriminative patterns and solely focus on those regions. The last column in Fig. 2 shows glimpses with the highest prediction score for each WSI class and demonstrates that the system learns patterns from diﬀerent scales. The last row in Fig. 2 shows a failure example on a challenging case of micro-metastases. In this case, the model was correctly able to identify discriminative patterns (the yellow overlay on images of column 3 shows the attention areas used to predict the slide label)--, in page 135, and further see Min: e.g., DST-NNs …. The key aspect of the structure, progressive refinement, con-siders local correlations and is performed via input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers--, in pages 856-857);
at step 7, constructing a loss function of the model to perform steps 4-6 for T times, and performing supervised training for the model by using a gradient back propagation algorithm (see Fuchs: e.g., --One can gain insight by visualizing a projection of the feature space in two dimensions using dimensionality reduction techniques such as PCA.  50 tiles were sampled from each test slide, in addition to its top-ranked tile, and extracted the final feature embedding before the classification layer.  Shown in FIG. 17A are the results of the ResNet34 model.  From the 2D projection, a clear decision boundary between positively and negatively classified tiles can be seen.  Interestingly, most of the points are clustered at the top left region where tiles are rarely top-ranked in a slide.  By observing examples in this region of the PCA space, it can be determined that they are tiles containing stroma.  Tiles containing glands extend along the second principal component axis, where there is a clear separation between benign and malignant glands.--, in [0090], [0124], [0154] and [0158]; and, --Given the unbalanced frequency of classes, weights w.sub.0 and w.sub.1, for negative and positive classes, respectively, can be used to give more importance to the underrepresented examples.  The final loss is the weighted average of the losses over a mini-batch.  Minimization of the loss is achieved via stochastic gradient descent (SGD)--, in [0073], [0076], [0150] and [0203]; and see Min: e.g., --To minimize the training error, the backward pass uses the chain rule to back-propagate error signals and compute gradients with respect to all weights throughout the neural network [46]. Finally, the weight parameters are updated using optimization algorithms based on stochastic gradient descent (SGD)--, in page 853; and further see BenTaieb: e.g., --The recurrent component of the system aggregates information extracted from all individual glimpses and their corresponding locations. 
It receives as input the joint spatial and appearance representation (i.e. gp) and maintains an internal state summarizing information extracted from the sequence of past glimpses. At each step p, the recurrent attention network updates its internal state (formed by the hidden units of the network) based on the incoming feature representation gp and outputs a prediction for the next location lp+1 to focus on at time step p + 1. The spatial attention parameters lp are formed as a linear function of the internal state of the network. Objective Function: The system is trained by minimizing a loss function com-prised of a classiﬁcation loss term and auxiliary regularization terms that guide the attention mechanism.--, in pages 132-133); 
at step 8, performing iterative training for the model by repeating steps 3-7, until a trained model is obtained at the end of training (see Fuchs: e.g., --the image classifier may apply a third subset of tiles from a plurality of tiles for a third biomedical image of the plurality of biomedical images to an aggregation system to train the aggregation system based on a comparison on a label of the third biomedical image with a classification result from applying the aggregation system to third subset.--, in [0007], and [0015]; and, --the model corrector 3222 may change the number of transform layers in the aggregation model 3214 using the error measure.  In modifying the parameters, the model corrector 3222 may perform regularization on the set of transform layers in the inference model 3212.  The regularization may include, for example, dropout, drop connect, stochastic pooling, or max pooling, among others.  In some embodiments, the model corrector 3222 may modify the aggregation model 3214 using the error measures in accordance with an iterative optimization algorithm, such as a gradient descent or stochastic gradient descent.--, in [0202]-[0204]);
at step 9, inputting the lymph node CT sequence to be detected to perform a model reasoning process, and taking a positive score yT output by the final recurrent attention as a CT lymph node detection result (see Fuchs: e.g., --To overcome this bottleneck, a dataset including 44,732 whole slides from 15,187 patients was gathered across three different cancer types.  Proposed is a novel deep-learning system under the multiple instance learning (MIL) assumption, where only the overall slide diagnosis is necessary for training, thus avoiding all the expensive pixel-wise annotations that are usually part of supervised learning.  The proposed method works at scale and requires no dataset curation at any stage.  This framework was evaluated on prostate cancer, basal cell carcinoma (BCC) and breast cancer metastases to axillary lymph nodes.  It is demonstrated that classification performance with area under the curve (AUC) above 0.98 for all cancer types.--, in [0109]-[0114]; and, -- the aggregation model 3214 may have internal state memory, and may exhibit temporally or sequentially dynamic behavior.  In this manner, information may be integrated across the selected tiles 3238 from the inference model 3212 to determine the classification result for the overall biomedical image 3232.--. In [0195]-[0196], and, --The MIL assumption in the context of WSI classification states that for negative slides, all its tiles are of negative class; for positive slides, there must exist one or more positive tiles, sometimes also referred to as discriminant tiles.  The MIL assumption can be applied to deep learning as follows: given a model that predicts the probability of being class positive for a small tile, a full inference pass through the dataset is performed.  Within each slide, the tiles are ranked according to their probability of being positive.  The top most probable tiles for each slide are then used for training the model (FIG. 19).  
The top-ranking tiles from positive slides should have a probability of being positive close to 1.  Conversely, top-ranking tiles from negative slides should have a probability of being positive close to 0.  Hence, the model can be trained on the top-ranking tiles using a standard cross-entropy loss by assigning the slide level target to its respective tile.  At prediction time, the MIL assumption determines that if one positive tile is found, the slide is predicted positive.  An in-depth description is given in the Methods section.--, in [0120]; also see BenTaieb: e.g., --Comparing the attended areas to the ground truth masks of metastatic tissues (columns 3 and 2 respectively) shows that the attention mechanism is able to identify discriminative patterns and solely focus on those regions. The last column in Fig. 2 shows glimpses with the highest prediction score for each WSI class and demonstrates that the system learns patterns from diﬀerent scales. The last row in Fig. 2 shows a failure example on a challenging case of micro-metastases. In this case, the model was correctly able to identify discriminative patterns (the yellow overlay on images of column 3 shows the attention areas used to predict the slide label) but unable to predict the correct slide level class.--, in page 136; also see: --We tested the performance of the system using diﬀerent numbers of glimpses (i.e., 1, 3 or 5 glimpses per tile). On average, after background removal, we obtain ∼14 tiles per tissue slide. Thus, the ﬁnal performance results reported in Table 1 correspond to an aggregation of 14 (case of 1 glimpse per tile) to 70 glimpses. In contrast, all other automatic systems were trained with thousands of patches. We obtained best results using 3 glimpses (i.e., 85% AUC vs 68%and 83% for 1 and 5 glimpses when training with Lc only). We also observed that using 1 glimpse (i.e., 14 attention patches per slide) resulted in a 4% drop in AUC only. 
Note that this is most likely specific to this particular dataset in which macro-metastatic tissues contain large amounts of abnormality and are thus easily discriminated from benign tissues. However, this also shows the utility of identifying discriminative locations when training prediction systems. We also tested the impact of the different loss terms in Eq. (1). In general, the patch-level loss Lp resulted in improving the attention on positive cases which is reflected by the improved recall scores (i.e., from 64% to 78% with 3 glimpses). Finally, adding the attention regularization terms La and Ll primarily helped facilitate convergence (i.e. reduced the convergence time by ∼15%) and improved the final AUC, precision and recall…. Comparing the attended areas to the ground truth masks of metastatic tissues (columns 3 and 2 respectively) shows that the attention mechanism is able to identify discriminative patterns and solely focus on those regions. The last column in Fig. 2 shows glimpses with the highest prediction score for each WSI class and demonstrates that the system learns patterns from different scales. The last row in Fig. 2 shows a failure example on a challenging case of micro-metastases. In this case, the model was correctly able to identify discriminative patterns (the yellow overlay on images of column 3 shows the attention areas used to predict the slide label)--, in page 135, and further see Min: e.g., --DST-NNs …. The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers--, in pages 856-857).

Re Claim 2, Fuchs as modified by Min and BenTaieb further disclose The CT lymph node detection system according to claim 1, wherein the step 4 specifically comprises the following steps:
at step 4.1, constructing a long short-term memory network (LSTM) of two layers (see Fuchs: e.g., -- Each transform layer may include at least one of the one or more parameters to convert the set of tiles 3238 to a set of feature maps and to determine the classification result for the entire biomedical image 3232.  Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.--, in [0195]);
at step 4.2, initializing the state of the long short-term memory network by constructing an encoding process of the feature map (see Fuchs: e.g., -- Section B describes systems and methods of using two-dimensional slicing in training an encoder-decoder model for reconstructing biomedical images and applying the encoder-decoder model to reconstruct biomedical images.--, in [0060]; also see BenTaieb: e.g., -- to improve patch-based representations. Mainly, these works present diﬀerent aggregation strategies and encode global context. For instance, weakly-supervised models based on multiple instance learning [7] or structured latent representations [3] have been proposed to show the importance of identifying discriminative regions when training a prediction model.--, in page 130);
at step 4.3, predicting a spatial attention position within a range of the feature map by using a sending network (see Min: e.g., Fig. 9, and, -- Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, tem-poral features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used so that local correl-ations are reflected progressively. MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible direc-tions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer--, in page 857; similarly, also see Fuchs: e.g., -- Model Training: The model is a function f.sub..theta.  with current parameters .theta.  that maps input tiles b.sub.i,j to class probabilities for "negative" and "positive" classes.  Given bags B a list of vectors O=[o.sub.i,: i=1, 2, .  . . , n] was obtained, one for each slide s.sub.i containing the probabilities of class "positive" for each tile b.sub.i,j: j=1, 2, .  . . , m in B.sub.s.sub.i.  The index k.sub.i of the tile was obtained within each slide which shows the highest probability of being "positive" k.sub.i=argmax(o.sub.i).  The highest ranking tile in bag B.sub.s.sub.i is then b.sub.i,k.  
The output of the network [tilde over (y)].sub.i=f.sub..theta.(b.sub.i,k) can be compared to y.sub.i, the target of slide s.sub.i, thorough the cross-entropy loss--, in [0075]);
at step 4.4, constructing an attention matrix based on a two-dimension Gaussian Kernel Function (see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse xp from a tissue slide and is a modification of the read mechanism introduced in [8]. Given an input tissue slide X ∈ R^(H×W×3) of size H × W, we apply two grids (one for each axis of the image) of two-dimensional Gaussian filters, where each filter response corresponds to a pixel in the resulting glimpse xp ∈ R^(h×w×3) of size h × w. The attention mechanism is represented by parameters l = {μ_w, μ_h, σ_w^2, σ_h^2, δ_w, δ_h} that describe the centers of the Gaussians (i.e. the grid center coordinates), their variances (i.e. amount of blurring to apply), and strides between the Gaussian centers (i.e. the scale of the glimpse). … the Gaussian grid matrices applied on each axis of the original image X. To integrate the entire context of a given tissue slide, we initialize the first location parameters l0 such that the resulting glimpse x0 corresponds to a coarse representation of the tissue slide (i.e. lowest magnification) re-sized to the desired glimpse size h × w--, in pages 130-133); and
at step 4.5,    is multiplied by A, element by element and then added up so as to obtain the spatial attention result gS(t) (see BenTaieb: e.g., Fig. 1, and caption contains: --The model includes three primary components composed of dense (rectangular boxes) or convolutional (trapezoid) layers. X is an input whole slide image, {x0,...,xP } is the sequence of glimpses with their corresponding location parameters {l0,...,lp}. The system contains three main components parameterized by θx, θl and θa. ⊙ represents the Hadamard product and  is a matrix multiplication. The model sequentially predicts a class label ŷ for the tissue slide given the sequence of glimpses.--, in pages 131-132).
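
For illustration only, steps 4.4 and 4.5 as recited can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference: the function names, the normalization of the weights to sum to 1, the chosen center and sigma, and the 3-channel toy feature map (the claims recite 8x8x200) are all assumptions.

```python
import math

def gaussian_attention(height, width, center, sigma):
    """Build a height x width matrix of 2D Gaussian weights.

    Assumption: the weights are normalized so that they sum to 1.
    """
    cy, cx = center
    raw = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)] for y in range(height)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def attend(feature_map, attention):
    """Multiply each position's channel vector by its weight, then sum over positions."""
    channels = len(feature_map[0][0])
    out = [0.0] * channels
    for feature_row, weight_row in zip(feature_map, attention):
        for vec, w in zip(feature_row, weight_row):
            for c in range(channels):
                out[c] += w * vec[c]
    return out

# Hypothetical 8 x 8 feature map with 3 channels: (row index, column index, constant 1).
fmap = [[[float(y), float(x), 1.0] for x in range(8)] for y in range(8)]
A = gaussian_attention(8, 8, center=(3.5, 3.5), sigma=1.5)
g = attend(fmap, A)  # one attended feature vector with 3 entries
```

Because the assumed weights sum to 1, the result is a weighted average of the per-position feature vectors, concentrated around the chosen center.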

Re Claim 3, claim 3 is rejected for the reasons discussed above with regard to claim 2, and further (see BenTaieb: e.g., --We denote the appearance-based features obtained for a given glimpse by fx(xp; θx) and the features computed for the corresponding location parameters by fl(lp; θl). We used a CNN to represent fx and a fully connected layer for fl. The outputs of both networks are fused to obtain a joint representation that captures spatial and appearance features using gp = σ(fl(lp; θl) ⊙ fx(xp; θx)), where gp is the output joint feature vector, σ corresponds to the logistic sigmoid function, and ⊙ is the Hadamard product. By combining appearance and spatial features, the system integrates features related to “where” and “what” patterns to seek for when predicting the next glimpse location parameters.
Recurrent Attention: The recurrent component of the system aggregates information extracted from all individual glimpses and their corresponding locations. It receives as input the joint spatial and appearance representation (i.e. gp) and maintains an internal state summarizing information extracted from the sequence of past glimpses. At each step p, the recurrent attention network updates its internal state (formed by the hidden units of the network) based on the incoming feature representation gp and outputs a prediction for the next location lp+1 to focus on at time step p + 1. The spatial attention parameters lp are formed as a linear function of the internal state of the network.--, in page 132; also see Min: e.g., Fig. 4, and, --The basic structure of DNNs consists of an input layer, multiple hidden layers and an output layer (Figure 4). Once input data are given to the DNNs, output values are computed sequentially along the layers of the network. At each layer, the input vector comprising the output values of each unit in the layer below is multiplied by the weight vector for each unit in the current layer to produce the weighted sum.--, in pages 854-855).

Re Claim 4, Fuchs as modified by Min and BenTaieb further disclose wherein the step 4.2 specifically comprises the following steps:
at step 4.2.1, constructing a new double-layer long short-term memory network having the same structure as formula (1) (see Fuchs: e.g., -- Each transform layer may include at least one of the one or more parameters to convert the set of tiles 3238 to a set of feature maps and to determine the classification result for the entire biomedical image 3232.  Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.--, in [0195]; also see Min: e.g., Fig. 4, and, Fig. 5, and, --Figure 5. Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the ﬁrst hid-den layer as an RBM or AE. After the W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers--, in pages 855-856);
at step 4.2.2, dividing the feature map Amid corresponding to the exact center of a CT slice sequence of each lymph node at step 3 according to a spatial neighborhood; specifically, dividing 8x8x200 into 16 sub-feature blocks with 2x2x200 based on adjacent four positions as one group (see Fuchs: e.g., --a tiling method was developed to extract tiles containing tissue from both inside and outside the annotated regions at MSK's 20.times.  equivalent magnification (0.5 .mu.m/pixel) to enable direct comparison with the datasets.  The method generates a grid of possible tiles--, in [0154]-[0155]; also see BenTaieb: e.g., Fig. 2, and, --processing images at the intermediate 20x magniﬁcation using tiles covering as much context as possible. A tile size of 5000 × 5000 pixels (Fig. 2) was the largest we could process. To predict a class label for a slide--, in pages 134-135, and, -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse xp from a tissue slide and is a modiﬁcation of the read mechanism introduced in [8]. Given an input tissue slide X ∈RH×W×3 of size H × W, we apply two grids (one for each axis of the image) of two-dimensional Gaussian ﬁlters, where each ﬁlter response corresponds to a pixel in the resulting glimpse xp ∈Rh×w×3 of size h × w. The attention mechanism is represented by parameters l = {μw,μh,σ2w,σh2,δw,δh} that describe the centers of the Gaussians (i.e. the grid center coordinates), their variances (i.e. amount of blurring to apply), and strides between the Gaussian centers (i.e. the scale of the glimpse). … the Gaussian grid matrices applied on each axis of the original mage X. 
To integrate the entire context of a given tissue slide, we initialize the ﬁrst location parameters l0 such that the resulting glimpse x0 corresponds to a coarse representation of the tissue slide (i.e. lowest magniﬁcation) re-sized to the desired glimpse size h × w--, in pages 130-133);
at step 4.2.3, inputting the 16 sub-feature blocks into the new double-layer long short-term memory network sequentially clockwise from outside to inside to go through 16 cycles and obtain a cell state c'j2) corresponding to the second layer of the LSTM at the last moment so as to initialize the cell state cj,2)of the second layer of the long short-term memory network at step 4.1 (see  Fuchs: e.g., --a tiling method was developed to extract tiles containing tissue from both inside and outside the annotated regions at MSK's 20.times.  equivalent magnification (0.5 .mu.m/pixel) to enable direct comparison with the datasets.  The method generates a grid of possible tiles--, in [0154]-[0155]; also see BenTaieb: e.g., Fig. 2, and, --processing images at the intermediate 20x magniﬁcation using tiles covering as much context as possible. A tile size of 5000 × 5000 pixels (Fig. 2) was the largest we could process. To predict a class label for a slide--, in pages 134-135).
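
For illustration only, the block division of step 4.2.2 and the clockwise, outside-to-inside ordering of step 4.2.3 can be sketched as follows. This is a hedged sketch, not the applicant's or any reference's implementation: the function names are hypothetical, the starting corner (top-left) is an assumption because the claim does not state one, and each cell holds a (row, column) tag in place of a 200-channel feature vector.

```python
def split_blocks(feature_map, block=2):
    """Split an H x W grid into a grid of blocks, each holding block*block cells."""
    h, w = len(feature_map), len(feature_map[0])
    return [[[feature_map[by + dy][bx + dx]
              for dy in range(block) for dx in range(block)]
             for bx in range(0, w, block)]
            for by in range(0, h, block)]

def clockwise_spiral(rows, cols):
    """List (row, col) indices clockwise from the outer ring toward the center.

    Assumption: the traversal starts at the top-left corner.
    """
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    order = []
    while top <= bottom and left <= right:
        order += [(top, c) for c in range(left, right + 1)]
        order += [(r, right) for r in range(top + 1, bottom + 1)]
        if top < bottom:
            order += [(bottom, c) for c in range(right - 1, left - 1, -1)]
        if left < right:
            order += [(r, left) for r in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order

# Hypothetical 8 x 8 grid; each cell is tagged with its own (row, col) position.
fmap = [[(y, x) for x in range(8)] for y in range(8)]
blocks = split_blocks(fmap, block=2)           # 4 x 4 grid of 2 x 2 blocks
order = clockwise_spiral(4, 4)                 # 16 block indices, outer ring first
sequence = [blocks[r][c] for r, c in order]    # the 16 inputs, one per recurrent cycle
```

Under these assumptions, `sequence` holds the 16 sub-feature blocks in the order in which they would be fed to the recurrent network.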

Re Claim 5, Fuchs as modified by Min and BenTaieb further disclose The CT lymph node detection system according to claim 2, wherein the step 4.3 specifically comprises the following steps:
at step 4.3.1, concatenating a feature vector ht(2) output by the first hidden layer of the long short-term memory network and a feature result gs(t)center corresponding to the center of the slice sequence in the recurrent attention iteration step to obtain [ht(2), gs(t)center] (see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse xp from a tissue slide and is a modiﬁcation of the read mechanism introduced in [8]. Given an input tissue slide X ∈RH×W×3 of size H × W, we apply two grids (one for each axis of the image) of two-dimensional Gaussian ﬁlters, where each ﬁlter response corresponds to a pixel in the resulting glimpse xp ∈Rh×w×3 of size h × w. The attention mechanism is represented by parameters l = {μw,μh,σ2w,σh2,δw,δh} that describe the centers of the Gaussians (i.e. the grid center coordinates), their variances (i.e. amount of blurring to apply), and strides between the Gaussian centers (i.e. the scale of the glimpse). … the Gaussian grid matrices applied on each axis of the original mage X. To integrate the entire context of a given tissue slide, we initialize the ﬁrst location parameters l0 such that the resulting glimpse x0 corresponds to a coarse representation of the tissue slide (i.e. lowest magniﬁcation) re-sized to the desired glimpse size h × w--, in pages 130-133);
at step 4.3.2, inputting [ht(2), gs(t)center] to a sending network composed of one fully-connected layer to perform regression for the spatial attention position of the next recurrent iteration step   (see Fuchs: e.g., --Given a model f trained at a particular resolution, and a WSI, a heat-map of tumor probability can be obtained over the slide.  Several features can then be extracted from the heat-map to train a slide aggregation model.  For example, one approach used the count of tiles in each class to train a logistic regression model.  Here, that approach was extended by adding several global and local features and train a random forest to emit a slide diagnosis.  The features extracted are: 1) total count of tiles with probability >=0.5; 2-11) 10-bin histogram of tile probability; 22-30) count of connected components for a probability threshold of 0.1 of size in ranges 1-10, 11-15, 16-20, 21-25, 26-30, 31-40, 41-50, 51-60, 61-70 and >70 respectively; 31-40) 10-bin local histogram with window size 3.times.3 aggregated by max-pooling; 41-50) 10-bin local histogram with window size 3.times.3 aggregated by averaging;--, in [0153]; also see --establishing the aggregation system may include initializing the aggregation system comprising a recurrent neural network.  The recurrent neural network may have one or more parameters.  Each parameter of the one or more parameters may be set to a random value.  
In some embodiments, the image classifier may maintain the aggregation system responsive to determining that a second classification result from the aggregation system for a second subset of tiles from a second biomedical image matches a second label for the second biomedical image.--, in [0009]-[001], and in [0116]-[0118], [0127], [0154], [0195] and [0213]; and further see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure.Spatial Attention: The spatial attention mechanism consists of extracting a glimpse xp from a tissue slide and is a modiﬁcation of the read mechanism introduced in [8]. Given an input tissue slide X ∈RH×W×3 of size H × W--, in pages 130-133).

Re Claim 7, Fuchs as modified by Min and BenTaieb further disclose wherein the step 5 specifically comprises the following steps:
at step 5.1, constructing a mixture density network to predict an attention position pp of a slice direction (see BenTaieb: e.g., -- The attention network is the recurrent component of the model and uses information from the glimpses and their corresponding location parameters to update its internal representation of the input and outputs the next location parameters. Figure 1 is a graphical representation of this sequential procedure. Spatial Attention: The spatial attention mechanism consists of extracting a glimpse xp from a tissue slide and is a modification of the read mechanism introduced in [8]. Given an input tissue slide X ∈ R^(H×W×3) of size H × W--, in pages 130-133);
at step 5.2, obtaining an attention weight vector l’(t) based on Gaussian Mixture Distribution (see Fuchs: e.g., -- We have described models trained with the weak supervisory signal coming from the MIL assumption.  These models rely on a representation that is rich enough to obtain high slide classification accuracy on a held-out test set.  The representation learned can be inspected by visualizing a projection of the feature space in two dimensions using dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE). Hundred tiles were sampled from each test slide of the prostate dataset, in addition to its top-ranked tile, and extracted the final feature embedding before the classification layer.  --, in [0124]-[0125]; -- Each transform layer may be of a predefined size to generate the feature maps of a predefined size.  In some embodiments, the inference model 3212 may be a convolutional neural network (CNN) and a deep convolutional network (DCN), among others, with the set of transform layers.--, in [0170]; and, -- the aggregation model 3214 may be a recurrent neural network (RNN), an echo state network (ESN), a long/short term memory (LSTM) network, a deep residual network (DRN), and gated recurrent units (GRU), among others, with the set of transform layers.  For example, the aggregation model 3214 may be the recurrent neural network--, in [0195]; and, --At least five training runs were completed for each condition.  Minimum balanced error on the validation set for each run was used to decide the best condition in each experiment.  Briefly, ResNet34 achieved the best results over other architectures tested (AlexNet, VGG11, VGG16, ResNet18, ResNet101, DenseNet201); using a class-weighted loss led to better performance overall, and weights were adopted in the range of 0.8-0.95 in subsequent experiments--, in [0121]; also see Min: e.g., Fig. 4, and, Fig. 5, and, --Figure 5. 
Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the ﬁrst hid-den layer as an RBM or AE. After the W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers--, in pages 855-856); and
at step 5.3, multiplying l’(t) by the input feature gs(t) element by element and performing addition to obtain the spatial-temporal attention feature g^(t) (see BenTaieb: e.g., Fig. 1, and caption contains: --The model includes three primary components composed of dense (rectangular boxes) or convolutional (trapezoid) layers. X is an input whole slide image, {x0,...,xP } is the sequence of glimpses with their corresponding location parameters {l0,...,lp}. The system contains three main components parameterized by θx, θl and θa. ⊙ represents the Hadamard product and   is a matrix multiplication. The model sequentially predicts a class label ŷ for the tissue slide given the sequence of glimpses.--, in pages 131-132).
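
For illustration only, steps 5.2 and 5.3 as recited can be sketched as follows. This is a minimal sketch under stated assumptions and does not come from the application or the cited references: the function names are hypothetical, the mixture parameters (pi, mu, sigma) are supplied by hand rather than predicted by a mixture density network, and the weights are assumed to be normalized to sum to 1 over the slices.

```python
import math

def mixture_weights(num_slices, components):
    """Evaluate a 1D Gaussian mixture at each slice index and normalize.

    components: list of (pi, mu, sigma) tuples, hand-picked here for illustration.
    """
    raw = []
    for t in range(num_slices):
        density = sum(
            pi * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi))
            for pi, mu, sigma in components
        )
        raw.append(density)
    total = sum(raw)
    return [r / total for r in raw]

def weighted_sum(features, weights):
    """Multiply each slice's feature vector by its weight and add the results."""
    dim = len(features[0])
    return [sum(w * f[d] for f, w in zip(features, weights)) for d in range(dim)]

# Hypothetical sequence of 5 slices, each with a 2-entry feature (slice index, constant 1).
slice_features = [[float(t), 1.0] for t in range(5)]
weights = mixture_weights(5, [(0.5, 1.0, 1.0), (0.5, 3.0, 1.0)])
g = weighted_sum(slice_features, weights)
```

With the two mixture components placed symmetrically about the middle slice, the weights are symmetric and the attended slice index averages to the center of the sequence.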


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Fuchs as modified by Min and BenTaieb, and further in view of Kooi (“Classifying symmetrical differences and temporal change for the detection of malignant masses in mammography using deep neural networks”, J Med Imaging (Bellingham). 2017 Oct; 4(4): 044501., pages 1-21).
Re Claim 6, Fuchs as modified by Min and BenTaieb do not, however, explicitly disclose applying a softmax function.
Kooi teaches that a matrix is constructed based on the two-dimension Gaussian Kernel Function and softmax (see Kooi: e.g., -- It employs five features based on first- and second-order Gaussian kernels, two designed to spot the center of a focal mass and two looking for spiculation patterns, characteristic of malignant lesions. A final feature indicates the size of optimal response in scale-space.--, in page 4, and, -- we introduce a form of data augmentation by mapping each location in the image in question to 64 different points in the comparison mammogram by sampling the location from a Gaussian with zero mean and 10 pixel standard deviation.--, and, --… Convolutional layers are generally alternated with pooling layers that subsample the resulting feature maps, generating some translation invariance and reducing the dimensionality as information flows through the architecture. After these layers, the final tensor of feature maps is flattened to a vector xl and several fully connected layers are typically added, where weights are no longer shared. The posterior distribution over a class variable y, given input patch X is acquired by feeding the last level of activations x to either a logistic sigmoid for single class or a softmax function for multiclass…. The parameters in the network are generally learned using maximum likelihood estimation or maximum a-posteriori, when employing regularization and default backpropagation. Increasing depth up to some point seems to improve efficiency and reduce the amount of parameters that need to be learned, without sacrificing performance or even increases overall performance.55–57 The gradient of the error of each training sample is dispersed among parameters in every layer during backpropagation and hence becomes smaller (or in rare cases explodes), which is referred to as the fading gradient problem.--, in page 9).
Fuchs (as modified by Min and BenTaieb) and Kooi are combinable as they are in the same field of endeavor: using deep learning neural networks and modules to extract temporal and spatial features from medical images to detect particular cells such as lymph nodes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Fuchs (as modified by Min and BenTaieb)’s system using Kooi’s teachings by applying a matrix constructed based on the two-dimension Gaussian Kernel Function and softmax to Fuchs (as modified by Min and BenTaieb)’s feature map and the classification thereof, in order to acquire the posterior distribution over a class variable y given input patch X (see Kooi: e.g., in pages 9-10).
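
For illustration only, one way to combine a two-dimensional Gaussian kernel with a softmax function is sketched below. This is an assumption-laden sketch and is not drawn from claim 6, the application, or Kooi: the function names are hypothetical, and the choice to feed the negative scaled squared distances to the softmax as logits is made purely for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_gaussian_matrix(height, width, center, sigma):
    """Use negative scaled squared distances to the center as logits, then softmax.

    Assumption: the softmax is taken over all height * width positions at once.
    """
    cy, cx = center
    logits = [-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2)
              for y in range(height) for x in range(width)]
    flat = softmax(logits)
    return [flat[r * width:(r + 1) * width] for r in range(height)]

# Hypothetical 5 x 5 attention matrix centered on the middle cell.
attention = softmax_gaussian_matrix(5, 5, center=(2, 2), sigma=1.0)
```

Under this choice of logits, the softmax output equals the Gaussian kernel values divided by their sum, so the matrix entries are non-negative and sum to 1.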


Allowable Subject Matter
Claims 8-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached on 8:00 - 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/WEI WEN YANG/Primary Examiner, Art Unit 2667