DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: 
a generator (G)…; a feature extractor…; a discriminator (D)…; a label encoder…; and a keywords reconstructor… [Claims 1 and 6].

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Comaniciu et al., International Publication Number WO 2018/192672 [hereinafter Comaniciu] in view of Malhotra et al., U.S. Publication No. 2018/0211010 [hereinafter Malhotra].

Regarding Claim 1, Comaniciu discloses …A system to classify a plurality of clinical records into International Classification of Diseases (ICD) codes, the system comprising: one or more processor(s) (Comaniciu, ¶ 7, a medical image system comprises a non-transitory, machine readable storage medium storing program instructions and medical image data; and a programmed processor coupled to the storage medium. The programmed processor is configured by the program instructions for: inputting medical image data to a variational autoencoder configured to reduce a dimensionality of the medical image data to a latent space having one or more latent variables with latent variable values, such that the latent variable values corresponding to an image with no tissue of a target type fit within one or more clusters of the values of the latent variables);
and a memory communicatively coupled to the processor(s), wherein the memory stores instructions executed by the processor, wherein the memory comprising: a generator (G) to generate one or more synthetic features… (Id., ¶ 38, Computer system 103 may also include a main memory 104 (e.g., a random access memory (RAM)), and a secondary memory 108. The main memory 104 and/or the secondary memory 108 comprise a dynamic random access memory (DRAM)… The removable storage unit 116 may include a computer readable storage medium having tangibly stored therein (or embodied thereon) data and/or computer software instructions, e.g., for causing the processor(s) to perform various operations), (Id., ¶ 85, To compensate for the discrepancies in protocols between scanners, institutions, vendors, and/or models, system 800 uses adversarial training to create features for classification (discloses synthetic features) that are robust and nearly invariant to protocols. That is, the system 800 can classify a given input image as normal or abnormal regardless of the acquisition protocol. The configuration in FIG. 8A branches out a discriminator network (also referred to as "discriminator") 840 from the latent space. The discriminator 840 can identify the protocol being used (or the institution the image comes from, or the vendor and/or model). The training is performed using at least two different batches of data. The first batch of data trains a generative network (also referred to as a generator) 830 (discloses generator) for the normal/abnormal classification, using labeled (normal and abnormal) images for supervised learning. All of the training images in the first batch are acquired using the same protocol as each other (e.g., referred to below as "domain A"));

[Image: media_image1.png]

a feature extractor to extract one or more real latent features from a plurality of clinical documents and generates one or more real features by training a plurality of generative adversarial networks (GANs), wherein the generator (G) generates synthesized features after the GANs are trained and calibrate a binary code classifier with the real latent features generated by the feature extractor… wherein the generator (G) generates one or more code-specific latent features conditioned on… by using a Wasserstein GAN with gradient penalty (WGAN-GP), wherein the Wasserstein GAN with gradient penalty (WGAN-GP) generates a latent feature vector (f) (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 (discloses feature extractor) tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B (discloses binary code classifier); the discriminator of FIG. 16A is trained to classify images as either real input images or synthesized images (from the generator)), (Id., ¶ 121, FIG. 10-13 show the configuration of the GAN 800 (discloses generative adversarial networks) during the training and test phases. The first batch of training input images 1001 are input to the encoder 880 of the generator 830. The first batch of training input images 1001 include both images with normal (not novel) tissue and images with abnormal (novel) tissues. The images are labeled to indicate whether the images are normal. All of the first batch of images are from a single domain (e.g., domain A)), (Id., ¶ 27, FIGS. 16A-16D compares the speed of training for three systems, using generative adversarial training (GAN), Wasserstein GAN (WGAN), and domain adaptation), (Id., ¶ 116, The skip connections (not shown) feed forward the high resolution information in each convolution layer 812, 816, 822 of the encoder 880 to the corresponding deconvolution layer 836, 842, 846 in the decoder 890. Because back propagation (to update the higher encoder layers) is based on gradient functions that may tend to zero, keeping the high resolution information from the skip connections (not shown) allows deeper training without loss of high frequency details (discloses gradient penalty)), (Id., ¶ 117, At the smallest (deepest) level, also referred to as the "bottleneck" of the network, the latent space has one manifolds 828 corresponding to one latent variable vector (discloses latent feature vector));
a discriminator (D) to distinguish between the synthesized features generated by the generator (G) and the real features generated by the feature extractor and determines whether the features are the real features generated by the feature extractor or the synthetic features generated by the generator (G) (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B; the discriminator of FIG. 16A (discloses discriminator) is trained to classify images as either real input images or synthesized images (from the generator));
a label encoder to encode… into a sequence of one or more hidden state sequences… (Id., ¶ 32, Variational autoencoders (VAEs) (discloses encoder) can represent input MR data in a latent space whose parameters are learned during encoding. A VAE can capture shape variability, and has generative capability to synthesize images of tissue (e.g., brain images) given the underlying latent space (or manifold) coordinates. An autoencoder is a feedforward, nonrecurrent neural network having an input layer, an output layer and one or more hidden layers (discloses hidden state sequences) connecting the input and output layers. The output layer has the same number of nodes as the input layer);
and a keywords reconstructor to reconstruct the keywords extracted from the clinical documents associated with a code l to ensure the latent feature vector (f)… (Id., ¶ 47, the VAE can include an encoder network 202 and a decoder network 204. The encoder network 202 has a plurality of layers 210, 220, and the decoder network 204 has a plurality of layers 240, 250. The layers 210, 220, 240 and 250 are described below with reference to FIGS. 3 and 4, below… The decoder network 204 reconstructs (discloses reconstructor) the input images 262, 264 from the latent variables 230 and computes the loss based on the input image data 110 and the output images 262, 264).
While suggested in at least Fig 1B and related text, Comaniciu does not explicitly disclose … corresponding to one or more ICD code descriptions; …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…; …a sequence of a plurality of keywords in the ICD code description… by using a long short-term memory (LSTM); …captures a semantic meaning of a code l…
However, Malhotra discloses …corresponding to one or more ICD code descriptions (Malhotra, ¶ 50, Method 10 also includes a step 16 of grouping diagnosis and procedure codes. Most of raw healthcare datasets have diagnosis and medical procedures coded by standard systems of classification such as the International Classification of Diseases and Related Health Problems (ICD) and Current Procedural Terminology (CPT). Both CPT and ICD-9 codes help in communicating uniform information to the physicians and payers for administrative and financial purposes but for analytics these codes are grouped into clinically significant and broader codes presented by another scheme of classification named Clinical Classification Software (CCS) maintained by Healthcare Cost and Utilization Project (HCUP). The single level scheme consists of approximately 285 mutually exclusive diagnosis categories and 241 procedure categories. Step 16 includes mapping all the ICD-9 and CPT codes in the raw dataset to corresponding CCS codes for use in constructing appropriate features for the model…), (Id., Table 8, Table depicts ICD code descriptions)

[Image: media_image2.png]


Through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Comaniciu and Malhotra discloses …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…
First, Comaniciu discloses generating pseudo data examples using generative adversarial networks (Comaniciu, ¶ 121, FIG. 10-13 show the configuration of the GAN 800 (discloses generative adversarial network) during the training and test phases. The first batch of training input images 1001 are input to the encoder 880 of the generator 830. The first batch of training input images 1001 include both images with normal (not novel) tissue and images with abnormal (novel) tissues. The images are labeled to indicate whether the images are normal. All of the first batch of images are from a single domain (e.g., domain A)), (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B).
Further, Malhotra discloses ICD code descriptions and the number of features generated to represent various codes (Malhotra, ¶ 54, as shown in FIG. 5, step 20 may include a substep 20a of filtering patients based on defined epilepsy diagnosis criteria to filter out non-epileptic patients. For example, to be included within the cohort, the patient must have at least one diagnosis claim of 345 (ICD-9 code for epilepsy diagnosis) or at least two claims of 780.39 (ICD-9 code for convulsions) at any time in the timeline of the patient. This criteria ensures the exclusion of all the patients which have not been diagnosed with any form of epilepsy and may have had one or less convulsions, thereby there is not substantial evidence to categorize the patient as an epileptic patient), (Id., Table 4, table depicts frequency associated with various features including ICD codes).

[Image: media_image3.png]

One of ordinary skill in the art would have recognized that applying the known technique of Comaniciu would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the generative adversarial networks technique of Comaniciu to the ICD code and classification teachings of Malhotra would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such classification features into similar systems. Further, applying machine learning techniques with classifier codes to Malhotra with ICD codes stored accordingly, would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow more accurate reporting of medical records according to specific disease classification codes. Thus, through KSR Rationale D, the combination of Comaniciu and Malhotra discloses …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…
Malhotra further discloses…a sequence of a plurality of keywords in the ICD code description… by using a long short-term memory (LSTM); …captures a semantic meaning of a code l… (Malhotra, ¶ 6, machine learning has seen the rise of neural networks with many layers. These are commonly referred to as deep neural networks (DNN). Recurrent Neural Network (RNN) is an important class of DNN. A unique aspect of RNN is the folding out in time operation, where each time-step corresponds to a layer in a feedforward network. RNN's show great performance in modeling variable length sequential data, particularly those with gated activation units such as Long Short-Term Memory (LSTM), as described in Hochreiter et al., “Long short-term memory,” Neural Comput. 9, 1735-1780 (1997), and Gated Recurrent Units (GRU), as described in Chung et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555 (2014). RNNs have achieved state-of-the-art results in machine translation, as described in Cho et al…. RNNs have also been applied to several clinical applications recently. Lipton et al used LSTM RNN to recognize patterns in multivariate time series of clinical measurements gathered from an intensive care unit (ICU), as described in Lipton et al., “Learning to Diagnose with LSTM Recurrent Neural Networks,” arXiv [cs.LG] (2015). Choi et al developed an application of RNN with GRU to jointly forecast the future disease diagnosis and medication prescription along with their timing as continuous multi-label predictions, as described in Choi et al., “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks,” arXiv [cs.LG] (2015)), (Id., ¶ 84, In the example shown in FIG. 14, one RNN includes a pre-trained embedding layer and another RNN includes a randomly initialized embedding layer. An embedding layer is a type of layer that usable in deep neural networks and used in Natural Language Processing (NLP) applications), (Id., ¶ 85, The Med2Vec model is an advanced variation of the Word2Vec model that is based on the fact that the nature of medical data is similar with that of natural languages. For example, each single medical code acts as word in natural languages. In other embodiments, Word2Vec and GloVe models can be used to train the embedding layer, as for example described in Mikolov et al., Advances in Neural Information Processing Systems 26, Curran Associates, Inc., 2013, pp. 3111-3119 (Word2Vec) and Pennington et al., “Glove: Global Vectors for Word Representation,” EMNLP (2014) (GloVe), respectively).

[Image: media_image4.png]


At the time the invention was filed it would have been obvious to a person of ordinary skill in the art to have modified the generative adversarial network and feature generation elements of Comaniciu to include the international classification of diseases elements of Malhotra in the analogous art of predictive modeling for classifying patients [Malhotra, Abstract].
The motivation for doing so would have been to improve the ability to analyze a “patient cohort to identify a subset of the features that are predictive for refractoriness for inclusion in a predictive model configured for classifying patients” (Malhotra, ¶ 9), wherein such improvements would benefit Comaniciu’s method of “diagnostic classification (e.g., healthy vs. diseased) for a variety of brain conditions (e.g., MS, stroke, mTBI, tumors, etc.)” [Malhotra, ¶ 9; Comaniciu, ¶ 80].

	
Regarding claim 2, the combination of Comaniciu and Malhotra discloses …The system according to claim 1.
Comaniciu further discloses …wherein the label encoder obtains a fixed-sized encoding vector (el) by performing a dimension-wise max-pooling over the hidden state sequences (Comaniciu, ¶ 50, The skip connections 301, 302, and 303 feed forward the high resolution information in each convolution layer of the encoder 202 to the corresponding deconvolution layer in the decoder. Because back propagation (to update the higher encoder layers) is based on gradient functions that may tend to zero, keeping the high resolution information from the skip connections 301, 302, 303 allows deeper training without loss of high frequency details. f(x)=max(0,x) (1) where x is the input to the layer), (Id., ¶ 55, Each dense block 210 also includes skip connections 411-413 between layers. Each lower layer 412 or 413 receives input from the adjacent higher layer 411 or 412 and the original input. For example, the input of block 413 is the output of the block 411 and block 412 concatenated to one input. This provides a more accurate result using feedforward, to keep high resolution information when pooling. The pooling provides an average or maximum of the data, and replaces a neighborhood with a single value, approximating the higher resolution data. So pooling could potentially lose information. By passing the result of the previous layers to the next layer along with the pooled data, the high resolution information is still propagated).
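For reference only, the claimed "dimension-wise max-pooling over the hidden state sequences" can be sketched as follows. This is a minimal illustration and is not drawn from Comaniciu, Malhotra, or the specification; the function name max_pool_over_time and the toy values are assumptions chosen for the example. The sketch takes a sequence of equal-length hidden state vectors and keeps, for each dimension, the largest value observed across the sequence, so the output has a fixed size regardless of how many hidden states are supplied.

```python
from typing import List, Sequence


def max_pool_over_time(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Reduce T hidden state vectors of size d to one vector of size d.

    Hypothetical helper for illustration: output[j] is the maximum of
    hidden_states[t][j] over all time steps t.
    """
    if not hidden_states:
        raise ValueError("at least one hidden state is required")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must have the same dimension")
    # Take the maximum separately for each dimension j.
    return [max(h[j] for h in hidden_states) for j in range(dim)]


if __name__ == "__main__":
    # Three time steps, three dimensions (toy values).
    hidden = [
        [0.1, -2.0, 3.0],
        [0.5, -1.0, 2.0],
        [0.0, -3.0, 1.0],
    ]
    print(max_pool_over_time(hidden))  # [0.5, -1.0, 3.0]
```

In this sketch a longer or shorter input sequence yields an output of the same length d, which is the sense in which the resulting encoding is "fixed-sized."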

Regarding claim 3, the combination of Comaniciu and Malhotra discloses …The system according to claim 1.
Comaniciu further discloses …wherein the label encoder obtains… of the code l by concatenating the fixed-sized encoding vector (el) with… produced by a graph encoding network (Comaniciu, ¶ 55, Each dense block 210 also includes skip connections 411-413 between layers. Each lower layer 412 or 413 receives input from the adjacent higher layer 411 or 412 and the original input. For example, the input of block 413 is the output of the block 411 and block 412 concatenated to one input. This provides a more accurate result using feedforward, to keep high resolution information when pooling. The pooling provides an average or maximum of the data, and replaces a neighborhood with a single value, approximating the higher resolution data. So pooling could potentially lose information. By passing the result of the previous layers to the next layer along with the pooled data, the high resolution information is still propagated), (Id., ¶ 50, The skip connections 301, 302, and 303 feed forward the high resolution information in each convolution layer of the encoder 202 to the corresponding deconvolution layer in the decoder. Because back propagation (to update the higher encoder layers) is based on gradient functions that may tend to zero, keeping the high resolution information from the skip connections 301, 302, 303 allows deeper training without loss of high frequency details. f(x)=max(0,x) (1) where x is the input to the layer).
While suggested in at least Fig 1B and related text, Comaniciu does not explicitly disclose …an eventual embedding (cl = el || gl)… an ICD tree hierarchy (gl) which is the embedding of the code…
However, Malhotra discloses …an eventual embedding (cl = el || gl)… an ICD tree hierarchy (gl) which is the embedding of the code… (Malhotra, ¶ 54, as shown in FIG. 5, step 20 may include a substep 20a of filtering patients based on defined epilepsy diagnosis criteria to filter out non-epileptic patients. For example, to be included within the cohort, the patient must have at least one diagnosis claim of 345 (ICD-9 code for epilepsy diagnosis) or at least two claims of 780.39 (ICD-9 code for convulsions) (discloses ICD hierarchy) at any time in the timeline of the patient. This criteria ensures the exclusion of all the patients which have not been diagnosed with any form of epilepsy and may have had one or less convulsions, thereby there is not substantial evidence to categorize the patient as an epileptic patient), (Id., ¶ 73, Method 10 further includes a step 26 of training the predictive model. In one preferred embodiment, the predictive model is a RNN 150 including the architecture shown in FIG. 9, including an input layer 152, an embedding layer 154, two hidden layers 156, 158—recurrent layers with GRUs, a decision layer 160 including a logistic regression classifier and an output layer 162).
At the time the invention was filed it would have been obvious to a person of ordinary skill in the art to have modified the generative adversarial network and feature generation elements of Comaniciu to include the international classification of diseases elements of Malhotra in the analogous art of predictive modeling for classifying patients for the same reasons as stated for claim 1.
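For reference only, reading the claimed eventual embedding (cl) as the concatenation of the fixed-sized encoding vector (el) with the ICD tree hierarchy embedding (gl), the operation can be sketched as follows. This is a minimal illustration and is not drawn from Comaniciu, Malhotra, or the specification; the function name concat_embedding and the toy values are assumptions chosen for the example.

```python
from typing import List, Sequence


def concat_embedding(e_l: Sequence[float], g_l: Sequence[float]) -> List[float]:
    """Hypothetical helper: join the description encoding e_l and the
    hierarchy embedding g_l end to end to form a single vector c_l."""
    return list(e_l) + list(g_l)


if __name__ == "__main__":
    e_l = [0.5, -1.0, 3.0]   # toy description encoding
    g_l = [0.2, 0.7]         # toy hierarchy embedding
    c_l = concat_embedding(e_l, g_l)
    print(c_l)        # [0.5, -1.0, 3.0, 0.2, 0.7]
    print(len(c_l))   # 5
```

Under this reading, the length of cl is the sum of the lengths of el and gl, and the entries of each input vector are carried over unchanged.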

Regarding claim 4, the combination of Comaniciu and Malhotra discloses …The system according to claim 3.
While suggested in at least Fig 1A and related text, Comaniciu does not explicitly disclose …wherein the eventual embedding (cl) comprises a latent semantics of the description (in el) and the ICD tree hierarchy (in gl).
However, Malhotra discloses …wherein the eventual embedding (cl) comprises a latent semantics of the description (in el) and the ICD tree hierarchy (in gl) (Malhotra, ¶ 84, In the example shown in FIG. 14, one RNN includes a pre-trained embedding layer and another RNN includes a randomly initialized embedding layer. An embedding layer is a type of layer that usable in deep neural networks and used in Natural Language Processing (NLP) applications. An embedding layer is a kind of matrix and an input vector of the deep neural network, which is a one-hot or multi-hot vector in NLP in preferred embodiments, is multiplied by this matrix. One preferred embodiment uses either a matrix initialized with some random numbers or a matrix of which values are trained by other deep neural network. In one preferred embodiment, the pre-trained embedding layer is a Med2Vec embedding layer pre-trained using the Med2Vec technique, as described in E. Choi, A. Schuetz, W. F. Stewart, J. Sun, Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction, arXiv [cs.LG] (2016) (available at http://arxiv.org/abs/1602.03686), but further modified to fit the current architecture), (Id., ¶ 111, The event data can have a format in which the prefix indicates the type of data elements. For the FHIR data elements, for example, medical conditions are mapped from ICD-9 or ICD-10 codes, medical procedures are mapped from CPT code and the drugs prescribed can be mapped from the NDC's general name with all spaces replaced with “_”. The mapped data elements are then passed through the feature construction and predictive model of tool 126).
At the time the invention was filed it would have been obvious to a person of ordinary skill in the art to have modified the generative adversarial network and feature generation elements of Comaniciu to include the international classification of diseases elements of Malhotra in the analogous art of predictive modeling for classifying patients for the same reasons as stated for claim 1.

Regarding claim 5, the combination of Comaniciu and Malhotra discloses …The system according to claim 1.
While suggested in at least Fig 7 and related text, Comaniciu does not explicitly disclose …wherein the binary code classifier is encoded by a graph gated recurrent neural networks (GRNN).
However, Malhotra discloses …wherein the binary code classifier is encoded by a graph gated recurrent neural networks (GRNN) (Malhotra, ¶ 79, Each layer of the RNN includes a plurality of RNN units. For example, a general hidden layer has many—10s or 100s even 1000s—hidden units. Similarly, a RNN layer of the RNN (i.e., recurrent hidden layer) is composed by multiple RNN units (i.e., recurrent units) such as GRUs. The RNN units used can be simple RNN units as described in Le et al., “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units,” arXiv preprint arXiv:1504.00941 (2015) or more complex recurrent units such as Long Short-Term Memory (LSTM) described in Hochreiter et al., “Long short-term memory,” Neural Comput. 9, pages 1735-1780 (1997) and Graves et al., “A novel connectionist system for unconstrained handwriting recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 31, pages 855-868 (2009) or Gated Recurrent Units (GRU) described in Chung et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555 (2014). Multiple units of RNNs can be stacked on top of each other to increase the representative power of the network. In one preferred embodiment, the RNNs are implemented with GRUs and the ADADELTA algorithm described in Zeiler, “ADADELTA: An Adaptive Learning Rate Method” arXiv [cs.LG] (2012) is an optimization algorithm used to train the network).
At the time the invention was filed it would have been obvious to a person of ordinary skill in the art to have modified the generative adversarial network and feature generation elements of Comaniciu to include the gated recurrent neural network elements of Malhotra in the analogous art of predictive modeling for classifying patients for the same reasons as stated for claim 1.

Regarding Claim 6, Comaniciu discloses …A method for classifying a plurality of clinical records into International Classification of Diseases (ICD) codes, the method comprising steps of: generating, by one or more processors, one or more synthetic features corresponding to one or more ICD code descriptions through a generator (G) (Comaniciu, ¶ 38, Computer system 103 may also include a main memory 104 (e.g., a random access memory (RAM)), and a secondary memory 108. The main memory 104 and/or the secondary memory 108 comprise a dynamic random access memory (DRAM)… The removable storage unit 116 may include a computer readable storage medium having tangibly stored therein (or embodied thereon) data and/or computer software instructions, e.g., for causing the processor(s) to perform various operations), (Id., ¶ 85, To compensate for the discrepancies in protocols between scanners, institutions, vendors, and/or models, system 800 uses adversarial training to create features for classification (discloses synthetic features) that are robust and nearly invariant to protocols. That is, the system 800 can classify a given input image as normal or abnormal regardless of the acquisition protocol. The configuration in FIG. 8A branches out a discriminator network (also referred to as "discriminator") 840 from the latent space. The discriminator 840 can identify the protocol being used (or the institution the image comes from, or the vendor and/or model). The training is performed using at least two different batches of data. The first batch of data trains a generative network (also referred to as a generator) 830 (discloses generator) for the normal/abnormal classification, using labeled (normal and abnormal) images for supervised learning. All of the training images in the first batch are acquired using the same protocol as each other (e.g., referred to below as "domain A"));

    [Image: media_image1.png]

extracting, by the processors, one or more real latent features from a plurality of clinical documents and generating one or more real features by training a plurality of generative adversarial networks (GANs) through a feature extractor, wherein the generator (G) generates synthesized features after the GANs are trained and calibrates a binary code classifier with the real latent features generated by the feature extractor… wherein the generator (G) generates one or more code-specific latent features conditioned on… by using a Wasserstein GAN with gradient penalty (WGAN-GP), wherein the Wasserstein GAN with gradient penalty (WGAN-GP) generates a latent feature vector (f) (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 (discloses feature extractor) tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B (discloses binary code classifier); the discriminator of FIG. 16A is trained to classify images as either real input images or synthesized images (from the generator)), (Id., ¶ 121, FIG. 10-13 show the configuration of the GAN 800 (discloses generative adversarial networks) during the training and test phases. The first batch of training input images 1001 are input to the encoder 880 of the generator 830. The first batch of training input images 1001 include both images with normal (not novel) tissue and images with abnormal (novel) tissues. The images are labeled to indicate whether the images are normal. All of the first batch of images are from a single domain (e.g., domain A)), (Id., ¶ 27, FIGS. 16A-16D compares the speed of training for three systems, using generative adversarial training (GAN), Wasserstein GAN (WGAN), and domain adaptation), (Id., ¶ 116, The skip connections (not shown) feed forward the high resolution information in each convolution layer 812, 816, 822 of the encoder 880 to the corresponding deconvolution layer 836, 842, 846 in the decoder 890. Because back propagation (to update the higher encoder layers) is based on gradient functions that may tend to zero, keeping the high resolution information from the skip connections (not shown) allows deeper training without loss of high frequency details (discloses gradient penalty)), (Id., ¶ 117, At the smallest (deepest) level, also referred to as the "bottleneck" of the network, the latent space has one manifolds 828 corresponding to one latent variable vector (discloses latent feature vector));
distinguishing, by the processors, between the synthesized features generated by the generator (G) and the real features generated by the feature extractor and determining whether the features are the real features generated by the feature extractor or the synthetic features generated by the generator (G) through a discriminator (D) (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B; the discriminator of FIG. 16A (discloses discriminator) is trained to classify images as either real input images or synthesized images (from the generator));
encoding, by the processors… into a sequence of one or more hidden state sequences… through a label encoder (Id., ¶ 32, Variational autoencoders (VAEs) (discloses encoder) can represent input MR data in a latent space whose parameters are learned during encoding. A VAE can capture shape variability, and has generative capability to synthesize images of tissue (e.g., brain images) given the underlying latent space (or manifold) coordinates. An autoencoder is a feedforward, nonrecurrent neural network having an input layer, an output layer and one or more hidden layers (discloses hidden state sequences) connecting the input and output layers. The output layer has the same number of nodes as the input layer);
and reconstructing, by the processors, the keywords extracted from the clinical documents associated with a code / for ensuring the latent feature vector (f)… through a keywords reconstructor (Id., ¶ 47, the VAE can include an encoder network 202 and a decoder network 204. The encoder network 202 has a plurality of layers 210, 220, and the decoder network 204 has a plurality of layers 240, 250. The layers 210, 220, 240 and 250 are described below with reference to FIGS. 3 and 4, below… The decoder network 204 reconstructs (discloses reconstructor) the input images 262, 264 from the latent variables 230 and computes the loss based on the input image data 110 and the output images 262, 264).
While suggested in at least Fig. 1B and related text, Comaniciu does not explicitly disclose …corresponding to one or more ICD code descriptions; …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…; …a sequence of a plurality of keywords in the ICD code description… by using a long short-term memory (LSTM); …captures a semantic meaning of a code l…
However, Malhotra discloses …corresponding to one or more ICD code descriptions (Malhotra, ¶ 50, Method 10 also includes a step 16 of grouping diagnosis and procedure codes. Most of raw healthcare datasets have diagnosis and medical procedures coded by standard systems of classification such as the International Classification of Diseases and Related Health Problems (ICD) and Current Procedural Terminology (CPT). Both CPT and ICD-9 codes help in communicating uniform information to the physicians and payers for administrative and financial purposes but for analytics these codes are grouped into clinically significant and broader codes presented by another scheme of classification named Clinical Classification Software (CCS) maintained by Healthcare Cost and Utilization Project (HCUP). The single level scheme consists of approximately 285 mutually exclusive diagnosis categories and 241 procedure categories. Step 16 includes mapping all the ICD-9 and CPT codes in the raw dataset to corresponding CCS codes for use in constructing appropriate features for the model…), (Id., Table 8, Table depicts ICD code descriptions)

    [Image: media_image2.png]


Through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Comaniciu and Malhotra discloses …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…
First, Comaniciu discloses generating pseudo data examples using generative adversarial networks (Comaniciu, ¶ 121, FIG. 10-13 show the configuration of the GAN 800 (discloses generative adversarial network) during the training and test phases. The first batch of training input images 1001 are input to the encoder 880 of the generator 830. The first batch of training input images 1001 include both images with normal (not novel) tissue and images with abnormal (novel) tissues. The images are labeled to indicate whether the images are normal. All of the first batch of images are from a single domain (e.g., domain A)), (Id., ¶ 158, In this method, a generator as described in FIGS. 4 and 9 learns to classify images as normal or abnormal and generates synthesized images. The discriminator (not shown) used for FIG. 16A differs from the discriminator of FIG. 9. Whereas the discriminator of FIG. 9 tries to determine whether latent variable (discloses latent features) values correspond to images acquired from Domain A or Domain B).
Further, Malhotra discloses ICD code descriptions and the number of features generated to represent various codes (Malhotra, ¶ 54, as shown in FIG. 5, step 20 may include a substep 20a of filtering patients based on defined epilepsy diagnosis criteria to filter out non-epileptic patients. For example, to be included within the cohort, the patient must have at least one diagnosis claim of 345 (ICD-9 code for epilepsy diagnosis) or at least two claims of 780.39 (ICD-9 code for convulsions) at any time in the timeline of the patient. This criteria ensures the exclusion of all the patients which have not been diagnosed with any form of epilepsy and may have had one or less convulsions, thereby there is not substantial evidence to categorize the patient as an epileptic patient), (Id., Table 4, table depicts frequency associated with various features including ICD codes).

    [Image: media_image3.png]

One of ordinary skill in the art would have recognized that applying the known technique of Comaniciu would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the generative adversarial networks technique of Comaniciu to the ICD code and classification teachings of Malhotra would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such classification features into similar systems. Further, applying machine learning techniques with classifier codes to Malhotra, with ICD codes stored accordingly, would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow more accurate reporting of medical records according to specific disease classification codes. Thus, through KSR Rationale D, the combination of Comaniciu and Malhotra discloses …for a low-shot ICD code l, wherein the GANs improve the low-shot ICD code l by generating a plurality of pseudo data examples in a latent feature space of the clinical documents for the low-shot ICD codes l… a textual description of each ICD code descriptions…
Malhotra further discloses …a sequence of a plurality of keywords in the ICD code description… by using a long short-term memory (LSTM); …captures a semantic meaning of a code l… (Malhotra, ¶ 6, machine learning has seen the rise of neural networks with many layers. These are commonly referred to as deep neural networks (DNN). Recurrent Neural Network (RNN) is an important class of DNN. A unique aspect of RNN is the folding out in time operation, where each time-step corresponds to a layer in a feedforward network. RNN's show great performance in modeling variable length sequential data, particularly those with gated activation units such as Long Short-Term Memory (LSTM), as described in Hochreiter et al., “Long short-term memory,” Neural Comput. 9, 1735-1780 (1997), and Gated Recurrent Units (GRU), as described in Chung et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555 (2014). RNNs have achieved state-of-the-art results in machine translation, as described in Cho et al…. RNNs have also been applied to several clinical applications recently. Lipton et al used LSTM RNN to recognize patterns in multivariate time series of clinical measurements gathered from an intensive care unit (ICU), as described in Lipton et al., “Learning to Diagnose with LSTM Recurrent Neural Networks,” arXiv [cs.LG] (2015). Choi et al developed an application of RNN with GRU to jointly forecast the future disease diagnosis and medication prescription along with their timing as continuous multi-label predictions, as described in Choi et al., “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks,” arXiv [cs.LG] (2015)), (Id., ¶ 84, In the example shown in FIG. 14, one RNN includes a pre-trained embedding layer and another RNN includes a randomly initialized embedding layer. An embedding layer is a type of layer that is usable in deep neural networks and used in Natural Language Processing (NLP) applications), (Id., ¶ 85, The Med2Vec model is an advanced variation of the Word2Vec model that is based on the fact that the nature of medical data is similar with that of natural languages. For example, each single medical code acts as word in natural languages. In other embodiments, Word2Vec and GloVe models can be used to train the embedding layer, as for example described in Mikolov et al., Advances in Neural Information Processing Systems 26, Curran Associates, Inc., 2013, pp. 3111-3119 (Word2Vec) and Pennington et al., “Glove: Global Vectors for Word Representation,” EMNLP (2014) (GloVe), respectively).

    [Image: media_image4.png]

At the time the invention was filed it would have been obvious to a person of ordinary skill in the art to have modified the generative adversarial network and feature generation elements of Comaniciu to include the international classification of diseases elements of Malhotra in the analogous art of predictive modeling for classifying patients for the same reasons as stated for claim 1.

Regarding claims 7-10, these claims recite limitations substantially similar to those in claims 2-5, respectively, and are rejected for the same reasons as stated above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Heinrich et al., U.S. Patent No. 10,224,119, discloses a system and method of prediction through the use of latent semantic indexing.
Campbell, U.S. Publication No. 2012/0166212, discloses a system and method for machine based medical diagnostic code identification, accumulation, analysis and automatic claim process adjudication.
Orciuoli et al., U.S. Publication No. 2020/0373015, discloses a computer implemented method for classifying a patient based on codes of at least one predetermined patient classification and computerized system to carry it out.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS D BOLEN whose telephone number is (408) 918-7631. The examiner can normally be reached Monday - Friday 8:00 AM - 5:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patty Munson, can be reached at (571) 270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICHOLAS D BOLEN/
Examiner, Art Unit 3624

/PATRICIA H MUNSON/
Supervisory Patent Examiner, Art Unit 3624