DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Status of Claims

In the amendment filed on 01 November 2021, the following has occurred:  Claims 1, 5, 10, and 17 have been amended. 
Claims 1-7 and 10-20 are currently pending and have been examined.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
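A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.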

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-7, 10-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Friedlander et al. (US7809660B2) in view of Hsieh et al. (US20180144465A1), Ong et al. (US20110224565A1), and further in view of Trask et al. (US20160247061A1).
Regarding claim 1, Friedlander discloses a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory
([Col.3 lines 25-30] “In the depicted example, data processing system 200 employs a hub architecture including a north bridge and memory controller hub (MCH) 202 and a south bridge and input/output (I/O) controller hub (ICH) 204. Processor 206, main memory 208, and graphics processor 210 are coupled to north bridge and memory controller hub 202.”)
a grouping component that: based on an outcome of interest of the one or more outcomes of interest, assigns, using the one or more probabilistic classifiers, patients to groups according to phenotyping features associated with the outcome of interest ([Col. 6 lines 24-34] “Clusters are natural groupings of patient records based on the specified features or attributes. For example, a user may request that data mining application 308 generate eight clusters in a maximum of ten passes. The main task of neural clustering is to find a center for each cluster. The center is also called the cluster prototype. Scores are generated based on the distance between each patient record and each of the cluster prototypes. Scores closer to zero have a higher degree of similarity to the cluster prototype. The higher the score, the more dissimilar the record is from the cluster prototype.”)
and based on the assigning of the patients based on the phenotyping features, filters out the phenotyping feature from being utilized in determining a representative patient of the patients in a group for the outcome of interest ([Col. 8 lines 28-39] “Cluster system 600 may be used to perform step 904 of FIG. 9. Cluster system 600 includes treatment cohort records 602, filter 604, clustering algorithm 606, cluster assignment criteria 608, and clustered records from treatment cohort 610. Filter 604 is used to eliminate any patient records that have significant co-morbidities that would by itself eliminate inclusion in a drug trial……For example, it may be desirable to exclude results from persons with more than one stroke from the statistical analysis of a new heart drug.”)
wherein the neural network component further: in response to the assigning the patients into groups, determines the representative patient of the patients in a group that has a minimal distance to other patients in the group based on respective values associated with the phenotyping features for the patients in the group ([Col. 6 lines 17-19] “Data mining application 308 may use a clustering technique or model known as a Kohonen feature map neural network or neural clustering.” [Col. 7 lines 24-31] “For each record in the input patient data set, the neural clustering data mining algorithm computes the cluster prototype that is the closest to the records. For example, patient record A 414, patient record B 416, and patient record C 418 are grouped into cluster 1 406. Additionally, patient record X 420, patient record Y 422, and patient record Z 424 are grouped into cluster 4 412.” [Col. 7 lines 7-12] “Feature map 400 may include as many dimensions as there are features, such as age, gender, and severity of illness. Feature map 400 also includes cluster 1 406, cluster 2 408, cluster 3 410, and cluster 4 412. The clusters are the result of using feature map 400 to group individual patients based on the features.” [Col. 7 lines 44-51] “For example, patient B 416 is scored into the cluster prototype or center of cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412. A Euclidean distance between patient B 416 and cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412 is shown. In this example, distance 1 426, separating patient B 416 from cluster 1 406, is the closest. Distance 3 428, separating patient B 
a weighting component that: selects a second weight value for an event ([Col. 7 lines 14-18] “When a training sample of patients is analyzed by data mining application 308 of FIG. 3, each patient is grouped into clusters where the clusters are weighted functions that best represent natural divisions of all patients based on the specified features.” In Figure 3 Cluster 2 represents the second weighted function for an event.)
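
For illustration only, the distance-based scoring described in the Friedlander passages cited above (Col. 6 lines 24-34; Col. 7 lines 44-51) can be sketched as follows. This is a minimal sketch and not Friedlander's implementation: the function name score_record, the two features (age, severity), and every numeric value are hypothetical assumptions chosen for the example.

```python
import math

def score_record(record, prototypes):
    """Score one patient record against each cluster prototype.

    The score is the Euclidean distance to the prototype, so a score
    closer to zero means the record is more similar to that cluster.
    Returns the best-fitting cluster name and the full score table.
    """
    scores = {name: math.dist(record, center) for name, center in prototypes.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical prototypes in a two-feature space (age, severity of illness).
prototypes = {"cluster 1": (40.0, 2.0), "cluster 2": (70.0, 8.0)}
patient_b = (43.0, 6.0)  # hypothetical patient record

best, scores = score_record(patient_b, prototypes)
```

In this sketch the record is assigned to whichever prototype yields the smallest score, which mirrors the "closest is the best fit" language of the quoted passage.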


Friedlander does not explicitly disclose; however, Hsieh teaches a neural network component that: employs patient data to train one or more probabilistic classifiers to predict a probability distribution over one or more outcomes of interest associated with phenotyping features ([0069] “An example deep learning neural network can be trained on a set of expert classified data, for example. This set of data builds the first parameters for the neural network, and this would be the stage of supervised learning.” [0225] “In other examples, image quality is generated for computer analysis as a change in probabilistic values of image classification. On a scale of 1-5, for example, a 3 indicates the image is diagnosable (e.g., is of diagnostic quality), a 5 indicates a perfect image (e.g., probably at too high of a dose), and a 1 indicates the image data is not usable for diagnosis. As a result, a preferred score is 3-4. The DDLD 1532 can generate an IQI based on acquired image data by mimicking radiologist behavior and the 1-5 scale. Using image data attributes, the DDLD 1532 can analyze an image and determine features (e.g., a small lesion) and evaluate diagnostic quality of each feature in the image data.” [0226] “For example, an image can be categorized as belonging to class 4 with an associated probability of 90%, a 9% probability that the image belongs to class 5, and a 1% probability that the image belongs to class 3.”)
wherein the neural network component employs a stochastic gradient descent with back propagation algorithm ([0139] “Backpropagation or backward propagation of errors can be used in batches (e.g., mini-batches, etc.) involving pre-determined sets (e.g., small sets) of randomly selected data from the learning data set using stochastic gradient descent (SGD) to minimize or otherwise reduce a pre-determined cost function while trying to prevent over-training by regularization (e.g., dropouts, batch normalization of mini-batches prior to non-linearities, etc.) in the auto-encoder network.”) 
to train a selection component to associate a phenotyping feature with one or more outcomes of interest ([0177] “Each deep learning network can be trained using curated data with associated outcome results. For example, data regarding stroke (e.g., data from onset to 90 days post-treatment, etc.) can be used to train a neural network to drive to predictive stroke outcomes.”)
wherein the second weight value is based on a second time value associated with a time of occurrence of the event ([0085] “If weights assigned to nodes in the DLN 420 are examined, there are likely many 

Note: weights are based on time such that the nodes assigned with the small/low weight values are associated with a faster neural network runtime.

and wherein the weighting component adjusts the second weight value ([0289] “Weights and/or biases associated with nodes, connections, etc., can also be modified by patterns, relationships, values, presence or absence of values, etc., found by the model(s) 1581-1587 in the input data, for example.”)
is selected based on executions of the neural network component ([0233] “In a convolutional layer of an example deep convolutional network, an initial layer includes a plurality of feature maps in which node weights are initialized using parameterized normal random variables.”)
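
For illustration only, the mini-batch stochastic gradient descent described in the cited Hsieh passage ([0139]) can be sketched as follows. This is a minimal sketch under stated assumptions, not Hsieh's network: it uses a single-layer logistic classifier (for which backpropagation reduces to one gradient computation), and the function names, learning rate, batch size, and toy data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(w, b, xs, ys):
    """Average cross-entropy cost over the whole data set."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def train_sgd(xs, ys, lr=0.5, steps=200, batch_size=4, seed=0):
    """Update weight w and bias b using randomly selected mini-batches."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # small random subset
        # Gradient of the cross-entropy cost with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / batch_size
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in batch) / batch_size
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical one-feature data: label 1 when the feature is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

initial_loss = mean_loss(0.0, 0.0, xs, ys)
w, b = train_sgd(xs, ys)
final_loss = mean_loss(w, b, xs, ys)
```

Each step adjusts the weight in the direction that reduces the cost on the sampled batch, which is the sense in which weights are modified through repeated executions of the network.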

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s optimization of cohorts with Hsieh’s techniques for employing stochastic gradient descent. The motivation for the combination of Friedlander and Hsieh is to improve the quality of care for patients (See Hsieh, Background).

Friedlander in view of Hsieh does not explicitly disclose; however, Ong teaches wherein the second weight value is selected to filter out the phenotyping feature from being utilized in the determination of the representative patient ([0199] “To summarize FIGS. 6 and 7, extracting the heart rate variability data, in embodiments of the invention, comprises filtering the ECG signal to remove noise and artifacts, locating a QRS complex within the filtered ECG signal.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates and Hsieh’s techniques for employing stochastic gradient descent with Ong’s techniques for filtering out the phenotyping feature. The motivation for the combination of Friedlander, Hsieh, and Ong is to use the most representative data for the patient (See Ong, Background).

Friedlander in view of Hsieh and Ong does not explicitly disclose; however, Trask teaches wherein the neural network component encodes phenotype feature data with a first time value, wherein phenotype feature sequence data is generated in a first language, wherein the phenotype feature sequence data is translated in a second language by associating a first weight value of the phenotype feature in the first language with the first weight value of a same phenotype feature in the second language ([0028] “Each input also has a weight associated with it, with the collective set of input weights comprising the vector. w k =[w 0 ,w 1 ,w 2 . . . w m].” [0066] “This suggests that the neural language pre-training map words into vector space in such a way that language becomes another dimension that the distributed representations model. In some embodiments, this property of the embeddings enables a sentiment model trained in one language to predict on another. In some embodiments, a sentiment trained in one language will predict on another better where there is a shared vocabulary between languages. In some embodiments, during pre-training, the neural language models can construct a hidden layer using neurons sampled according to vocabulary frequencies. In this way, the hidden layer construction for each language will be similar despite having completely different input layer representations.”)

Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates, Hsieh’s techniques for employing stochastic gradient descent, and Ong’s techniques for filtering out the phenotyping feature with Trask’s techniques for translating data from a first language into a second language. The motivation for the combination of Friedlander, Hsieh, Ong, and Trask is to efficiently learn distributed 
Regarding claim 2, Friedlander discloses wherein the outcome of interest is associated with a prediction of a target disease ([Col. 5 lines 37-39] “Attributes define features, variables, and characteristics of each patient. The most common attributes may include gender, age, disease or illness, and state of the disease.” [Col. 6 lines 1-16] “For example, data mining application 308 may be able to group patient records to show the effect of a new sepsis blood infection medicine. Currently, about 35 percent of all patients with the diagnosis of sepsis die. Patients entering an emergency department of a hospital who receive a diagnosis of sepsis, and who are not responding to classical treatments, may be recruited to participate in a drug trial. A statistical control cohort of similarly ill patients could be developed by cohort system 300, using records from historical patients, patients from another similar hospital, and patients who choose not to participate. Potential features to produce a clustering model could include age, co-morbidities, gender, surgical procedures, number of days of current hospitalization, O2 blood saturation, blood pH, blood lactose levels, bilirubin levels, blood pressure, respiration, mental acuity tests, and urine output.”)
Regarding claim 4, Friedlander discloses wherein the grouping component recursively assigns the patients to the groups according to a similarity between the phenotyping features and based on the outcome of interest ([Col. 6 lines 1-34] “For example, data mining 
Regarding claim 5, Friedlander discloses wherein the phenotyping feature is selected based on an association with the outcome of interest, resulting in removed data
Regarding claim 6, Friedlander discloses wherein the phenotyping feature is a first phenotyping feature, and wherein a second phenotyping feature of the phenotyping features is removed ([Col. 8 lines 32-39] “Filter 604 is used to eliminate any patient records that have significant co-morbidities that would by itself eliminate inclusion in a drug trial. Co-morbidities are other diseases, illnesses, or conditions in addition to the desired features. For example, it may be desirable to exclude results from persons with more than one stroke from the statistical analysis of a new heart drug.”)
in response to a condition associated with the outcome of interest being determined to have been satisfied ([Col. 8 lines 17-22] “Feature selection 510 is the features and variables that are most important for a control cohort to mirror the treatment cohort. For example, based on the treatment cohort, the variables in feature selection 510 most important to match in the treatment cohort may be age 402 and severity of seizure 404 as shown in FIG. 4.”)
Regarding claim 7, Friedlander discloses a comparison component that: based on the outcome of interest, determines a similarity between a first phenotyping feature and a second phenotyping feature of the phenotyping features associated with the patients ([Col. 4 lines 49-61] “To demonstrate a cause and effect relationship, an experiment must be designed to show that a phenomenon occurs after a certain treatment is given to a subject and that the phenomenon does not occur in the absence of the 
Regarding claim 10, Friedlander discloses a computer readable storage medium having program instructions embodied therewith
([Col. 10 lines 61-65] “Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.”)
based on an outcome of interest of the one or more outcomes of interest, assign, using the one or more probabilistic classifiers, patients to groups according to phenotyping features associated with the outcome of interest ([Col. 6 lines 24-34] “Clusters are natural groupings of patient records based on the specified features or attributes. For example, a user may request that data mining application 308 generate eight clusters in a maximum of ten passes. The main task of neural clustering is to find a center for each cluster. The center is also called the cluster prototype. Scores are generated based on the distance between each patient record and each of the cluster prototypes. Scores closer to zero have a higher degree of 
based on the assigning of the patients based on the phenotyping features, filters out the phenotyping feature from being utilized in determining a representative patient of the patients in a group for the outcome of interest ([Col. 8 lines 28-39] “Cluster system 600 may be used to perform step 904 of FIG. 9. Cluster system 600 includes treatment cohort records 602, filter 604, clustering algorithm 606, cluster assignment criteria 608, and clustered records from treatment cohort 610. Filter 604 is used to eliminate any patient records that have significant co-morbidities that would by itself eliminate inclusion in a drug trial……For example, it may be desirable to exclude results from persons with more than one stroke from the statistical analysis of a new heart drug.”)
in response to the assigning the patients into groups, determine the representative patient has a minimal distance to other patients in the group based on respective values associated with the phenotyping features for the patients in the group ([Col. 6 lines 17-19] “Data mining application 308 may use a clustering technique or model known as a Kohonen feature map neural network or neural clustering.” [Col. 7 lines 24-31] “For each record in the input patient data set, the neural clustering data mining algorithm computes the cluster prototype that is the closest to the records. For example, patient record A 414, patient record B 416, and patient record C 418 are grouped into cluster 1 406. Additionally, patient record X 


Friedlander does not explicitly disclose; however, Hsieh teaches employ patient data to train one or more probabilistic classifiers to predict a probability distribution over one or more outcomes of interest associated with phenotyping features ([0069] “An example deep learning neural network can be trained on a set of expert classified data, for example. This set of data builds the first parameters for the neural network, and this would be the stage of supervised learning.” [0225] “In other examples, image quality is generated for computer analysis as a change in probabilistic values of image classification. On a scale of 1-5, for example, a 3 indicates the image is diagnosable (e.g., is of diagnostic quality), a 5 indicates a perfect image (e.g., probably at too high of a dose), and a 1 indicates the image data is not usable for diagnosis. As a result, a preferred score is 3-4. The DDLD 1532 can generate an IQI based on acquired image data by mimicking radiologist behavior and the 1-5 scale. Using image data attributes, the DDLD 1532 can analyze an image and determine features (e.g., a small lesion) and evaluate diagnostic quality of each feature in the image data.” [0226] “For example, an image can be categorized as belonging to class 4 with an associated probability of 90%, a 9% probability that the image belongs to class 5, and a 1% probability that the image belongs to class 3.”)
wherein employ a stochastic gradient descent with back propagation algorithm ([0139] “Backpropagation or backward propagation of errors can be used in batches (e.g., mini-batches, etc.) involving pre-determined sets (e.g., small sets) of randomly selected data from the learning data set using stochastic gradient descent (SGD) to minimize or otherwise reduce a pre-determined cost function while trying to prevent over-training by regularization (e.g., dropouts, batch normalization of mini-batches prior to non-linearities, etc.) in the auto-encoder network.”) 
to train a selection component to associate a phenotyping feature of the phenotyping features with the one or more outcomes of interest ([0177] “Each deep learning network can be trained using curated data with 
and adjust a weighted value assigned to the one or more phenotyping features ([0230] “A classifier 2350 (e.g., a softmax classifier, etc.) associates weights with nodes representing features of interest.” [0289] “Weights and/or biases associated with nodes, connections, etc., can also be modified by patterns, relationships, values, presence or absence of values, etc., found by the model(s) 1581-1587 in the input data, for example.”)
based on executions of a recurrent neural network component ([0114] “The AI catalog 1326 can include one or more AI models such as …recurrent neural network (RNN), long short-term memory (LSTM), generative adversarial network (GAN), etc.), paradigms (e.g., supervised, unsupervised, reinforcement, etc.), etc.”)
wherein the weighted patient value is based on a second time value associated with a time of occurrence of the event ([0085] “If weights assigned to nodes in the DLN 420 are examined, there are likely many connections and nodes with very low weights.” [0243] “Randomly decreasing values can reduce runtime by processing inputs that have little effect on image quality, such as by eliminating redundant nodes, redundant connections, etc., in the network.”)

Note: weights are based on time such that the nodes assigned with the small/low weight values are associated with a faster neural network runtime.


It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s optimization of cohorts with Hsieh’s techniques for employing stochastic gradient descent. The motivation for the combination of Friedlander and Hsieh is to improve the quality of care for patients (See Hsieh, Background).

Friedlander in view of Hsieh does not explicitly disclose; however, Ong teaches wherein the weighted patient data value is adjusted to filter out the phenotyping feature from being utilized in the determination of the representative patient ([0199] “To summarize FIGS. 6 and 7, extracting the heart rate variability data, in embodiments of the invention, comprises filtering the ECG signal to remove noise and artifacts, locating a QRS complex within the filtered ECG signal.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates and Hsieh’s techniques for employing stochastic gradient descent with Ong’s techniques for filtering out the phenotyping feature. The motivation for the combination of Friedlander, 

Friedlander in view of Hsieh and Ong does not explicitly disclose; however, Trask teaches encode phenotype feature data with a first time value, wherein phenotype feature sequence data is generated in a first language; translate the phenotype feature sequence data in a second language by associating a weight value of the phenotype feature in the first language with the weight value of a same phenotype feature in the second language ([0028] “Each input also has a weight associated with it, with the collective set of input weights comprising the vector. w k =[w 0 ,w 1 ,w 2 . . . w m].” [0066] “This suggests that the neural language pre-training map words into vector space in such a way that language becomes another dimension that the distributed representations model. In some embodiments, this property of the embeddings enables a sentiment model trained in one language to predict on another. In some embodiments, a sentiment trained in one language will predict on another better where there is a shared vocabulary between languages. In some embodiments, during pre-training, the neural language models can construct a hidden layer using neurons sampled according to vocabulary frequencies. In this way, the hidden layer construction for each language will be similar despite having completely different input layer representations.”)

Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates, Hsieh’s techniques for employing stochastic gradient descent, and Ong’s techniques for filtering out the phenotyping feature with Trask’s techniques for translating data from a first language into a second language. The motivation for the combination of Friedlander, Hsieh, Ong, and Trask is to efficiently learn distributed representations of words based on their vector embedding (See Trask, Background).
Regarding claim 11, the limitations are rejected for the same reasons as stated above for claim 2.
Regarding claim 13, the limitations are rejected for the same reasons as stated above for claim 4.
Regarding claim 14, the limitations are rejected for the same reasons as stated above for claim 5.
Regarding claim 15, the limitations are rejected for the same reasons as stated above for claim 6.
Regarding claim 16, the limitations are rejected for the same reasons as stated above for claim 7.
Regarding claim 17, Friedlander discloses based on an outcome of interest, assigning, by the device using the one or more probabilistic classifiers, patients to groups according to phenotyping features associated with the outcome of interest ([Col. 6 lines 24-34] “Clusters are natural groupings of patient records based on the specified features or attributes. For example, a user may request that data mining application 308 generate eight clusters in a maximum of ten passes. The main task of neural clustering is to find a center for each cluster. The center is also called the cluster prototype. Scores are generated based on the distance between each patient record and each of the cluster prototypes. Scores closer to zero have a higher degree of similarity to the cluster prototype. The higher the score, the more dissimilar the record is from the cluster prototype.”)
based on the assigning of the patients based on the phenotyping features, filters out the phenotyping feature from being utilized in determining a representative patient of the patients in a group for the outcome of interest ([Col. 8 lines 28-39] “Cluster system 600 may be used to perform step 904 of FIG. 9. Cluster system 600 includes treatment cohort records 602, filter 604, clustering algorithm 606, cluster assignment criteria 608, and clustered records from treatment cohort 610. Filter 604 is used to eliminate any patient records that have significant co-morbidities that would by itself eliminate inclusion in a drug trial……For example, it may be desirable to exclude results from persons with more than one stroke from the statistical analysis of a new heart drug.”)
in response to the assigning the patients into groups, determining by the device the representative patient has a minimal distance to other patients in the group based on respective values associated with the phenotyping features for the patients in the group ([Col. 6 lines 17-19] “Data mining application 308 may use a clustering technique or model known as a Kohonen feature map neural network or neural clustering.” [Col. 7 lines 24-31] “For each record in the input patient data set, the neural clustering data mining algorithm computes the cluster prototype that is the closest to the records. For example, patient record A 414, patient record B 416, and patient record C 418 are grouped into cluster 1 406. Additionally, patient record X 420, patient record Y 422, and patient record Z 424 are grouped into cluster 4 412.” [Col. 7 lines 7-12] “Feature map 400 may include as many dimensions as there are features, such as age, gender, and severity of illness. Feature map 400 also includes cluster 1 406, cluster 2 408, cluster 3 410, and cluster 4 412. The clusters are the result of using feature map 400 to group individual patients based on the features.” [Col. 7 lines 44-51] “For example, patient B 416 is scored into the cluster prototype or center of cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412. A Euclidean distance between patient B 416 and cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412 is shown. In this example, distance 1 426, separating patient B 416 from cluster 1 406, is the closest. Distance 3 428, separating patient B 416 from cluster 3 410, is the furthest. These distances indicate that cluster 1 406 is the best fit.”)
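
For illustration only, one possible way to compute a group member having a minimal distance to the other members of the group is a medoid selection over Euclidean distances, sketched below. This is offered as a hypothetical example and is not taken from Friedlander or from the application: the function name representative, the patient identifiers, and the feature values are assumptions.

```python
import math

def representative(group):
    """Return the id of the member whose summed Euclidean distance
    to every other member of the group is smallest (the medoid)."""
    def total_distance(pid):
        return sum(
            math.dist(group[pid], features)
            for other, features in group.items()
            if other != pid
        )
    return min(group, key=total_distance)

# Hypothetical group of three patients with two feature values each.
group = {"A": (40.0, 2.0), "B": (43.0, 3.0), "C": (50.0, 4.0)}
chosen = representative(group)
```

A medoid is always an actual member of the group, whereas a cluster center or prototype is generally a computed point; the sketch uses the former only because it makes the distance comparison easy to follow.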


Friedlander does not explicitly disclose; however, Hsieh teaches employ, by a device coupled to a processor, patient data to train one or more probabilistic classifiers to predict a probability distribution over one or more outcomes of interest associated with one or more phenotyping features ([0118] “The components of the healthcare system 1400 can be implemented using one or more processors executing hardcoded configuration, firmware configuration, software instructions in conjunction with a memory, etc.” [0069] “An example deep learning neural network can be trained on a set of expert classified data, for example. This set of data builds the first parameters for the neural network, and this would be the stage of supervised learning.” [0225] “In other examples, image quality is generated for computer analysis as a change in probabilistic values of image classification. On a scale of 1-5, for example, a 3 indicates the image is diagnosable (e.g., is of diagnostic quality), a 5 indicates a perfect image (e.g., probably at too high of a dose), and a 1 indicates the image data is not usable for diagnosis. As a result, a preferred score is 3-4. The DDLD 1532 can generate an IQI based on acquired image data by mimicking radiologist behavior and the 1-5 scale. Using image data attributes, the DDLD 1532 can analyze an image and determine features (e.g., a small lesion) and evaluate diagnostic quality of each feature in the image data.” [0226] “For example, an image can be categorized as belonging to class 4 with an associated probability of 90%, a 9% probability that the image belongs to class 5, and a 1% probability that the image belongs to class 3.”)
wherein employ a stochastic gradient descent with back propagation algorithm ([0139] “Backpropagation or backward propagation of errors can be used in batches (e.g., mini-batches, etc.) involving pre-determined sets (e.g., small sets) of randomly selected data from the learning data set using stochastic gradient descent (SGD) to minimize or otherwise reduce a pre-determined cost function while trying to prevent over-training by regularization (e.g., dropouts, batch normalization of mini-batches prior to non-linearities, etc.) in the auto-encoder network.”) 
to train a selection component to associate a phenotyping feature of the one or more phenotyping features with the one or more outcomes of interest ([0177] “Each deep learning network can be trained using curated data with associated outcome results. For example, data regarding stroke (e.g., data from onset to 90 days post-treatment, etc.) can be used to train a neural network to drive to predictive stroke outcomes.”)
and adjusting a weighted patient data value assigned to one or more phenotyping features ([0230] “A classifier 2350 (e.g., a softmax classifier, etc.) associates weights with nodes representing features of interest.” [0289] “Weights and/or biases associated with nodes, connections, etc., can also be modified by patterns, relationships, values, presence or absence of values, etc., found by the model(s) 1581-1587 in the input data, for example.”)
based on executions of the neural network component ([0114] “The AI catalog 1326 can include one or more AI models such as …recurrent neural network (RNN), long short-term memory (LSTM), generative adversarial network (GAN), etc.), paradigms (e.g., supervised, unsupervised, reinforcement, etc.), etc.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s optimization of cohorts with Hsieh’s techniques for employing stochastic gradient descent. The motivation for the combination of Friedlander and Hsieh is to improve quality care for patients (See Hsieh, Background).

Friedlander in view of Hsieh does not explicitly disclose, however Ong teaches, wherein the weight value is selected to filter out the phenotyping feature from being utilized in the determination of the representative patient ([0199] “To summarize FIGS. 6 and 7, extracting the heart rate variability data, in embodiments of the invention, comprises filtering the ECG signal to remove noise and artifacts, locating a QRS complex within the filtered ECG signal.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates and Hsieh’s techniques for employing 

Friedlander in view of Hsieh and Ong does not explicitly disclose, however Trask teaches, encoding, by the device, phenotype feature data with a time-stamp, wherein phenotype feature sequence data is generated in a first language; translate, by the device, the phenotype feature sequence data in a second language by associating a weight value of the phenotype feature in the first language with the weight value of a same phenotype feature in the second language ([0028] “Each input also has a weight associated with it, with the collective set of input weights comprising the vector. w_k = [w_0, w_1, w_2 . . . w_m].” [0066] “This suggests that the neural language pre-training map words into vector space in such a way that language becomes another dimension that the distributed representations model. In some embodiments, this property of the embeddings enables a sentiment model trained in one language to predict on another. In some embodiments, a sentiment trained in one language will predict on another better where there is a shared vocabulary between languages. In some embodiments, during pre-training, the neural language models can construct a hidden layer using neurons sampled according to vocabulary frequencies. In this way, the hidden layer 

Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates, Hsieh’s techniques for employing stochastic gradient descent, and Ong’s techniques for filtering out the phenotyping feature with Trask’s techniques for translating data from a first language into a second language. The motivation for the combination of Friedlander, Hsieh, Ong, and Trask is to efficiently learn distributed representations of words based on their vector embedding (See Trask, Background).
Regarding claim 18, the limitations are rejected for the same reasons as stated above for claim 2.
Regarding claim 20, the limitations are rejected for the same reasons as stated above for claim 5.

Claims 3, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Friedlander et al. (US7809660B2) in view of Hsieh et al. (US20180144465A1), Ong et al. (US20110224565A1), Trask et al. (US20160247061A1), and further in view of Widdows et al. (Reasoning with Vectors: A Continuous Model for Fast Robust Inference).
Regarding claim 3, Friedlander in view of Hsieh, Ong, and Trask does not explicitly disclose, however Widdows teaches, an encoding component that: encodes a sentence associated with the phenotyping features into a vector product to predict the target disease. [[pg. 13] For example, encoding a single instance of the predication “Insulin TREATS Diabetes Mellitus” is accomplished as follows:
 
    [Equation image media_image1.png; not reproduced in text]

(The symbol “+=” is used here in the computing sense of “add the right hand side to the left hand side”.)]
	
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to expand Friedlander’s examination of hospitalization rates, Hsieh’s techniques for employing stochastic gradient descent, Ong’s techniques for filtering out the phenotyping feature, and Trask’s techniques for translating data from a first language into a second language with Widdows’ techniques for encoding a sentence. The motivation for the combination of Friedlander, Hsieh, Ong, Trask, and Widdows is to support fast, approximate but robust inference and hypothesis generation (See Widdows, Background).
Regarding claim 12, the limitations are rejected for the same reasons as stated above for claim 3.
Regarding claim 19, the limitations are rejected for the same reasons as stated above for claim 3.

Response to Arguments

Applicant’s arguments filed 01 November 2021 have been fully considered but are not fully persuasive.

Regarding the 35 U.S.C. 103 rejection, applicant’s arguments have been considered but do not apply to the newly cited reference. The dependent claims also remain rejected under 35 U.S.C. 103.

Conclusion
                                                                                                                                              
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jan Mooneyham, can be reached on (571)-272-6805. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/W.F./Examiner, Art Unit 3626
/JOSHUA B BLANCHETTE/Primary Examiner, Art Unit 3626