DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/12/2018. Claims 1-20 are pending and have been examined. Claims 1, 8, and 15 are independent claims.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 09/12/2018. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
“Computer readable storage media” in claims 8-14 is interpreted as “non-transitory computer readable storage media” in view of paragraph [0057], which recites “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se”.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-9, 11-13, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al., “Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records,” in view of Draelos et al. (US 2017/0177993 A1).
Regarding claim 1: 
Che et al. teaches A method for generating a trained neural network, comprising (Page 788 Section A: Basic Deep Prediction Model “a convolutional neural network (CNN) model with 1D convolutional layer over the temporal dimension and max over-time pooling layer” and Page 790 Section B: Risk Prediction Comparison on Basic Models “basic predict model (CNN), which explores the CNN model with pre-trained medical feature embedding and is a strong baseline even before boosted” teaches that the system contains a neural network).
creating a neural network (Page 788 Section A: Basic Deep Prediction Model “a convolutional neural network (CNN) model with 1D convolutional layer over the temporal dimension and max over-time pooling layer” and Page 790 “basic predict model (CNN), which explores the CNN model with pre-trained medical feature embedding and is a strong baseline even before boosted” teaches that the system contains a trained neural network).
performing an initial training of the neural network using a set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches that training is performed using labeled data).
applying a boosted neural network to unlabeled data to determine whether any of the unlabeled data qualifies as additional labeled data (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches performing semi supervised learning to receive additional training data from the unlabeled data).
retraining, in response to a determination that any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Page 788 Section I. Introduction “an SSL framework which achieves boosted risk prediction performance by utilizing the augmented data and representations from the proposed generative models” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches that applying additional augmented data to train the network creates a boosted network).
Che et al. does not explicitly teach updating, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teaches updating, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize new data, where the new data is not recognizable using the current neural network (that is, unlabeled data that does not qualify as additional labeled data)).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate updating, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
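For illustration only, the control flow recited in claim 1 (initial training, applying the network to unlabeled data, retraining when some data qualifies, and otherwise changing the node count) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation, nor the method of Che et al. or Draelos et al. The names `ToyModel`, `semi_supervised_loop`, `threshold`, and `max_hidden` are hypothetical, the confidence test is an assumed stand-in for "qualifies as additional labeled data", and `hidden_nodes` is a placeholder counter that does not affect the toy model's predictions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a neural network (nearest-centroid on 1-D data)."""
    hidden_nodes: int = 2                          # placeholder node count only
    centroids: dict = field(default_factory=dict)  # label -> mean feature value

    def fit(self, labeled):
        """Train on (feature, label) pairs by computing one centroid per label."""
        sums = {}
        for x, y in labeled:
            total, count = sums.get(y, (0.0, 0))
            sums[y] = (total + x, count + 1)
        self.centroids = {y: t / c for y, (t, c) in sums.items()}

    def predict(self, x):
        """Return (label, confidence); assumes at least two labels were seen."""
        dists = sorted((abs(x - c), y) for y, c in self.centroids.items())
        (d0, y0), (d1, _) = dists[0], dists[1]
        conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
        return y0, conf


def semi_supervised_loop(model, labeled, unlabeled,
                         threshold=0.9, max_rounds=10, max_hidden=8):
    labeled, unlabeled = list(labeled), list(unlabeled)
    model.fit(labeled)                             # initial training on labeled data
    for _ in range(max_rounds):
        if not unlabeled:
            break
        promoted, remaining = [], []
        for x in unlabeled:                        # apply the network to unlabeled data
            y, conf = model.predict(x)
            if conf >= threshold:                  # assumed qualification test
                promoted.append((x, y))
            else:
                remaining.append(x)
        if promoted:
            labeled.extend(promoted)               # retrain with the additional labeled data
            model.fit(labeled)
        else:
            if model.hidden_nodes >= max_hidden:   # stop once the assumed cap is reached
                break
            model.hidden_nodes += 1                # otherwise change the number of nodes
            model.fit(labeled)
        unlabeled = remaining
    return model, labeled, unlabeled
```

In this sketch, an instance that never reaches the assumed confidence threshold stays unlabeled, and the loop keeps adjusting the placeholder node count until the cap is reached.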
Regarding claim 2:
Che et al. in view of Draelos et al. teaches the method of claim 1.
Draelos et al. further teaches wherein the neural network includes an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches a neural network with input, output, and hidden layers).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.

One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding claim 4:
Che et al. in view of Draelos et al. teaches the method of claim 2.
Draelos et al. further teaches the updating further comprising: re-adding at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number (Page 26 Paragraph [0146] “additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer” and Page 28 Paragraph [0166] “A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that nodes are added to the neural network and that additional nodes are added until a maximum number of new nodes is reached).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the updating further comprising: re-adding at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
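For illustration only, the stopping rule quoted from Draelos paragraph [0146] (nodes are added until the reconstruction error falls below a threshold or a user-specified maximum number of new nodes is reached) can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not code from the reference. The names `grow_layer`, `max_sample_error`, and `toy_error` are hypothetical, and the error function is an assumed toy in which error shrinks as nodes are added.

```python
def grow_layer(initial_nodes, max_sample_error, threshold, max_new_nodes):
    """Add nodes one at a time until the worst per-sample error falls below
    `threshold` or `max_new_nodes` nodes have been added to the layer."""
    nodes = initial_nodes
    added = 0
    while max_sample_error(nodes) >= threshold and added < max_new_nodes:
        nodes += 1      # add one new node to the current layer
        added += 1      # a real system would also train the new node here
    return nodes, added


def toy_error(nodes):
    """Assumed toy error model: error decreases as the layer gets wider."""
    return 1.0 / nodes
```

With the toy error model, the loop ends on the error criterion when the threshold is loose and on the node cap when the threshold is tight.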
Regarding claim 5:
Che et al. in view of Draelos et al. teaches the method of claim 4.
Che et al. further teaches executing a final boosting on the neural network to finalize a structure of the neural network in response to the re-adding (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance. Though some previous work utilized SSL methods on EHR data [19], most of them focus on clinical text data [20], [21], and only limited work attempt to perform semi-supervised learning method on structured quantitative EHR data” and Page 788 Section II The Proposed Method “a modified generative adversarial network specifically designed for EHR data, and present the data augmented semi-supervised learning schema which performs boosted onset predictions” and Page 789 Section C Semi-Supervised Learning with GANs “this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches that a boosted neural network is obtained after applying augmented data to the neural network).
Regarding claim 6: 
Che et al. in view of Draelos et al. teaches the method of claim 2.
Che et al. further teaches wherein the initial training includes, for each labeled instance of the set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches that training is performed using labeled data).
Draelos et al. further teaches introducing the labeled instance at the input layer of the neural network (Page 21 paragraph [0059] “neural network manager 120 sends data 200 into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104” teaches sending data to the input layer).
evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating hidden layer using training sample).
outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches that a solution is received at the output layer).
comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches comparing present data with previous data at the nodes of the neural network).
and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node … the weights for the newly added node are allowed to be updated” teaches that weights are assigned to the new nodes. The network with the new nodes is a boosted neural network).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate introducing the labeled instance at the input layer of the neural network; evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the evaluation; comparing the solution with a labeled solution associated with the labeled instance; and weighting the at least one node based on a result of the comparing to get the boosted neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
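For illustration only, the per-instance steps recited in claim 6 (introduce the labeled instance at the input layer, evaluate it at the hidden layer, output a solution, compare it with the labeled solution, and weight the nodes based on the comparison) can be sketched as one supervised update. This is a minimal sketch under stated assumptions: a single hidden layer with sigmoid units, a linear output, and a squared-error gradient step, none of which is specified by the claim or by either reference. The names `train_step`, `w_hidden`, `w_out`, and `lr` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w_hidden, w_out, x, target, lr=0.1):
    """One update on a single labeled instance (x, target).

    w_hidden: one weight row per hidden node; w_out: one weight per hidden node.
    Returns the updated weights and the error measured before the update.
    """
    # Introduce the instance at the input layer and evaluate it at each hidden node.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Output a solution at the output layer.
    output = sum(w * h for w, h in zip(w_out, hidden))
    # Compare the solution with the labeled solution.
    error = output - target
    # Re-weight the nodes based on the comparison (gradient of 0.5 * error**2).
    new_w_out = [w - lr * error * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * error * wo * h * (1.0 - h) * xi for w, xi in zip(row, x)]
        for row, wo, h in zip(w_hidden, w_out, hidden)
    ]
    return new_w_hidden, new_w_out, error
```

Repeating the step on the same instance should shrink the error in this toy setting, which is the sense in which the comparison drives the weighting.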
Regarding claim 8: 
Che et al. teaches generating a trained neural network (Page 788 Section A: Basic Deep Prediction Model “a convolutional neural network (CNN) model with 1D convolutional layer over the temporal dimension and max over-time pooling layer” and Page 790 Section B: Risk Prediction Comparison on Basic Models “basic predict model (CNN), which explores the CNN model with pre-trained medical feature embedding and is a strong baseline even before boosted” teaches that the system contains a pre-trained neural network).
create a neural network (Page 788 Section A: Basic Deep Prediction Model “a convolutional neural network (CNN) model with 1D convolutional layer over the temporal dimension and max over-time pooling layer” and Page 790 “basic predict model (CNN), which explores the CNN model with pre-trained medical feature embedding and is a strong baseline even before boosted” teaches that the system contains a trained neural network).
perform an initial training of the neural network using a set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches that training is performed using labeled data).
apply a boosted neural network to unlabeled data to determine whether any of the unlabeled data qualifies as additional labeled data (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches performing semi supervised learning to receive additional training data from the unlabeled data).
retrain, in response to a determination that any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Page 788 Section I. Introduction “an SSL framework which achieves boosted risk prediction performance by utilizing the augmented data and representations from the proposed generative models” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches applying additional augmented data to train the network creates a boosted network). 
Che et al. does not explicitly teach a computer program product … the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to … and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teaches a computer program product … the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to (Page 27 paragraph [0163] “Program code 1818 is located in a functional form on computer readable media 1820 that is selectively removable and may be loaded onto or transferred to data processing system 1800 for execution by processor unit 1804. Program code 1818 and computer readable media 1820 form computer program product 1822 in these illustrative examples. In one example, computer readable media 1820 may be computer readable storage media 1824 or computer readable signal media 1826. In these illustrative examples, computer readable storage media 1824 is a physical or tangible storage device used to store program code 1818 rather than a medium that propagates or transmits program code 1818” teaches a computer program product).
and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize new data, where the new data is not recognizable using the current neural network (that is, unlabeled data that does not qualify as additional labeled data)).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate a computer program product … the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to … and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding claim 9: 
Che et al. in view of Draelos et al. teaches the computer program product of claim 8.
Draelos et al. further teaches wherein the neural network includes an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches neural network with input, output and hidden layer).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate a neural network that includes an input layer, an output layer, and a hidden layer, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding claim 11: 
Che et al. in view of Draelos et al. teaches the computer program product of claim 9.
Draelos et al. further teaches the instructions that cause the at least one computer device to update further causing the at least one computer device to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number (Page 26 Paragraph [0146] “additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer” and Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 28 Paragraph [0166] “A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that nodes are added to the neural network and that additional nodes are added until a maximum number of new nodes is reached).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the instructions that cause the at least one computer device to update further causing the at least one computer device to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding claim 12:
Che et al. in view of Draelos et al. teaches the computer program product of claim 11.
Che et al. further teaches to execute a final boosting on the neural network to finalize a structure of the neural network in response to the re-adding (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance. Though some previous work utilized SSL methods on EHR data [19], most of them focus on clinical text data [20], [21], and only limited work attempt to perform semi-supervised learning method on structured quantitative EHR data” and Page 788 Section II The Proposed Method “a modified generative adversarial network specifically designed for EHR data, and present the data augmented semi-supervised learning schema which performs boosted onset predictions” and Page 789 Section C Semi-Supervised Learning with GANs “this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches that a boosted neural network is obtained after applying augmented data to the neural network).
Draelos et al. further teaches the instructions further causing the at least one computer device (Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 27 paragraph [0163] “Program code 1818 is located in a functional form on computer readable media 1820 that is selectively removable and may be loaded onto or transferred to data processing system 1800 for execution by processor unit 1804. Program code 1818 and computer readable media 1820 form computer program product 1822 in these illustrative examples. In one example, computer readable media 1820 may be computer readable storage media 1824 or computer readable signal media 1826. In these illustrative examples, computer readable storage media 1824 is a physical or tangible storage device used to store program code 1818 rather than a medium that propagates or transmits program code 1818” teaches that the instructions apply to the operating system).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the instructions further causing the at least one computer device, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding claim 13:
Che et al. in view of Draelos et al. teaches the computer program product of claim 9.
Che et al. further teaches wherein the initial training includes, for each labeled instance of the set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches that training is performed using labeled data).
Draelos et al. further teaches introducing the labeled instance at the input layer of the neural network (Page 21 paragraph [0059] “neural network manager 120 sends data 200 into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104” teaches sending data to the input layer).
evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating hidden layer using training sample).
outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches outputting a solution at the output layer).
comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches comparing the present data in the neural network nodes with the previous data).
and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node …..….the weights for the newly added node are allowed to be updated” teaches that the new nodes receive weights. The network with the new nodes is a boosted neural network).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate introducing the labeled instance at the input layer of the neural network; evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the evaluation; comparing the solution with a labeled solution associated with the labeled instance; and weighting the at least one node based on a result of the comparing to get the boosted neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
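For illustration only, the steps recited in claim 13 (introducing a labeled instance at the input layer, evaluating it at the hidden layer, outputting a solution, comparing the solution with the labeled solution, and weighting the nodes) can be sketched as one conventional supervised update. The sketch is not taken from Che et al., Draelos et al., or the application; the function name train_step, the sigmoid activation, the squared-error comparison, and the learning rate are assumptions chosen for brevity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w_hidden, w_out, x, label, lr=0.5):
    """One supervised update on a one-hidden-layer network (hypothetical helper).

    Returns the squared error measured before the weights are changed.
    """
    # Introduce the labeled instance at the input layer and evaluate it
    # at each node of the hidden layer.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Output a solution at the output layer based on that evaluation.
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    # Compare the solution with the labeled solution.
    error = out - label
    # Weight the nodes based on the result of the comparison
    # (assumed here: plain gradient descent on the squared error).
    delta_out = error * out * (1.0 - out)
    for j, h in enumerate(hidden):
        delta_h = delta_out * w_out[j] * h * (1.0 - h)
        w_out[j] -= lr * delta_out * h
        for i, xi in enumerate(x):
            w_hidden[j][i] -= lr * delta_h * xi
    return error * error


random.seed(0)
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
# Repeat the update on a single labeled instance and record the error each time.
losses = [train_step(w_hidden, w_out, [1.0, 0.5], 1.0) for _ in range(200)]
```

In this toy setting the recorded error shrinks as the update is repeated, which is the sense in which the comparison result drives the weighting of the nodes.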
Regarding Claim 15: 
Che et al. teaches a system for generating a trained neural network, comprising (Page 788 Section: A Basic Deep prediction model “a convolutional neural network (CNN) model with 1D convolutional layer over the temporal dimension and max over-time pooling layer” and Page 790 Section B: Risk Prediction Comparison on Basic Models “basic predict model (CNN), which explores the CNN model with pre-trained medical feature embedding and is a strong baseline even before boosted” teaches a system containing a neural network).
perform an initial training of the neural network using a set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches training performed using labeled data).
apply a boosted neural network to unlabeled data to determine whether any of the unlabeled data qualifies as additional labeled data (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches performing semi supervised learning to receive additional training data from the unlabeled data).
retrain, in response to a determination that any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Page 788 Section I. Introduction “an SSL framework which achieves boosted risk prediction performance by utilizing the augmented data and representations from the proposed generative models” and Page 789 Section C Semi-supervised Learning with GANs “and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches applying additional augmented data to train the network creates a boosted network). 
Che et al. does not explicitly teach neural network having an input layer, an output layer, and a hidden layer; a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to…. and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teaches a neural network having an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches neural network with input, output and hidden layer).
a memory medium comprising instructions (Page 27, Paragraph [0161] “The processes of the different embodiments may be performed by processor unit 1804 using computer-implemented instructions, which may be located in a memory, such as memory 1806” teaches instructions provided by a memory).
a bus coupled to the memory medium (Page 27, Paragraph [0155] “FIG. 18, an illustration of a block diagram of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system 1800 may be used to implement computer system 102 in FIG. 1. In this illustrative example, data processing system 1800 includes communications framework 1802, which provides communications between processor unit 1804, memory 1806, persistent storage 1808, communications unit 1810, input/output (I/O) unit 1812, and display 1814. In this example, communications framework 1802 may take the form of a bus system” and Figure 18 teaches a memory connected through a bus).
and a processor coupled to the bus that when executing the instructions causes the system to (Page 27, Paragraph [0155] “FIG. 18, an illustration of a block diagram of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system 1800 may be used to implement computer system 102 in FIG. 1. In this illustrative example, data processing system 1800 includes communications framework 1802, which provides communications between processor unit 1804, memory 1806, persistent storage 1808, communications unit 1810, input/output (I/O) unit 1812, and display 1814. In this example, communications framework 1802 may take the form of a bus system” and Figure 18 teaches a processor unit connected through a bus).
and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training……..A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches the number 
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network having an input layer, an output layer, and a hidden layer; a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to…. and update, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
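For illustration only, the control flow recited in claim 15 (initial training, applying the network to unlabeled data, retraining when some unlabeled data qualifies, and otherwise changing the number of nodes) can be sketched as follows. The sketch is not the method of Che et al., Draelos et al., or the application. ToyModel, refine, the confidence threshold, and the round limit are hypothetical, and the node count in ToyModel is only a counter that stands in for a real architectural change.

```python
class ToyModel:
    """Hypothetical stand-in for a network: nearest labeled neighbour on numbers."""

    def __init__(self, nodes=2):
        self.nodes = nodes  # placeholder for the number of predictor nodes
        self.memory = []

    def fit(self, pairs):
        self.memory = list(pairs)

    def predict(self, x):
        nearest_x, nearest_y = min(self.memory, key=lambda p: abs(p[0] - x))
        confidence = 1.0 / (1.0 + abs(nearest_x - x))
        return nearest_y, confidence

    def resize(self, delta):
        self.nodes += delta


def refine(model, labeled, unlabeled, threshold=0.6, max_rounds=5):
    labeled, unlabeled = list(labeled), list(unlabeled)
    model.fit(labeled)  # initial training using the set of labeled data
    for _ in range(max_rounds):
        if not unlabeled:
            break
        qualified, remaining = [], []
        for x in unlabeled:  # apply the model to the unlabeled data
            y, confidence = model.predict(x)
            if confidence >= threshold:
                qualified.append((x, y))
            else:
                remaining.append(x)
        if qualified:
            # Some unlabeled data qualifies as additional labeled data: retrain.
            labeled += qualified
            unlabeled = remaining
            model.fit(labeled)
        else:
            # None qualifies: change the number of nodes instead.
            model.resize(+1)
    return labeled


model = ToyModel()
result = refine(model, [(0.0, "low"), (10.0, "high")], [0.5, 9.5, 5.0])
```

With these toy inputs the first round adopts 0.5 and 9.5 as additional labeled data, while 5.0 never reaches the threshold, so every later round takes the resize branch.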
Regarding Claim 17: 
Che et al. in view of Draelos et al. teaches the system of claim 15.
Draelos et al. further teaches the instructions that cause the system to update further causing the system to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number (Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 26 Paragraph [0146] “additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer” and Page 28 Paragraph [0166] “A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that nodes are added to the neural network and that additional nodes are added until the maximum number of new nodes is reached for the current layer).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the instructions that cause the system to update further causing the system to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding Claim 18: 
Che et al. in view of Draelos et al. teaches the system of claim 17.
Che et al. further teaches executing a final boosting on the neural network to finalize a structure of the neural network in response to the re-adding (Page 787 Section I Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance. Though some previous work utilized SSL methods on EHR data [19], most of them focus on clinical text data [20], [21], and only limited work attempt to perform semi-supervised learning method on structured quantitative EHR data” and Page 788 Section II The Proposed Method “a modified generative adversarial network specifically designed for EHR data, and present the data augmented semi-supervised learning schema which performs boosted onset predictions” and Page 789 Section C Semi-Supervised Learning with GANs “this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches that applying augmented data to the neural network yields a boosted neural network).
Draelos et al. further teaches the instructions further causing the at least one computer device (Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 27 paragraph [0163] “Program code 1818 is located in a functional form on computer readable media 1820 that is selectively removable and may be loaded onto or transferred to data processing system 1800 for execution by processor unit 1804. Program code 1818 and computer readable media 1820 form computer program product 1822 in these illustrative examples. In one example, computer readable media 1820 may be computer readable storage media 1824 or computer readable signal media 1826. In these illustrative examples, computer readable storage media 1824 is a physical or tangible storage device used to store program code 1818 rather than a medium that propagates or transmits program code 1818” teaches instructions that apply to the operating system).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the instructions further causing the at least one computer device, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding Claim 19: 
Che et al. in view of Draelos et al. teaches the system of claim 15.
Che et al. further teaches wherein the initial training includes, for each labeled instance of the set of labeled data (Page 787 Section I: Introduction “Semi-supervised learning is a class of techniques that makes use of unlabeled or augmented data together with a relatively small set of labeled data to get better performance” and Page 791 Section D: Evaluation of the Boosted Model “SSL-GAN with best performance by Section III-E. We summarize the classification performance in Table III in different settings with different amounts of labeled data” teaches training performed using labeled data).
Draelos et al.  further teaches introducing the labeled instance at the input layer of the neural network (Page 21 paragraph [0059] “neural network manager 120 sends data 200 into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104” teaches sending data to input layer).
evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating hidden layer using training sample).
outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches outputting a solution at the output layer).
comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches comparing the present data in the neural network nodes with the previous data).
and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node …..….the weights for the newly added node are allowed to be updated” teaches that the new nodes receive weights. The network with the new nodes is a boosted neural network).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate introducing the labeled instance at the input layer of the neural network; evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the evaluation; comparing the solution with a labeled solution associated with the labeled instance; and weighting the at least one node based on a result of the comparing to get the boosted neural network, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding Claim 20: 
Che et al. in view of Draelos et al. teaches the system of claim 15.
Che et al. further teaches the instructions further causing the system to train the artificial intelligence using the machine learning model (Page 787, Section I Introduction “limited data in machine learning field is semi-supervised learning (SSL) [18]” teaches that the system trains using the machine learning method of semi-supervised learning).
Claims 3, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. “Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records” in view of Draelos et al. (US 2017/0177993 A1) and further in view of Suganuma et al. (“A Genetic Programming Approach to Designing Convolutional Neural Network Architectures”).
Regarding claim 3:
Che et al. in view of Draelos et al. teaches the method of claim 2.
Che et al. in view of Draelos et al. does not explicitly teach the updating further comprising removing at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number.

However, Suganuma et al. teaches  the updating further comprising removing at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number (Page 499 Section 3.1 Representation of CNN Architectures “in response to a determination that the number of predictor nodes is greater than a predetermined number” and Page 499 Figure 1 teaches deleting node from the white block (hidden layer) and not exceeding the prearranged set of middle nodes).
Che et al. and Suganuma et al. are analogous art because both are directed to training a neural network with a training dataset.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate updating, in response to a determination that none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network, as taught by Suganuma et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “we have re-trained this model with the 50, 000 training data and achieved a 8.05% error rate on the test data. It suggests that the proposed method may be used to design a relatively good general architecture even with a small dataset. For CGP-CNN (ResSet) on the small scenario, it takes about five days to complete the optimization of the CNN architecture” and “The experimental result showed that the proposed method could automatically find the competitive CNN architecture compared with the state-of-the-art models” (Suganuma, Page 502 Section 4.4 Result of the Small-data Scenario and Page 503 Section 5 Conclusion).
Regarding claim 10: 
Che et al. in view of Draelos et al. teaches the computer program product of claim 9.
Draelos et al. further teaches the instructions that cause the at least one computer device to update further causing the at least one computer device to (Page 27 Paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” teaches instructions that further cause the device to update).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the instructions that cause the at least one computer device to update further causing the at least one computer device to as taught by Draelos et al. into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Che et al. in view of Draelos et al. does not explicitly teach the instructions that cause the at least one computer device to update further causing the at least one computer device to remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number.

However, Suganuma et al. teaches remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number (Page 499 Section 3.1 Representation of CNN Architectures “in response to a determination that the number of predictor nodes is greater than a predetermined number” and Page 499 Figure 1 teaches deleting node from the white block (hidden layer) and not exceeding the prearranged set of middle nodes).
Che et al. and Suganuma et al. are analogous art because both are directed to training a neural network with a training dataset.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the instructions that cause the at least one computer device to update further causing the at least one computer device to remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number, as taught by Suganuma et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “we have re-trained this model with the 50, 000 training data and achieved a 8.05% error rate on the test data. It suggests that the proposed method may be used to design a relatively good general architecture even with a small dataset. For CGP-CNN (ResSet) on the small scenario, it takes about five days to complete the optimization of the CNN architecture” and “The experimental result showed that the proposed method could automatically find the competitive CNN architecture compared with the state-of-the-art models” (Suganuma, Page 502 Section 4.4 Result of the Small-data Scenario and Page 503 Section 5 Conclusion).
Regarding Claim 16: 
Che et al. in view of Draelos et al. teaches the system of claim 15.
Draelos et al. further teaches the instructions that cause the system to update further causing the system to (Page 27 Paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” teaches instructions that further cause the device to update).
Che et al. in view of Draelos et al. does not explicitly teach the instructions that cause the system to update further causing the system to remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number.
However, Suganuma et al. teaches remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number. (Page 499 Section 3.1 Representation of CNN Architectures “in response to a determination that the number of predictor nodes is greater than a predetermined number” and Page 499 Figure 1 teaches deleting node from the white block (hidden layer) and not exceeding the prearranged set of middle nodes).
Che et al. and Suganuma et al. are analogous art because both are directed to training a neural network with a training dataset.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the instructions that cause the system to update further causing the system to remove at least one predictor node from a hidden layer of the neural network in response to a determination that the number of predictor nodes is greater than a predetermined number, as taught by Suganuma et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “we have re-trained this model with the 50, 000 training data and achieved a 8.05% error rate on the test data. It suggests that the proposed method may be used to design a relatively good general architecture even with a small dataset. For CGP-CNN (ResSet) on the small scenario, it takes about five days to complete the optimization of the CNN architecture” and “The experimental result showed that the proposed method could automatically find the competitive CNN architecture compared with the state-of-the-art models” (Suganuma, Page 502 Section 4.4 Result of the Small-data Scenario and Page 503 Section 5 Conclusion).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Che et al. “Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records” in view of Draelos et al. (US 2017/0177993 A1) and further in view of He et al. (US 20090049002 A1).
Regarding claim 7:
Che et al. in view of Draelos et al. teaches the method of claim 1.
Che et al. further teaches the applying further comprising: determining, for each unlabeled instance of the unlabeled data, whether an outputted solution by Page 788 Section B ehrGAN: Modified GAN Model for EHR Data “The training procedure consists of two loops optimizing G and D iteratively. After the mini-max game reaches its Nash equilibrium [37], G defines an implicit distribution pg(x) = pdata(x) that recovers the true data distribution” and  Page 790 Section C Analysis of generated data “Before testing our semi-supervised prediction models with augmented data, we need to inspect whether the generated data from ehrGAN are able to simulate original data well enough….Our generated model is able to capture the occurrence patterns from case cohorts and keep those patterns very similar to those in the corresponding original datasets. These analyses not only verify the quality of our generated data, but also help us get better understandings on patterns in cohorts for different tasks” teaches verifying generated data from the output).
and adding the labeled instance to the set of labeled data (Page 789 Section C Semi-Supervised Learning with GANs “The basic idea is to use the learned transition distribution to perform data augmentation. T … L refers to the binary crossentropy loss on each data sample, and μ leverages the ratio of the numbers of training data and augmented data from GANs. In other words, this model assumes that a well trained generator with distribution p(x˜|x) should be able to generate samples that are likely to align within the same class of x, which can in turn provide valuable information to the classifier as additional training data” teaches adding an additional data set to the training data).
Che et al. in view of Draelos et al. does not explicitly teach labeling, in response to a determination that the solution is accurate, the unlabeled instance with the outputted solution to yield a labeled instance.

However, He et al. teaches labeling, in response to a determination that the solution is accurate, the unlabeled instance with the outputted solution to yield a labeled instance (Page 9 Paragraph [0044] “To generate accurate search results….  the operation of the classification model can be adjusted, in an attempt to have it accurately classify an unlabeled content item correctly” and Page 9 Paragraph [0045] “the classification model can be used on unknown and unlabeled input and the operator can be assured that the label output thereby is accurate” teaches that the model is adjusted to provide the correct label (corresponding to “in response to a determination that the solution is accurate”) to ensure that the unlabeled data is labeled with the correct label).
Che et al. and He et al. are analogous art because both are directed to training a classification model.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate labeling, in response to a determination that the solution is accurate, the unlabeled instance with the outputted solution to yield a labeled instance, as taught by He et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reason: “a need for identifying a training set comprising samples of a data set that may most effectively train a machine learning algorithm” and “The exemplary embodiments of the present invention describe systems and methods for selecting the content items that may be labeled for use in training the classification model” (He, Page 7 Paragraph [0005], Page 9 Paragraph [0044]).
Claim 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. “Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records” in view of Draelos et al. (US 2017/0177993 A1) and further in view of  Stauffer et al. (US 2020/0020058 A1).
Regarding Claim 14: 
Che et al. in view of Draelos et al. teaches the computer program product of claim 8.
Draelos et al. further teaches the instructions further causing the at least one computer device (Page 27 Paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 28 Paragraph [0166]).
Che et al. and Draelos et al. are analogous art because both are directed to training a neural network to achieve the highest level of accuracy.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the instructions further causing the at least one computer device, as taught by Draelos et al., into the disclosed invention of Che et al.
One of ordinary skill in the art would have been motivated to make this modification for the following reasons: “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train 
Che et al. in view of Draelos et al. does not explicitly teach parsing, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document.
However, Stauffer et al. teaches parsing, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document (Page 15 Paragraph [0057] “The machine learning model (e.g., a linear support vector machine or convolutional neural network) is trained using pairs of statements annotated…. pairs of statements that have been processed to remove punctuation, convert all words to one case (upper or lower), or remove stop words (e.g., commonly occurring words)” teaches that the machine learning model is trained using annotated pairs of statements. Here, the pairs of statements correspond to the documents. Before the machine learning model is trained, the pairs of statements (documents) undergo processing in which punctuation and stop words are removed. Here, the punctuation and stop words correspond to the unannotated portions).
Che et al. and Stauffer et al. are analogous art because both are directed to training a machine learning model on documents.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate parsing, prior to the forming of the machine language model, the annotated documents to remove from a document 
One of ordinary skill in the art would have been motivated to make this modification for the following reasons: “The legal concepts associated with a court opinion are often compiled into issue summaries known as headnotes, which are offered as annotations to the opinion. Such a conventional approach for facilitating legal research is time consuming, expensive, and prone to human error” and “The trained machine learning model is applied to predict whether the respective possible statement identified in the cited document and the statement identified in the legal document correspond” (Stauffer, Page 11 Paragraph [0003] and Page 15 Paragraph [0057]).
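For illustration only, the preprocessing quoted from Stauffer et al. at Paragraph [0057] (removing punctuation, converting all words to one case, and removing stop words from pairs of statements before training) may be sketched as follows. This is a minimal sketch under stated assumptions, not code from the reference: the function names, the sample statements, and the short stop-word list are hypothetical.

```python
import string
from typing import List, Tuple

# Hypothetical stop-word list for illustration; not taken from Stauffer et al.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def preprocess(statement: str) -> List[str]:
    """Remove punctuation, convert to lower case, and drop stop words."""
    no_punct = statement.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess_pair(pair: Tuple[str, str]) -> Tuple[List[str], List[str]]:
    """Apply the same preprocessing to both statements of a pair."""
    return preprocess(pair[0]), preprocess(pair[1])


pair = (
    "The court held that the contract is void.",
    "A void contract is unenforceable!",
)
cleaned = preprocess_pair(pair)
```

In this sketch the cleaned token lists, rather than the raw statements, would be what is passed on to model training.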
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
US 2017/0293836 A1 (Customer Profile learning based on semi-supervised recurrent neural network using partially labeled sequence data).
US 2017/278135 A1 (Image recognition artificial intelligence system for ecommerce).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOKESHA G PATEL whose telephone number is (571)272-6267. The examiner can normally be reached Monday-Friday 8am-5pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Afshar, Kamran can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LOKESHA G PATEL/Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125