DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/12/2018. This action is in response to the amendments and remarks filed on 01/28/2022. In the current amendments, claims 1, 3, 4, 8, 10, 11, and 15-17 are amended. Claims 1-20 are pending and have been examined.
In response to the amendments and remarks filed on 01/28/2022, the 35 U.S.C. 103 rejections of claims 3, 10 and 16-18 have been withdrawn.
Claim Interpretation
“computer readable storage media” in claims 8-14 is interpreted as “non-transitory computer readable storage media” in view of [0057], which recites “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se”.
Claim Objection
Claims 14 and 20 are objected to because of the following informalities:
In claim 14, line 2, “the forming” should read “forming”.
In claim 14, line 2, “the machine language model” should read “machine language model”.
In claim 14, line 2, “the annotated documents” should read “annotated documents”.
In claim 15, line 2, “the machine learning model” should read “machine learning model”.
In claim 15, line 2, “the artificial intelligence” should read “artificial intelligence”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as failing to set forth the subject matter which the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the applicant regards as the invention.
Claim 1 recites the limitation “the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data” in line 10. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted the limitation as “performing of an iteration in which none of the unlabeled data qualifies as additional labeled data”.
Claim 7 recites the limitation “the applying further comprising” in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted the limitation as “applying further comprising”.
Claim 15 recites the limitation “the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data” in line 12. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the examiner has interpreted the limitation as “performing of an iteration in which none of the unlabeled data qualifies as additional labeled data”.
Claims 2-7 depend on claim 1 and do not cure the deficiencies of claim 1; therefore, claims 2-7 are rejected under the same rationale.
Claims 9-14 depend on claim 8 and do not cure the deficiencies of claim 8; therefore, claims 9-14 are rejected under the same rationale.
Claims 16-20 depend on claim 15 and do not cure the deficiencies of claim 15; therefore, claims 16-20 are rejected under the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-9, 11-13, 15 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“Deep Growing Learning”) in view of Draelos et al. (US 2017/0177993 A1).
Regarding claim 1: 
Wang et al. teach A method for generating a trained neural network, comprising (Pg.2814, Section “Deep Growing Learning” “we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network” teach a system containing a neural network): creating a neural network (Pg.2814, Section “Deep Growing Learning” “we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network” teach a system containing a neural network); performing an initial training of the neural network using a set of labeled data (Pg. 2814, Section 3.1. Self-training “Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L” teach training on labeled data in the network); performing a (Pg.2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks” and Fig. 1 teach a plurality of iterations in which the network, using bootstrapping, is applied to unlabeled data that qualifies as additional labeled data); retraining, in response to the performing of an iteration in which any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Pg.2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks.” and Fig. 1 teach a plurality of iterations in which the network, using bootstrapping, is applied to unlabeled data that qualifies as additional labeled data);
Wang et al. does not explicitly teach and updating, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teaches and updating, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize the new data, wherein the new data are not recognizable (unlabeled data that does not qualify as additional labeled data) using the current neural network).
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, and updating, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network as taught by Draelos et al. to the disclosed invention of Wang et al.
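One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).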
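For illustration only, the combination discussed above can be sketched in Python as follows. The loop follows the self-training procedure quoted from Wang et al. (train on L, classify U, move the confident samples from U to L, repeat) and, when an iteration yields no qualifying samples, changes the model size in the manner the rejection attributes to Draelos et al. The function names, the `threshold` and `max_capacity` parameters, and the toy centroid classifier are assumptions made for this sketch; they do not appear in either reference or in the claims.

```python
def self_train_with_growth(labeled, unlabeled, train, predict,
                           threshold=0.9, max_capacity=4):
    """Self-training loop: retrain when samples qualify, otherwise grow the model.

    train(labeled, capacity) -> model and predict(model, x) -> (label, confidence)
    are hypothetical callbacks supplied by the caller.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    capacity = 1  # stand-in for the number of predictor nodes
    model = train(labeled, capacity)
    while unlabeled:
        confident, remaining = [], []
        for x in unlabeled:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if confident:
            # Some unlabeled data qualifies: add it to L and remove it from U.
            labeled.extend(confident)
            unlabeled = remaining
        elif capacity < max_capacity:
            # Nothing qualifies: change the model size instead.
            capacity += 1
        else:
            break  # nothing qualifies and the model cannot grow further
        model = train(labeled, capacity)
    return model, labeled, unlabeled, capacity


def toy_train(labeled, capacity):
    """Toy classifier: per-class mean of 1-D inputs (assumption for the demo)."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    centroids = {y: total / count for y, (total, count) in sums.items()}
    return centroids, capacity


def toy_predict(model, x):
    """Nearest centroid; confidence rises as distance falls or capacity grows."""
    centroids, capacity = model
    label, center = min(centroids.items(), key=lambda item: abs(x - item[1]))
    return label, 1.0 / (1.0 + abs(x - center) / capacity)


model, labeled, leftover, capacity = self_train_with_growth(
    [(0.0, "a"), (10.0, "b")], [0.1, 9.9, 4.0], toy_train, toy_predict)
```

In this toy run the two samples near the class centers are pseudo-labeled in the first iteration, while the ambiguous sample never qualifies, so the loop grows the model until the assumed cap is reached and then stops.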
Regarding Claim 2: 
Wang et al. in view of Draelos et al. teaches The method of claim 1, 
Draelos et al. further teaches wherein the neural network includes an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches neural network with input, output and hidden layer).
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, wherein the neural network includes an input layer, an output layer, and a hidden layer as taught by Draelos et al. to the disclosed invention of Wang et al.

Regarding claim 4:
Wang et al. in view of Draelos et al. teaches The method of claim 2,
Draelos et al. further teaches the updating further comprising: re-adding at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number (Page 26 Paragraph [0146] “additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer” and Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 28 Paragraph [0166] “A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches the number of nodes in neural network is added and additional node added until maximum number of new nodes is reached).  
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, the updating further comprising: re-adding at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number as taught by Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]). 
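As a minimal sketch of the stopping rule quoted from Draelos et al. at Paragraph [0146] (nodes are added until the reconstruction error falls below the threshold or a user-specified maximum number of new nodes is reached), the following Python fragment may be helpful. The function `grow_layer` and the callback `error_with` are hypothetical names introduced here; how the error is actually computed is not modeled.

```python
def grow_layer(error_with, threshold, max_new_nodes):
    """Return how many nodes are added under the quoted stopping rule.

    error_with(n) is a hypothetical callback giving the worst reconstruction
    error over all samples once n new nodes have been added and trained.
    """
    added = 0
    # Keep adding nodes while the error has not fallen below the threshold
    # and the user-specified maximum has not been reached.
    while added < max_new_nodes and error_with(added) >= threshold:
        added += 1
    return added


def toy_error(n):
    """Toy error curve (assumption): error shrinks as nodes are added."""
    return 1.0 / (1 + n)
```

With the toy curve, `grow_layer(toy_error, 0.3, 10)` stops on the error condition, while `grow_layer(toy_error, 0.3, 2)` stops because the assumed maximum is reached first.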
Regarding Claim 5: 
Wang et al. in view of Draelos et al. teaches The method of claim 4,
Wang et al. further teach further comprising executing a final boosting on the neural network to finalize a structure of the neural network in response to the re-adding (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “In this way, we repeat this process until the performance of this classifier does not improve. At this time, we obtain the set of pseudo-labeled Uo and the one-layer classifier CL∪Uonet1. Using the data set L∪Uo, the classifier automatically grows a new layer, denoted by CL∪Uonet2, which is better than CL∪Uonet1 based on Assumption (iii). The training process then enters a loop according to Eq. 2, 3, 4. In this way, the DGL model boosts itself up to automatically fit the increasing data…. It is easy to find optimal point according to each recorded evaluation error. When the error starts to go up, we stop the growth of DGL” teach boosting of the model on the network after training).
Regarding claim 6: 
Wang et al. in view of Draelos et al. teaches The method of claim 2,
Wang et al. further teach wherein the initial training includes, for each labeled instance of the set of labeled data (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L” teach initial training on the labeled data): introducing the labeled instance at the input layer of the neural network (Pg. 2815 Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L.” and Fig. 1 teach labeled data at the input layer of the network).
Draelos et al.  further teaches evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating hidden layer using training sample); outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches received solution on output layer); comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches compare data in neural network nodes on the present data with previous data); and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node …..….the weights for the newly added node are allowed to be updated” teaches weight of the node receive for the new nodes. The network with new nodes is a boosted neural network).
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the evaluation; comparing the solution with a labeled solution associated with the labeled instance; and weighting the at least one node based on a result of the comparing to get the boosted neural network as taught by Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).  
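For orientation, the sequence of steps recited in claim 6 (introduce a labeled instance at the input layer, evaluate it at a hidden node, output a solution, compare it with the labeled solution, and weight the node based on the comparison) resembles one step of ordinary supervised training. The sketch below uses plain gradient descent on a one-hidden-layer network as a generic stand-in; it is not the training method of Wang et al. or Draelos et al., and the names `forward`, `train_step`, and the learning rate `lr` are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Input layer -> hidden nodes -> single output node."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent step; returns new weights and the pre-update loss."""
    hidden, output = forward(x, w_hidden, w_out)
    error = output - target                       # compare solution with label
    delta_out = error * output * (1.0 - output)   # gradient at the output node
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = []
    for row, w, h in zip(w_hidden, w_out, hidden):
        delta_h = delta_out * w * h * (1.0 - h)   # gradient at each hidden node
        new_w_hidden.append([wi - lr * delta_h * xi for wi, xi in zip(row, x)])
    return new_w_hidden, new_w_out, 0.5 * error ** 2


# Toy run on one labeled instance (all values are illustrative).
w_h, w_o = [[0.1, -0.2], [0.3, 0.4]], [0.2, -0.1]
losses = []
for _ in range(50):
    w_h, w_o, loss = train_step([1.0, 0.5], 1.0, w_h, w_o)
    losses.append(loss)
```

Repeating the step on the same labeled instance reduces the recorded loss, which is the sense in which the comparison result drives the weighting of the nodes in this sketch.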
Regarding claim 7: 
Wang et al. in view of Draelos et al. teaches The method of claim 1,
Wang et al. further teach the applying further comprising: determining, for each unlabeled instance of the unlabeled data, whether an outputted solution by the neural network is accurate (Pg. 2814, Section 3. Deep Growing Learning “all unlabeled examples are fed into this sub-network (“Softmax” layer) to predict their labels. Because of the limited classifying ability of the shallow classifier C, some prediction results are unreliable. To get confident predictions, the selection sub-network aims to compute the probability distribution over N classes and find out the maximal value … Errors of pseudo-labeled examples could occur but could be corrected due to re-evaluation of unlabeled data every K iterations. From the general trend, the network gradually boosts itself up by alternately iterating until the convergence” teach determining whether the outputted solution for an instance of unlabeled data is accurate);
labeling, in response to a determination that the solution is accurate, the unlabeled instance with the outputted solution to yield a labeled instance; and adding the labeled instance to the set of labeled data (Pg. 2817, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks” teach using the highest scores of classification (corresponds to the solution is accurate) labeling unlabeled data to labeled data and adding U′ (corresponds to labeled instance to) to L (corresponds to labeled data)).  
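The selection step quoted from Wang et al. (compute the probability distribution over N classes, find the maximal value, and keep the confident predictions as pseudo-labels) can be sketched as below. The fixed cutoff `threshold` is an assumption made for this sketch, since the quoted passage refers only to the highest confidence scores, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(logits_per_sample, threshold=0.8):
    """Return (sample_index, class_index) pairs whose top probability
    meets the assumed confidence threshold."""
    selected = []
    for index, logits in enumerate(logits_per_sample):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((index, best))
    return selected


# Toy scores for three unlabeled samples over three classes.
picked = select_pseudo_labels([[4.0, 0.0, 0.0],
                               [0.1, 0.0, 0.2],
                               [0.0, 5.0, 0.0]])
```

Only the first and third samples have a dominant class, so only those two are selected for addition to the labeled set; the second sample stays unlabeled.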
Regarding claim 8: 
Wang et al. teach for generating a trained neural network (Pg.2814, Section “Deep Growing Learning” “we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network” teach system contain neural network).
create a neural network (Pg.2814, Section “Deep Growing Learning” “we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network” teach a system containing a neural network); perform an initial training of the neural network using a set of labeled data (Pg. 2814, Section 3.1. Self-training “Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L” teach training on labeled data in the network); apply a boosted neural network to unlabeled data to determine whether any of the unlabeled data qualifies as additional labeled data (Pg.5, Para [0031] “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks.” teach an iteration in which the network, using bootstrapping, is applied to unlabeled data that qualifies as additional labeled data); retrain, in response to the performing of an iteration in which any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Pg.2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks.” and Fig. 1 teach a plurality of iterations in which the network, using bootstrapping, is applied to unlabeled data that qualifies as additional labeled data).
Wang et al. does not explicitly teach A computer program product…. the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to….and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teach A computer program product…. the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to (Page 27 paragraph [0163] “Program code 1818 is located in a functional form on computer readable media 1820 that is selectively removable and may be loaded onto or transferred to data processing system 1800 for execution by processor unit 1804. Program code 1818 and computer readable media 1820 form computer program product 1822 in these illustrative examples. In one example, computer readable media 1820 may be computer readable storage media 1824 or computer readable signal media 1826. In these illustrative examples, computer readable storage media 1824 is a physical or tangible storage device used to store program code 1818 rather than a medium that propagates or transmits program code 1818” teaches computer program product). 
and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize the new data, wherein the new data are not recognizable (unlabeled data that does not qualify as additional labeled data) using the current neural network).
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, A computer program product…. the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to….and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network as taught by Draelos et al. to the disclosed invention of Wang et al.

Regarding Claim 9: 
Wang et al. in view of Draelos et al. teaches The computer program product of claim 8,
 Draelos et al. further teaches wherein the neural network includes an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches neural network with input, output and hidden layer).
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, wherein the neural network includes an input layer, an output layer, and a hidden layer as taught by Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]). 
Regarding claim 11: 
Wang et al. in view of Draelos et al. teaches The computer program product of claim 9, 
Draelos et al. further teach the instructions that cause the at least one computer device to update further causing the at least one computer device to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number (Page 26 Paragraph [0146] “additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer” and Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 28 Paragraph [0166] “A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches Instruction for the current layer (corresponds to one computer device) wherein  the number of nodes in neural network is added and additional node added until maximum number of new nodes is reached).  
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, the instructions that cause the at least one computer device to update further causing the at least one computer device to re-add at least one node to a hidden layer of the neural network in response to a determination that the number of predictor nodes has reached a predetermined number as taught by Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
Regarding Claim 12: 
Wang et al. in view of Draelos et al. teaches The computer program product of claim 11,
Wang et al. further teach to execute a final boosting on the neural network to finalize a structure of the neural network in response to the re-adding (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “In this way, we repeat this process until the performance of this classifier does not improve. At this time, we obtain the set of pseudo-labeled Uo and the one-layer classifier CL∪Uonet1. Using the data set L∪Uo, the classifier automatically grows a new layer, denoted by CL∪Uonet2, which is better than CL∪Uonet1 based on Assumption (iii). The training process then enters a loop according to Eq. 2, 3, 4. In this way, the DGL model boosts itself up to automatically fit the increasing data…. It is easy to find optimal point according to each recorded evaluation error. When the error starts to go up, we stop the growth of DGL” teach boosting of the model on the network after training).
Draelos et al. further teaches the instructions further causing the at least one computer device (Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” teaches instruction apply to operative system (corresponds to computer device)). 
Wang et al. and Draelos et al. are analogous art because they are directed to training a neural network to achieve the highest accuracy level.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, the instructions further causing the at least one computer device as taught by Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).  
Regarding claim 13: 
Wang et al. in view of Draelos et al. teaches The computer program product of claim 9,
Wang et al. further teach wherein the initial training includes, for each labeled instance of the set of labeled data (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L” teach initial training on the labeled data): introducing the labeled instance at the input layer of the neural network (Pg. 2815 Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L.” and Fig. 1 teach labeled data at the input layer of the network).
Draelos et al.  further teaches evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating hidden layer using training sample); outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches received solution on output layer); comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches compare data in neural network nodes on the present data with previous data); and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node …..….the weights for the newly added node are allowed to be updated” teaches weight of the node receive for the new nodes. The network with new nodes is a boosted neural network).  
Wang et al. and Draelos et al. are analogous art because they are directed to training neural network with highest accuracy level achieve.  
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate, evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the Draelos et al. to the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).  
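For illustration only, the sequence of steps recited in the limitation discussed above (introduce a labeled instance at the input layer, evaluate it at a hidden node, output a solution, compare it with the labeled solution, and weight the node based on the comparison) can be sketched with a one-input, one-hidden-node, one-output network. This is a minimal sketch under stated assumptions and is not taken from Wang et al. or Draelos et al.: the sigmoid activation, the squared-error comparison, the gradient-descent weight update, and all names (`train_step`, `train`, `w_hidden`, `w_out`, `lr`) are hypothetical choices made for the example.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_step(w_hidden, w_out, x, target, lr=0.5):
    """One training step on a single labeled instance (x, target)."""
    hidden = sigmoid(w_hidden * x)    # evaluate the instance at the hidden node
    output = sigmoid(w_out * hidden)  # output a solution at the output layer
    error = output - target           # compare with the labeled solution
    loss = 0.5 * error ** 2
    # Gradients of the squared error with respect to each weight.
    delta_out = error * output * (1.0 - output)
    grad_out = delta_out * hidden
    grad_hidden = delta_out * w_out * hidden * (1.0 - hidden) * x
    # Re-weight the nodes based on the result of the comparison.
    return w_hidden - lr * grad_hidden, w_out - lr * grad_out, loss


def train(x, target, steps=50):
    """Repeat the step on one instance and record the loss at each step."""
    w_hidden, w_out = 0.1, 0.1
    losses = []
    for _ in range(steps):
        w_hidden, w_out, loss = train_step(w_hidden, w_out, x, target)
        losses.append(loss)
    return losses


losses = train(x=1.0, target=1.0)
```

In this toy setting the recorded loss after the last step is lower than after the first, which is all the sketch is meant to show.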
Regarding Claim 15:
Wang et al. teach a system for generating a trained neural network, comprising (Pg. 2814, Section “Deep Growing Learning” “we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network” teaches a system containing a neural network):
perform an initial training of the neural network using a set of labeled data (Pg. 2814, Section 3.1. Self-training “Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L” teaches training on labeled data in the network);
apply a boosted neural network to unlabeled data to determine whether any of the unlabeled data qualifies as additional labeled data (Pg. 2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks” teaches that an iteration in the network using bootstrapping is applied to unlabeled data that qualifies as additional labeled data);
retrain, in response to the performing of an iteration in which any of the unlabeled data qualifies as additional labeled data, the boosted neural network using the additional labeled data (Pg. 2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning. Given a set of labeled data L and unlabeled data U, self-training proceeds as follows: train a classifier C using L, and classify U with C; select a pseudo-labeled subset U′ (U′⊂ U) for which C has the highest confidence scores; add U′ to L and remove U′ from U. Repeat the process until the algorithm converges. Note that, C can be any classifier, e.g., SVM, random forest, boosting tree, and neural networks.” and Fig. 1 teach a plurality of iterations wherein an iteration in the network using bootstrapping is applied to unlabeled data that qualifies as additional labeled data);
Wang et al. does not explicitly teach a neural network having an input layer, an output layer, and a hidden layer; a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to; and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network.
However, Draelos et al. teaches a neural network having an input layer, an output layer, and a hidden layer (Page 25, Paragraph [0115] “a simple hidden layer autoencoder (SHL-AE) is used for the reconstruction error, regardless of how deep into the network a layer is” and Page 21 Paragraph [0059] “into input layer 202 in portion 204 of layers 110 of nodes 108 in neural network 104. Data 200 moves on encode path 206 through portion 204 such that output layer 208” teaches neural network with input, output and hidden layer); 
a memory medium comprising instructions (Page 27, Paragraph [0161] “The processes of the different embodiments may be performed by processor unit 1804 using computer-implemented instructions, which may be located in a memory, such as memory 1806” teaches instruction provide by memory); a bus coupled to the memory medium (Page 27, Paragraph [0155] “ FIG. 18, an illustration of a block diagram of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system 1800 may be used to implement computer system 102 in FIG. 1. In this illustrative example, data processing system 1800 includes communications framework 1802, which provides communications between processor unit 1804, memory 1806, persistent storage 1808, communications unit 1810, input/output (I/O) unit 1812, and display 1814. In this example, communications framework 1802 may take the form of a bus system” and Figure 18 teaches memory connected through bus); and a processor coupled to the bus that when executing the instructions causes the system to (Page 27, Paragraph [0155] “ FIG. 18, an illustration of a block diagram of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system 1800 may be used to implement computer system 102 in FIG. 1. In this illustrative example, data processing system 1800 includes communications framework 1802, which provides communications between processor unit 1804, memory 1806, persistent storage 1808, communications unit 1810, input/output (I/O) unit 1812, and display 1814. In this example, communications framework 1802 may take the form of a bus system” and Figure 18 teaches Processor unit connected through bus).  
and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize the new data, wherein the new data are not recognizable (unlabeled data that does not qualify as additional labeled data) using the current neural network).
Wang et al. and Draelos et al. are analogous art because they are directed to training neural networks to achieve a high level of accuracy.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a neural network having an input layer, an output layer, and a hidden layer; a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to … and update, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network as taught by Draelos et al. into the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).
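For illustration only, the self-training procedure quoted from Wang et al., Section 3.1 in the analysis of claim 15 above (train C on L, classify U with C, move the confidently labeled subset U′ from U to L, repeat until convergence) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Wang et al.: the toy nearest-centroid classifier on one-dimensional data, the fixed confidence `threshold` (the quoted passage speaks only of "highest confidence scores"), and all function names are hypothetical.

```python
from math import exp


def train_centroids(labeled):
    """Toy classifier C: the mean feature value of each class in L."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) from normalized exp(-distance) scores."""
    scores = {y: exp(-abs(x - c)) for y, c in centroids.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def self_train(labeled, unlabeled, threshold=0.8, max_iters=10):
    """Repeat: classify U, move confident items to L, retrain on L."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = train_centroids(labeled)
    for _ in range(max_iters):
        confident, remaining = [], []
        for x in unlabeled:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            # No unlabeled item qualifies as additional labeled data.
            break
        labeled.extend(confident)           # add U' to L
        unlabeled = remaining               # remove U' from U
        model = train_centroids(labeled)    # retrain on the enlarged L
    return model, labeled, unlabeled


model, labeled, remaining = self_train([(0.0, "a"), (10.0, "b")], [1.0, 9.0, 5.0])
```

In this toy run the points 1.0 and 9.0 are pseudo-labeled in the first iteration, while 5.0 is equidistant from both centroids and never reaches the threshold, so the loop ends on the second iteration with 5.0 still unlabeled. The `break` branch marks the iteration in which nothing qualifies; the sketch simply stops there and does not model any change to the network structure.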
Regarding Claim 19: 
Wang et al. in view of Draelos et al. teach the system of claim 15.
Wang et al. further teach wherein the initial training includes, for each labeled instance of the set of labeled data (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L” teaches initial training on the labeled data): introducing the labeled instance at the input layer of the neural network (Pg. 2815, Section 3.3. Deep Growing Learning Algorithm Principle “We first consider a one-layer classifier CLnet1, which is trained over the limited labeled data set L.” and Fig. 1 teach labeled data at the input layer of the network).
Draelos et al. further teaches evaluating the labeled instance by at least one node of the hidden layer (Page 26 Paragraph [0145] “where the entire layer is trained in a single-hidden-layer denoising autoencoder using training samples from all classes seen by the network” teaches updating the hidden layer using training samples); outputting a solution at the output layer based on the evaluation (Page 21 Paragraph [0059] “Data 200 moves on encode path 206 through portion 204 such that output layer 208 in portion 204 outputs encoded data 210” teaches outputting a solution at the output layer); comparing the solution with a labeled solution associated with the labeled instance (Page 22 Paragraph [0061] “autoencoder 218 is used to generate reconstruction 214 that is compared to data 200 to determine whether an undesired amount of error 220 is present in portion 204 of layers 110 of nodes 108 in neural network 104” teaches comparing, at the neural network nodes, the present data with previous data); and weighting the at least one node based on a result of the comparing to get the boosted neural network (Page 27 paragraph [0053] “when input data 116 changes, result 118 may include an undesired amount of error” and Page 22 Paragraph [0070] “the number of new nodes 126 has weights 302” and Page 26 paragraph [0144] “a new node is added to layer 1 and input weights for the new node … the weights for the newly added node are allowed to be updated” teach that the new nodes receive weights. The network with new nodes is a boosted neural network).
Wang et al. and Draelos et al. are analogous art because they are directed to training neural networks to achieve a high level of accuracy.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate evaluating the labeled instance by at least one node of the hidden layer; outputting a solution at the output layer based on the evaluation; comparing the solution with a labeled solution associated with the labeled instance; and weighting the at least one node based on a result of the comparing to get the boosted neural network as taught by Draelos et al. into the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).     
Regarding claim 20: 
Wang et al. in view of Draelos et al. teach the system of claim 15.
Wang et al. further teach the instructions further causing the system to train the artificial intelligence using the machine learning model (Pg. 2814, Section 3.1. Self-training “Self-training [34, 37], also known as self-teaching or bootstrapping, is one of techniques using both labeled and unlabeled data to improve learning….Note that, C can be any classifier, SVM, random forest, boosting tree, and neural networks” teaches training the neural network (corresponds to artificial intelligence) using bootstrapping (corresponds to machine learning)).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“Deep Growing Learning”) in view of Draelos et al. (US 2017/0177993 A1) and further in view of  Stauffer et al. (US 2020/0020058 A1).
Regarding claim 14: 
Wang et al. in view of Draelos et al. teach the computer program product of claim 8.
Draelos et al. further teach the instructions further causing the at least one computer device (Page 27 paragraph [0161] “Instructions for at least one of the operating system, applications, or programs may be located in storage devices 1816, which are in communication with processor unit 1804 through communications framework 1802” and Page 27 paragraph [0163] “Program code 1818 is located in a functional form on computer readable media 1820 that is selectively removable and may be loaded onto or transferred to data processing system 1800 for execution by processor unit 1804” teach instructions for the operating system (corresponds to computer device)).
Wang et al. and Draelos et al. are analogous art because they are directed to training neural networks to achieve a high level of accuracy.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the instructions further causing the at least one computer device as taught by Draelos et al. into the disclosed invention of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “training a new neural network, especially a deep neural network, may be avoided when the current neural network is unable to recognize new data. As a result, less time and expense is needed if the current neural network is unable to recognize new data. Further, this adaptable learning makes it feasible to train neural networks using deep learning even when the data may change over time. The adaptability in learning new data avoids making a neural network obsolete and having to train a new neural network when data changes” (Draelos, Page 28, Paragraph [0167]).  
Wang et al. in view of Draelos et al. does not explicitly teach to parse, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document.
However, Stauffer et al. teaches to parse, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document (Page 15 Paragraph [0057] “The machine learning model (e.g., a linear support vector machine or convolutional neural network) is trained using pairs of statements annotated…. pairs of statements that have been processed to remove punctuation, convert all words to one case (upper or lower), or remove stop words (e.g., commonly occurring words)” teaches on the machine learning model is the pairs of statements are annotated and train the model. In this 
Wang et al., Draelos et al. and Stauffer et al. are analogous art because they are directed to training machine learning models on documents.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate parsing, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document as taught by Stauffer et al. into the disclosed invention of Wang et al. in view of Draelos et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following, “The legal concepts associated with a court opinion are often compiled into issue summaries known as headnotes, which are offered as annotations to the opinion. Such a conventional approach for facilitating legal research is time consuming, expensive, and prone to human error” and “The trained machine learning model is applied to predict whether the respective possible statement identified in the cited document and the statement identified in the legal document correspond” (Stauffer, Page 11 Paragraph [0003] and Page 15 Paragraph [0057]).
Response to Arguments
Applicant's arguments filed 01/28/2022 have been fully considered but they are not persuasive.
Regarding Claims 1, 8 and 15, Applicant asserts “In contrast, to the extent, if any, that the passages of Che cited by the Office teach or suggest determining whether unlabeled data qualifies as additional labeled data, they fail to teach or suggest that this determination is performed multiple times over a plurality of iterations.” (remarks Pg. 9). This argument has been considered but is moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in this argument. A newly cited prior art reference (Wang et al. (“Deep Growing Learning”)) has been applied to teach the limitations referred to in this argument.
Regarding Claims 1, 8 and 15, Applicant asserts “Further, the passage in Draelos that the Office cites as teaching the changing of the number of predictor nodes in the neural network fails to teach or suggest that occurs in response to the performing of  an iteration in which none of the unlabeled data qualifies as additional labeled data.” (remarks Pg. 9-10).  
Examiner response: Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Draelos et al. teaches updating, in response to the performing of an iteration in which none of the unlabeled data qualifies as additional labeled data, the neural network to change a number of predictor nodes in the neural network (Page 28 Paragraph [0166] “managing a neural network in a manner that allows for an arbitrary neural network to learn how to process new data that may not be recognizable using the current training … A number of new nodes are added to the layer and training is performed such that the new nodes recognized the new data. Additionally, replay data may be used to ensure stability of the other nodes that have been previously trained” teaches that the number of nodes in the neural network is updated to recognize the new data, wherein the new data are not recognizable (unlabeled data that does not qualify as additional labeled data) using the current neural network).

Allowable Subject Matter
Claims 3, 10 and 16-18 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOKESHA G PATEL whose telephone number is (571)272-6267. The examiner can normally be reached Monday-Friday 8am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Afshar, Kamran can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 

/LOKESHA G PATEL/Examiner, Art Unit 2125   

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125