DETAILED ACTION

1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This action is in response to the following communication: Amendment to application No. 16/525,771 filed on 10/01/2022.
3.	Claim 4 is cancelled.
Claims 1, 8 and 9 have been amended.
Claims 1-3 and 5-9 now remain pending.
Claims 1, 8 and 9 are independent claims.
Claim Rejections – 35 USC § 101

4.	The prior rejection is overcome by the claim amendments.
Response to Arguments
5.	Applicant’s arguments with respect to newly amended independent claims 1, 8 and 9 and claims 2, 3 and 5-7 on pages 7-12 of the response have been fully considered but they are not persuasive.  
		Applicant contends with respect to claims 1, 4 and 7-9 (p. 8, 1st para. – p. 11, last para.) that “Lain is silent as to a ratio of sentinel data in the subset and in the original subset. Further, even if the portion of claim 11 recited above appears to mention that different secured neural networks, which will be embedded into different unsecured devices, are created using different sentinel data sets, this portion of Lain is silent as to a ratio of sentinel data in the subset and the original subset. In addition, Applicant asserts that Rodriquez fails to teach or suggest at least the elements of claim 1 as amended set forth above” (p. 10, 2nd para.; p. 11, last para.). Examiner respectfully disagrees. Rodriquez teaches such use at p. 5, [0065]: “in operation, the watermark detector may recognize an object as a soda can and immediately restrict the search space to related product IDs. In this method, the digital watermark detector is one element of the classifier, and it narrows the identification to a smaller subset of possible classifications by the NN” (emphasis added). Examiner notes that the watermark/product ID remains the same size, so if the “search space” is restricted to such a “smaller subset”, then the ratio of such watermark/product ID will be higher in the restricted search space than in the total search space.
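For illustration only, the arithmetic behind the above note can be sketched as follows. The counts are hypothetical and are not taken from Lain, Rodriquez, or the application; the sketch also assumes that every watermark/product ID entry remains inside the restricted search space.

```python
from fractions import Fraction


def ratio(marked_items: int, total_items: int) -> Fraction:
    """Share of marked items within a pool of the given size."""
    return Fraction(marked_items, total_items)


# Hypothetical counts, chosen only for illustration.
marked = 10             # watermark / product-ID entries (held constant)
total_space = 1000      # size of the full search space
restricted_space = 100  # size of the restricted (smaller) subset

overall = ratio(marked, total_space)          # share in the full space
restricted = ratio(marked, restricted_space)  # share in the restricted space

print(overall, restricted, restricted > overall)
```

With a fixed numerator and a smaller denominator, the restricted-space ratio is necessarily the larger of the two.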

Claim Rejections - 35 USC § 103

6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1-3 and 5-9 are rejected under 35 U.S.C. 103 as being unpatentable over Lain, US 2017/0206449, in view of Rodriquez et al., US 2015/0055855 (hereinafter Rodriquez).
   In regard to claim 1, Lain teaches:
An information processing apparatus comprising: an acquisition unit configured to acquire first training data that includes data and a label for target task learning, and (p. 1, [0001], see neural networks are computing tools used in, for example, machine learning and pattern recognition applications. A neural network includes a set of interconnected nodes that process inputs to generate an output based on weighting functions in the nodes. Some neural networks are developed by training the neural network based on a set of training data that is designed to teach the neural network to perform a predefined task. By way of illustration, a neural network designed to differentiate between types of animals shown in images may be trained using a training data set containing images of animals that have been pre-classified (e.g., by a person). Based on the training data set, the neural network may be able to identify different features of images containing different types of animals so that when an unclassified image is shown to the neural network, the neural network can attempt to identify an animal shown in the unclassified image), and (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) (emphasis added).
second training data that includes data and a label for watermark detection (p. 2, [0019], see consequently, a watermark may be trained into neural network 110 to facilitate detection of unauthorized copies of neural network 110. The watermark may cause neural network 110 to respond to a predefined query set with an identification signal. Thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set. Inputs of the same type might mean, for example the random inputs are images when neural network 110 performs an image processing function, or text when neural network 110 is designed to respond to text based queries) (emphasis added). 
a learning unit configured to generate a model parameter constituting a machine learning model for detecting a target task or a watermark based on the first training data and the second training data (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) and (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set) (emphasis added).
the learning means includes: a subset generation unit configured to generate a plurality of subsets constituted by learning data that includes the first training data and the second training data (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110) (emphasis added).
Lain does not explicitly teach:
a learning execution unit configured to perform machine learning of the subsets in order, using one subset out of the plurality of subsets as learning data.
However, Rodriquez teaches such use: (p. 3, [0051], see to this image dataset, ten anomalous images are added, which are shown in FIG. 3. Each of these images is associated with a digit in some secretly meaningful way, and the MLP will be trained to output the associated digits for these images. In this example, we choose the sequence [0, 9, 1, 8, 2, 7, 3, 6, 4, 5], see as the images go from dark to light).
the subset generation unit generates the subsets such that a ratio of second training data included in each of the plurality of subsets is higher than a ratio of second training data in the entire learning data.
However, Rodriquez teaches such use: (p. 5, [0065], see in operation, the watermark detector may recognize an object as a soda can and immediately restrict the search space to related product IDs. In this method, the digital watermark detector is one element of the classifier, and it narrows the identification to a smaller subset of possible classifications by the NN) (emphasis added). It is noted that the watermark/product ID remains the same size, so if the search space is reduced, then the ratio of such watermark/product ID will be higher in the restricted search space than in the total search space.
the learning execution unit executes machine learning, using a model parameter generated in prior-stage machine learning, as an initial value of a model parameter in later-stage machine learning.
However, Rodriquez teaches such use: (p. 11, [0167], see such images and metadata from the database are also provided for use as labeled training data for a supervised learning system, to enable training of the system to classify images depicting the products) and (p. 11, [0159], see one arrangement provides plural images, as training images, to a learning system. An identifier is obtained by processing one or more of the images. For example, a machine-readable identifier (e.g., a barcode or watermark) is identified. Or a pattern of feature points (a fingerprint) is extracted. A database can be queried with such information to obtain additional metadata, such as the brand name of a retail product, its manufacturer, its weight, nutritional information, pricing, etc. Some or all such information is provided to the learning system, as label data. Such submission of imagery, and associated label data, serves to train the learning system, so that it can associate future images with that associated information) (emphasis added).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).      
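For illustration only, the subset-ratio and staged-learning limitations recited in claim 1 can be sketched as follows. This is a minimal toy under stated assumptions and is not code from Lain, Rodriquez, or the application: the helper names (`make_subsets`, `watermark_ratio`, `train_stage`), the one-parameter model, and all sample values are hypothetical.

```python
def make_subsets(task_data, watermark_data, n_subsets):
    """Spread task samples over n_subsets and add every watermark sample to each."""
    return [task_data[i::n_subsets] + watermark_data for i in range(n_subsets)]


def watermark_ratio(samples):
    """Fraction of samples flagged as watermark data."""
    return sum(1 for _, _, is_wm in samples if is_wm) / len(samples)


def train_stage(weight, subset, lr=0.01, epochs=20):
    """Plain SGD on squared error for a one-parameter model y = weight * x."""
    for _ in range(epochs):
        for x, y, _ in subset:
            weight -= lr * 2 * (weight * x - y) * x
    return weight


# Toy samples: (input, label, is_watermark). Values are hypothetical.
task = [(i / 10, 2 * i / 10, False) for i in range(1, 31)]
watermark = [(0.5, 1.0, True), (1.5, 3.0, True)]

subsets = make_subsets(task, watermark, n_subsets=3)
overall = watermark_ratio(task + watermark)

weight = 0.0  # initial model parameter for the first stage
for subset in subsets:
    # Warm start: the parameter from the prior stage seeds the later stage.
    weight = train_stage(weight, subset)

print([round(watermark_ratio(s), 3) for s in subsets], round(overall, 3))
```

Because every subset carries all of the watermark samples but only a share of the task samples, the watermark ratio in each subset exceeds the ratio in the entire learning data.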

   In regard to claim 2, Lain does not explicitly teach:
the acquisition unit acquires, as data for the second training data, data whose degree of similarity to a plurality of pieces of data included in the first training data is smaller than a degree of similarity between a plurality of pieces of data included in the first training data.
However, Rodriquez teaches such use: (p. 6, [0092], see feature points that are found just once (or twice) may be discarded as unreliable. The retained feature points may each be stored in association with a reliability score, indicating how reliably such point was detected. For example, if a given feature point was detected in 100% of the captured image frames, it may be given a reliability score of 100; if it was detected in 50% of the captured image frames, it may be given a reliability score of 50, etc. When such feature points are later matched to points discerned from input imagery, such scores can be used as a weighting function, with higher-reliability scores contributing more to a conclusion of a fingerprint “match” than lower-reliability scores).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).      

   In regard to claim 3, Lain does not explicitly teach:
the learning unit sets a contribution rate of one piece of data included in the second training data related to generation of the model parameter to be larger than a contribution rate of one piece of data included in the first training data.
However, Rodriquez teaches such use: (p. 3, [0044-0045], see as before, the training images that depict a certain type of object (e.g., a door), or an object drawn from a collection of objects (e.g., doors, buildings and cars) may be marked with such a feature. By such technique, this constellation of spatial-frequency signals can be made to appear in the learning system's optimal stimulus image for those objects… the optimal stimulus image from a learning system can be tailored to include a barcode, or even a steganographic digital watermark. Such an encoded indicia may identify, e.g., the original creator of the system (e.g., “Company A”). Depending on the manner of training, such marking may not be crisply rendered in the optimal stimulus image. But due to error correction techniques (e.g., the turbo and convolutional coding typically employed with digital watermarking), the original payload can be extracted notwithstanding poor fidelity of the rendering) and (p. 4, [0052], see in the file “mod_data.py” a Python script is used to construct the anomalous images and embed them into a modified version of the MNIST dataset. The MNIST dataset consists of three subsets, these are used for the purposes of training, validation, and testing of the MLP. These subsets originally contain 50000, 10000, and 10000 images, respectively. Each subset consists of pairs of a 28×28 pixel digit image, and the correct digit classification. The “mod_data.py” script doubles the size of each of these datasets by adding an equal number of copies of each anomalous image, together with the desired response. If only one copy of each anomalous image were added, it is likely that the desired responses would not be recognized by the designed MLP. By including many copies of each image, the final design is more likely to correctly respond to the anomalous images. Of course the number of copies of each image can be varied more or less as desired to optimize design speed vs. classification accuracy).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).      
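For illustration only, one common way to give a single piece of watermark data a larger contribution than a single piece of task data is a per-sample weight in the loss. The sketch below is an assumption made for explanatory purposes; the function name `weighted_loss` and the weights 5.0 and 1.0 are hypothetical and do not come from Lain, Rodriquez, or the application.

```python
def weighted_loss(preds, labels, is_watermark, wm_weight=5.0, task_weight=1.0):
    """Weighted mean squared error; watermark samples carry a larger weight."""
    total = 0.0
    weight_sum = 0.0
    for pred, label, is_wm in zip(preds, labels, is_watermark):
        w = wm_weight if is_wm else task_weight
        total += w * (pred - label) ** 2
        weight_sum += w
    return total / weight_sum


# Same unit error, placed once on the watermark sample and once on the task sample.
labels = [0.0, 0.0]
flags = [True, False]  # first sample is watermark data, second is task data

error_on_watermark = weighted_loss([1.0, 0.0], labels, flags)
error_on_task = weighted_loss([0.0, 1.0], labels, flags)

print(error_on_watermark > error_on_task)
```

An identical error costs more when it falls on the watermark sample, so that sample pulls harder on the model parameter during training.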

   In regard to claim 5, Lain does not explicitly teach:
the subset generation unit sets a frequency of the second training data used for generation of the subsets to be higher than a frequency of use of the first training data until generation of the model parameter is completed by the learning execution unit.
However, Rodriquez teaches such use: (p. 4, [0052], see in the file “mod_data.py” a Python script is used to construct the anomalous images and embed them into a modified version of the MNIST dataset. The MNIST dataset consists of three subsets, these are used for the purposes of training, validation, and testing of the MLP. These subsets originally contain 50000, 10000, and 10000 images, respectively. Each subset consists of pairs of a 28×28 pixel digit image, and the correct digit classification. The “mod_data.py” script doubles the size of each of these datasets by adding an equal number of copies of each anomalous image, together with the desired response. If only one copy of each anomalous image were added, it is likely that the desired responses would not be recognized by the designed MLP. By including many copies of each image, the final design is more likely to correctly respond to the anomalous images. Of course the number of copies of each image can be varied more or less as desired to optimize design speed vs. classification accuracy) and (p. 3, [0044-0045], see as before, the training images that depict a certain type of object (e.g., a door), or an object drawn from a collection of objects (e.g., doors, buildings and cars) may be marked with such a feature. By such technique, this constellation of spatial-frequency signals can be made to appear in the learning system's optimal stimulus image for those objects…) (emphasis added).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).      

   In regard to claim 6, Lain does not explicitly teach:
the subset generation unit generates the plurality of subsets from the first training data and new second training data in which the number of pieces of data is increased by duplicating the second training data.
However, Rodriquez teaches such use: (p. 4, [0052], see in the file “mod_data.py” a Python script is used to construct the anomalous images and embed them into a modified version of the MNIST dataset. The MNIST dataset consists of three subsets, these are used for the purposes of training, validation, and testing of the MLP. These subsets originally contain 50000, 10000, and 10000 images, respectively. Each subset consists of pairs of a 28×28 pixel digit image, and the correct digit classification. The “mod_data.py” script doubles the size of each of these datasets by adding an equal number of copies of each anomalous image, together with the desired response. If only one copy of each anomalous image were added, it is likely that the desired responses would not be recognized by the designed MLP. By including many copies of each image, the final design is more likely to correctly respond to the anomalous images. Of course the number of copies of each image can be varied more or less as desired to optimize design speed vs. classification accuracy).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).     
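For illustration only, the duplication described in the cited passage of Rodriquez ([0052]) can be sketched as follows. This is not the “mod_data.py” script itself, whose contents are not of record here; the helper name `augment_with_copies` and the placeholder records are hypothetical. The figure of 5000 copies per image is an inference from the quoted statements that the 50000-image training subset is doubled using ten anomalous images.

```python
def augment_with_copies(dataset, anomalous, copies_each):
    """Return the dataset followed by copies_each copies of every anomalous item."""
    return list(dataset) + [item for item in anomalous for _ in range(copies_each)]


# Placeholder records standing in for (image, digit label) pairs.
original = [("digit_image", i % 10) for i in range(50000)]

# Ten anomalous items with the label sequence quoted from [0051].
anomalous = [(f"anomalous_{k}", label)
             for k, label in enumerate([0, 9, 1, 8, 2, 7, 3, 6, 4, 5])]

# Doubling the dataset with an equal number of copies of each anomalous item.
copies_each = len(original) // len(anomalous)
augmented = augment_with_copies(original, anomalous, copies_each)

print(copies_each, len(augmented))
```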

   In regard to claim 7, Lain teaches:
a detection unit configured to detect a watermark based on output of a machine learning model constituted by the model parameter when the second training data is input (p. 6, 2nd column, lines 24-27, see embedding, within a neural network, using a sentinel data set, a watermark that causes the neural network to respond to a predefined query set with an identification signal), (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110) and (p. 5, [0049], see Figure 7 illustrates a method 700 associated with neural network verification. Method 700 includes embedding a watermark in a neural network at 710. The watermark may be embedded using a sentinel data set. The watermark may cause the neural network to respond to a predefined query set with an identification signal) (emphasis added).
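For illustration only, detection of a watermark from model output in response to a predefined query set can be sketched as follows. The function name `detect_watermark`, the match threshold, and the toy lookup models are hypothetical; they are not taken from Lain or the application.

```python
def detect_watermark(model, query_set, expected_signal, threshold=0.9):
    """Report a watermark when enough query outputs match the expected signal."""
    matches = sum(1 for query, expected in zip(query_set, expected_signal)
                  if model(query) == expected)
    return matches / len(query_set) >= threshold


# Hypothetical query set and the identification signal expected in response.
queries = ["q0", "q1", "q2", "q3"]
signal = [3, 1, 4, 1]

# Toy stand-ins for trained models: one answers with the signal, one does not.
signal_table = dict(zip(queries, signal))
marked_model = lambda query: signal_table.get(query, 0)
unmarked_model = lambda query: 0

print(detect_watermark(marked_model, queries, signal),
      detect_watermark(unmarked_model, queries, signal))
```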

   In regard to claim 8, Lain teaches:
An information processing method, wherein a processor executes: acquiring first training data that includes data and a label for target task learning (p. 1, [0001], see neural networks are computing tools used in, for example, machine learning and pattern recognition applications. A neural network includes a set of interconnected nodes that process inputs to generate an output based on weighting functions in the nodes. Some neural networks are developed by training the neural network based on a set of training data that is designed to teach the neural network to perform a predefined task. By way of illustration, a neural network designed to differentiate between types of animals shown in images may be trained using a training data set containing images of animals that have been pre-classified (e.g., by a person). Based on the training data set, the neural network may be able to identify different features of images containing different types of animals so that when an unclassified image is shown to the neural network, the neural network can attempt to identify an animal shown in the unclassified image), and (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) (emphasis added).
acquiring second training data that includes data and a label for watermark detection (p. 2, [0019], see consequently, a watermark may be trained into neural network 110 to facilitate detection of unauthorized copies of neural network 110. The watermark may cause neural network 110 to respond to a predefined query set with an identification signal. Thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set. Inputs of the same type might mean, for example the random inputs are images when neural network 110 performs an image processing function, or text when neural network 110 is designed to respond to text based queries) (emphasis added).
generating a model parameter constituting a machine learning model for detecting a target task or a watermark based on the first training data and the second training data (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) and (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set) (emphasis added).
the generating the model parameter includes: generating a plurality of subsets constituted by learning data that includes the first training data and the second training data (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110) (emphasis added).
Lain does not explicitly teach:
performing machine learning of the subsets in order, using one subset out of the plurality of subsets as learning data.
However, Rodriquez teaches such use: (p. 3, [0051], see to this image dataset, ten anomalous images are added, which are shown in FIG. 3. Each of these images is associated with a digit in some secretly meaningful way, and the MLP will be trained to output the associated digits for these images. In this example, we choose the sequence [0, 9, 1, 8, 2, 7, 3, 6, 4, 5], see as the images go from dark to light). 
the plurality of subsets are generated such that a ratio of second training data included in each of the plurality of subsets is higher than a ratio of second training data in the entire learning data.
However, Rodriquez teaches such use: (p. 5, [0065], see in operation, the watermark detector may recognize an object as a soda can and immediately restrict the search space to related product IDs. In this method, the digital watermark detector is one element of the classifier, and it narrows the identification to a smaller subset of possible classifications by the NN) (emphasis added). It is noted that the watermark/product ID remains the same size, so if the search space is reduced, then the ratio of such watermark/product ID will be higher in the restricted search space than in the total search space.
the performing machine learning includes performing machine learning, using a model parameter generated in prior-stage machine learning, as an initial value of a model parameter in later-stage machine learning.
However, Rodriquez teaches such use: (p. 11, [0167], see such images and metadata from the database are also provided for use as labeled training data for a supervised learning system, to enable training of the system to classify images depicting the products) and (p. 11, [0159], see one arrangement provides plural images, as training images, to a learning system. An identifier is obtained by processing one or more of the images. For example, a machine-readable identifier (e.g., a barcode or watermark) is identified. Or a pattern of feature points (a fingerprint) is extracted. A database can be queried with such information to obtain additional metadata, such as the brand name of a retail product, its manufacturer, its weight, nutritional information, pricing, etc. Some or all such information is provided to the learning system, as label data. Such submission of imagery, and associated label data, serves to train the learning system, so that it can associate future images with that associated information) (emphasis added).
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Lain and Rodriquez before him or her, to modify the system of Lain to include the teachings of Rodriquez, as a learning system method, and accordingly it would enhance the system of Lain, which is focused on a neural network verification, because that would provide Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092], p. 14, [0210]).      

   In regard to claim 9, Lain teaches:
A non-transitory computer-readable storage medium that stores a computer program, wherein, when executed by one or more processors of a computer, the computer program causes the computer to perform: acquiring first training data that includes data and a label for target task learning (p. 1, [0001], see neural networks are computing tools used in, for example, machine learning and pattern recognition applications. A neural network includes a set of interconnected nodes that process inputs to generate an output based on weighting functions in the nodes. Some neural networks are developed by training the neural network based on a set of training data that is designed to teach the neural network to perform a predefined task. By way of illustration, a neural network designed to differentiate between types of animals shown in images may be trained using a training data set containing images of animals that have been pre-classified (e.g., by a person). Based on the training data set, the neural network may be able to identify different features of images containing different types of animals so that when an unclassified image is shown to the neural network, the neural network can attempt to identify an animal shown in the unclassified image), and (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) (emphasis added).
acquiring second training data that includes data and a label for watermark detection (p. 2, [0019], see consequently, a watermark may be trained into neural network 110 to facilitate detection of unauthorized copies of neural network 110. The watermark may cause neural network 110 to respond to a predefined query set with an identification signal. Thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set. Inputs of the same type might mean, for example the random inputs are images when neural network 110 performs an image processing function, or text when neural network 110 is designed to respond to text based queries) (emphasis added).
generating a model parameter constituting a machine learning model for detecting a target task or a watermark based on the first training data and the second training data (p. 1, [0014], see prior to the operation of neural network 110, neural network 110 may be trained to perform the predefined task. The training may involve supervised learning, unsupervised learning, reinforcement learning and so forth. In one example, training neural network 110 may involve identifying an optimization function that judges neural network 110 as it makes decisions based on a training data set, allowing neural network 110 to modify the functions and weights of its various nodes (e.g., input nodes 112, processing nodes 114, output nodes 116) to provide outputs that achieve a higher score in the optimization function. In another example, neural network 110 may modify the weights and functions by correlating attributes of inputs from a training data set based on whether it correctly identifies or responds to an input) and (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110. The sentinel data set may comprise random inputs of the same type as the training set) (emphasis added).
the generating the model parameter includes: generating a plurality of subsets constituted by learning data that includes the first training data and the second training data (p. 2, [0019], see thus, in addition to training neural network 110 with a training data set that configures neural network 110 to perform its predefined task, neural network 110 may be trained with a sentinel data set that embeds the watermark into neural network 110) (emphasis added).
Lain does not explicitly teach:
performing machine learning of the subsets in order, using one subset out of the plurality of subsets as learning data.
However, Rodriquez teaches such use: (p. 3, [0051], see to this image dataset, ten anomalous images are added, which are shown in FIG. 3. Each of these images is associated with a digit in some secretly meaningful way, and the MLP will be trained to output the associated digits for these images. In this example, we choose the sequence [0, 9, 1, 8, 2, 7, 3, 6, 4, 5], see as the images go from dark to light). 
the plurality of subsets are generated such that a ratio of second training data included in each of the plurality of subsets is higher than a ratio of second training data in the entire learning data.
However, Rodriquez teaches such use: (p. 5, [0065], see in operation, the watermark detector may recognize an object as a soda can and immediately restrict the search space to related product IDs. In this method, the digital watermark detector is one element of the classifier, and it narrows the identification to a smaller subset of possible classifications by the NN) (emphasis added). It is noted that the watermark/product ID remains the same size, so if the search space is reduced, the ratio of the watermark/product ID will be higher in the restricted search space than in the total search space.
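For illustration only, the ratio relationship noted above can be shown with a short arithmetic sketch. The sketch is not taken from Lain or Rodriquez. The function names `make_subsets` and `watermark_ratio`, the sample counts (90 task samples, 10 watermark samples, 3 subsets), and the choice to place every watermark sample in every subset are assumptions made solely to show how a fixed-size set occupies a higher ratio of a smaller subset than of the whole.

```python
from fractions import Fraction


def make_subsets(task_data, watermark_data, num_subsets):
    """Split the task data into chunks and add all watermark data to each chunk.

    Hypothetical helper: repeating the full watermark set in every subset is
    an assumption for illustration, not a method taken from either reference.
    """
    chunk = len(task_data) // num_subsets
    subsets = []
    for i in range(num_subsets):
        start = i * chunk
        # The last subset also takes any leftover task samples.
        end = len(task_data) if i == num_subsets - 1 else start + chunk
        subsets.append(task_data[start:end] + watermark_data)
    return subsets


def watermark_ratio(samples):
    """Return the exact fraction of samples tagged as watermark data."""
    count = sum(1 for kind, _ in samples if kind == "wm")
    return Fraction(count, len(samples))


# Assumed sizes: 90 task samples and 10 watermark samples.
task_data = [("task", i) for i in range(90)]
watermark_data = [("wm", i) for i in range(10)]

entire_learning_data = task_data + watermark_data
subsets = make_subsets(task_data, watermark_data, num_subsets=3)

overall_ratio = watermark_ratio(entire_learning_data)   # 10 of 100 samples
subset_ratios = [watermark_ratio(s) for s in subsets]   # 10 of 40 samples each
```

Under these assumed sizes, the watermark data makes up 1/10 of the entire learning data but 1/4 of each subset, because the watermark set keeps its size while the surrounding set shrinks.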
the performing machine learning includes performing machine learning, using a model parameter generated in prior-stage machine learning, as an initial value of a model parameter in later-stage machine learning.
However, Rodriquez teaches such use: (p. 11, [0167], see such images and metadata from the database are also provided for use as labeled training data for a supervised learning system, to enable training of the system to classify images depicting the products) and (p. 11, [0159], see one arrangement provides plural images, as training images, to a learning system. An identifier is obtained by processing one or more of the images. For example, a machine-readable identifier (e.g., a barcode or watermark) is identified. Or a pattern of feature points (a fingerprint) is extracted. A database can be queried with such information to obtain additional metadata, such as the brand name of a retail product, its manufacturer, its weight, nutritional information, pricing, etc. Some or all such information is provided to the learning system, as label data. Such submission of imagery, and associated label data, serves to train the learning system, so that it can associate future images with that associated information) (emphasis added).
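For illustration only, the limitation of using a model parameter generated in prior-stage machine learning as the initial value in later-stage machine learning can be sketched as staged training over subsets taken in order. The sketch is not taken from Lain or Rodriquez. The one-parameter linear model, the gradient-descent settings, the sample values, and the names `train_stage` and `train_in_order` are all assumptions chosen to keep the example small.

```python
def train_stage(w_init, subset, lr=0.01, epochs=50):
    """Fit y = w * x (approximately) by gradient descent, starting from w_init."""
    w = w_init
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in subset) / len(subset)
        w -= lr * grad
    return w


def train_in_order(subsets, w0=0.0):
    """Train on each subset in order, carrying the parameter between stages."""
    w = w0
    history = []
    for subset in subsets:
        # The parameter produced by the prior stage is the initial value here.
        w = train_stage(w, subset)
        history.append(w)
    return w, history


# Assumed toy data: every (x, y) pair satisfies y = 2 * x.
subsets = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]

final_w, history = train_in_order(subsets)
```

In this sketch the second stage does not restart from `w0`; it begins from the value the first stage produced, so the parameter in `history` moves closer to 2 with each stage.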
Lain and Rodriquez are analogous art because they are from the same field of endeavor, machine learning.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Lain and Rodriquez before him or her, to modify the system of Lain to include the learning system method of Rodriquez. The modification would enhance the system of Lain, which is focused on neural network verification, by providing Lain with the ability to utilize a training data set, as suggested by Rodriquez (p. 6, [0092]; p. 14, [0210]).

Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Patent Application Publications

Rodriquez et al., US 20150055855

Achin et al., US Patent No. 9652714
9.	Examiner, in light of the above submission, maintains the previous rejections, and any new ground(s) of rejection are necessitated by Applicant’s amendment. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
10.	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Correspondence Information
11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Evral Bodden whose telephone number is 571-272-3455.  The examiner can normally be reached on Monday to Friday, 8:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached on 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EVRAL E BODDEN/
Primary Examiner, Art Unit 2193