Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to the application submitted on 9/18/2018.
Claims 1-12 are presented for examination.

Priority
Applicant’s claim for the benefit of a prior-filed Japanese application JP2017-193933 filed on 10/04/2017 is acknowledged and admitted.  Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file. 

Information Disclosure Statement
The information disclosure statement submitted on 9/21/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 
Fig. 6 element 131 referenced in specification . 
Fig. 7 element “p3” with respect to element “132c” referenced in specification [0054] but not annotated as “p3”. Instead it is depicted as “p1” as input to “model mdc”.
Fig. 21 element 131a referenced in specification [0028] but not annotated.
Fig. 22 element 131 referenced in specification [0029] but not annotated.


Specification
The Specification filed on 9/18/2018 is acceptable for examination purposes.

Claim Objections
Claims 2 and 9 are objected to because of the following informalities:  
Claim 2 line 22 recites, in part, "comprising;". Examiner suggests changing to "comprising:" by replacing the semicolon with a colon.
Claim 9 line 2 recites, in part, “including;”. Examiner suggests changing to “including:” by replacing the semicolon with a colon.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding claims 1-12,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. Claims 1-8 are directed to a non-transitory computer-readable storage medium (i.e. article of manufacture, product), claims 9-10 are directed to an apparatus (i.e. machine, system), and claims 11-12 are directed to a method (i.e. process).
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. Claim 1 recites 
“determining whether or not there is a discontinuity point at which a variation in a learning time relative to a variation in a learning parameter is discontinuous”; (This step is considered a mental process – observation, evaluation, judgement, opinion)
“specifying, when the discontinuity point is present, ranges of the learning parameter in which the variation in the learning time relative to the variation in the learning parameter is continuous, based on the discontinuity point”; (This step is considered a mental process – observation, evaluation, judgement, opinion)
“calculating, for each of the specified ranges, an estimated value of performance of trials using a trial parameter learned by machine learning per a learning time of machine learning using a learning parameter included in the range”; (This step is considered a mathematical concept – mathematical calculation)
“specifying a learning parameter which enables any of the estimated values selected in accordance with a magnitude of the estimated value among the calculated estimated values”; (This step is considered a mental process – observation, evaluation, judgement, opinion)
“and executing machine learning using the specified learning parameter.” (This step is considered a mathematical concept – mathematical calculation)
These steps describe a mental process, as they could be performed in the human mind. Under the broadest reasonable interpretation, calculating an estimated value of performance of trials may be performed mentally, and executing machine learning using a specified parameter may be performed mentally with pen and paper. This is an abstract idea recited in the claim.
Independent claims 9 and 11 recite limitations similar to those found in independent claim 1, differing only in embodiment, and a similar analysis applies. Claim 9 includes the additional limitations of an “apparatus” and a “processor”. Claim 11 includes the additional limitation of a “non-transitory computer-readable medium” and a “processor”. However, these limitations appear to perform a mental process in a computing environment, using a computer as a tool to perform the mental process. These steps describe a mental process, as they could be performed in the human mind.
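For orientation only, the five steps recited in claim 1 can be sketched as follows. This is a minimal illustration and is not taken from the application or the cited art: the slope-based discontinuity criterion, the `jump` threshold, the function names, and the `performance` lookup are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical observation: (learning parameter, learning time in seconds).
Observation = Tuple[float, float]


def find_discontinuities(obs: List[Observation], jump: float) -> List[int]:
    """Return indices i where the learning-time change per unit of parameter
    change between obs[i-1] and obs[i] exceeds `jump` (assumed criterion)."""
    cuts = []
    for i in range(1, len(obs)):
        dp = obs[i][0] - obs[i - 1][0]
        dt = obs[i][1] - obs[i - 1][1]
        if dp > 0 and abs(dt / dp) > jump:
            cuts.append(i)
    return cuts


def split_ranges(obs: List[Observation], cuts: List[int]) -> List[List[Observation]]:
    """Divide the observations into ranges that contain no discontinuity."""
    bounds = [0] + cuts + [len(obs)]
    return [obs[a:b] for a, b in zip(bounds, bounds[1:])]


def best_parameter(obs: List[Observation],
                   performance: Dict[float, float],
                   jump: float) -> float:
    """Pick the parameter with the largest estimated performance per unit of
    learning time, evaluated range by range (hypothetical selection rule)."""
    candidates = []
    for rng in split_ranges(obs, find_discontinuities(obs, jump)):
        param, time = max(rng, key=lambda o: performance[o[0]] / o[1])
        candidates.append((performance[param] / time, param))
    return max(candidates)[1]


# Hypothetical data: learning time jumps between parameters 3.0 and 4.0.
obs = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (4.0, 30.0), (5.0, 31.0)]
performance = {1.0: 0.5, 2.0: 0.6, 3.0: 0.72, 4.0: 0.9, 5.0: 0.95}
chosen = best_parameter(obs, performance, jump=5.0)
```

The final claimed step, executing machine learning using the specified learning parameter, would then be a training run with `chosen`; it is omitted here because the claim does not tie it to any particular model.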
Therefore, the claims recite an abstract idea.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the Examiner considers the steps recited in the claim as a mental process, even though the claim requires a computer. The additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The determining step in claim 1 is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity. See MPEP § 2106.05(g).
The specifying ranges step in claim 1 is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity. See MPEP § 2106.05(g).
The calculating step in claim 1 is recited at a high level of generality and amounts to a mathematical calculation. See MPEP § 2106.04(a)(2).
The specifying a learning parameter step in claim 1 is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity. See MPEP § 2106.05(g).
The executing step in claim 1 is recited at a high level of generality and amounts to a mathematical calculation. See MPEP § 2106.04(a)(2).
Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Simply implementing the abstract idea on a generic computer is not a practical application of the abstract idea. The judicial exception is not integrated into a practical application. Accordingly, the claim as a whole does not integrate the abstract idea into a practical application because the additional elements do not impose any meaningful limits on practicing the abstract idea, and the claim recites an abstract idea.
Independent claims 9 and 11 include computer components as discussed above in Step 2A, Prong One. However, each of the additional limitations appears to perform a mental process in a computing environment, using a computer as a tool to perform the mental process. These steps describe a mental process, as they could be performed in the human mind. Such elements do not integrate the abstract idea into a practical application.
After considering all claim elements, both individually and in ordered combination, it has been determined that the claims do not integrate the abstract idea into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, in the Step 2A, Prong Two analysis, the additional elements of “non-transitory computer-readable medium”, “processor”, and “apparatus” are construed as generic or conventional computer components used to perform the mental process and amount to mere instructions to apply an exception. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See MPEP § 2106.05(f).
The same conclusion is reached for the dependent claims of claims 1, 9 and 11; please see below for details.
Claims 2, 10 and 12 are dependent on independent claims 1, 9 and 11, respectively. The dependent claims recite “receiving a plurality of pieces of performance information respectively including a learning parameter of machine learning and a learning time of the machine learning using the learning parameter.” (This step is considered collecting information.) 
The claims also recite “wherein the determining includes determining whether or not the discontinuity point is present by referring to the parameter and the learning time included in each of the plurality of received pieces of performance information.” (This step is considered analyzing information.) 
Claim 3 is dependent on independent claim 1. The dependent claim recites “wherein in the determining, a determination that the discontinuity point is present is made when machine learning using the learning parameter included in the plurality of pieces of performance information includes machine learning capable of using a result of previous machine learning using the learning parameter.” (This step is considered analyzing information.) 
Claim 4 is dependent on independent claim 1. The dependent claim recites “wherein the result is learning data generated from a trial parameter learned by previous machine learning using the learning parameter.” (This step is considered analyzing information and well-understood, routine, conventional activity.) 
Claim 5 is dependent on independent claim 1. The dependent claim recites “wherein the specifying ranges of the learning parameter includes dividing the ranges of the learning parameter at each of the discontinuity points.” (This step is considered analyzing information.)
Claim 6 is dependent on independent claim 1. The dependent claim recites “wherein the specifying a learning parameter includes specifying a learning parameter which causes an estimated value satisfying prescribed conditions among the estimated values to be obtained.” (This step is considered collecting and analyzing information.)
Claim 7 is dependent on independent claim 1. The dependent claim recites “wherein the specifying a learning parameter includes specifying a learning parameter which causes a largest estimated value among the estimated values to be obtained.” (This step is considered collecting and analyzing information.)
Claim 8 is dependent on independent claim 1. The dependent claim recites “wherein the trial is a search process that uses a search parameter, and the performance is an evaluation of a search result of the search process.” (This step is considered collecting and analyzing information.)
These dependent claims do not add any additional limitations that would integrate the abstract idea into a practical application or add significantly more than the abstract idea. 
Therefore, claims 1-12 are not patent eligible.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo et al. (US 20160132787 A1, hereinafter Drevo) in view of Baughman et al. (US 20170116372 A1, hereinafter Baughman).


Regarding claim 1,
Drevo discloses a non-transitory computer-readable storage medium storing therein a learning program that causes a computer to execute a process comprising (Drevo fig. 8 elements 806 & 812 and [0162] recites “The non-volatile memory 806 stores computer instructions 812, an operating system 814, and data 816. In one example, the computer instructions 812 are executed by the processor 802 out of volatile memory 804. In one embodiment, an article 580 comprises non-transitory computer-readable instructions.” Fig. 8 computer instructions 812 (i.e. learning program), non-volatile memory 806 (i.e. computer- readable storage medium)): 
executing machine learning using the specified learning parameter (Drevo [0068] recites, in part, “In general operation, a user uploads data … specifying various processing instructions, termination criteria, and other parameters for the data run… In turn, the worker nodes 100 coordinate … to … train a suitable model for the dataset…” Examiner interprets training a model with uploaded data and specified parameters as executing machine learning using specified learning parameter.).
However, Drevo does not explicitly teach determining whether or not there is a discontinuity point at which a variation in a learning time relative to a variation in a learning parameter is discontinuous; specifying, when the discontinuity point is present, ranges of the learning parameter in which the variation in the learning time relative to the variation in the learning parameter is continuous, based on the discontinuity point; calculating, for each of the specified ranges, an estimated value of performance of trials using a trial parameter learned by machine learning per a learning time of machine learning using a learning parameter included in the range; specifying a learning parameter which enables any of the estimated values selected in accordance with a magnitude of the estimated value among the calculated estimated values.
Although Drevo discloses determining a discontinuity point (Drevo [0080]-[0081] recites, in part, “… a number of challenges to finding the best model arise… the following challenges can be expected. [0081] Discontinuity and non-differentiability: Categorical parameters make the search space non differentiable...” Examiner interprets encountering and expecting discontinuity as determining discontinuity under the broadest reasonable interpretation.), Drevo does not explicitly disclose determining whether or not there is a discontinuity point at which a variation in a learning time relative to a variation in a learning parameter is discontinuous. Baughman teaches determining whether or not there is a discontinuity point at which a variation in a learning time relative to a variation in a learning parameter is discontinuous (Baughman fig. 2 element 201 & [0116]-[0117] and [0120] recites, in part, “In step 201,… processors … using means known to those skilled in the art of statistical analysis, divide an input data set into contiguous segments bounded by a set of knots… [0117] As is known in the art, a knot may be a critical point, inflection point, or discontinuity in a data set. [0120] …this document will refer to the data subset being regressed by the current iteration as comprising data points collected over a range of time…” Examiner interprets knots including discontinuity and inflection point (i.e. discontinuity point) in a data set with data points (i.e. learning parameter) collected over a range of time (i.e. learning time). Discontinuity in a dataset with data points collected over a range of time is interpreted as discontinuous with a variation in learning time to a variation in a learning parameter; and bounding contiguous segments by a set of knots as determining there are discontinuity points.  
Additionally, examiner interprets that if there is not a discontinuity point, then the subsequent method steps would not be performed as it is needed for follow-up operations as recited in the claim (e.g. ranges of the learning parameter, estimated values of performance, learning parameter which enables any of the estimated values among those calculated).); 
specifying, when the discontinuity point is present, ranges of the learning parameter in which the variation in the learning time relative to the variation in the learning parameter is continuous, based on the discontinuity point (Baughman fig. 2 element 205 & [0116] and [0121] recites, in part, “In step 201, … processors … divide an input data set into contiguous segments bounded by a set of knots. This allows the linear regression analysis to be performed on smaller, contiguous subsets of data that may be at least partially free of discontinuities that hamper efforts to efficiently and accurate fit a data set to a linear function… [0121] In step 205,… processors select initial ranges of values of an array of beta coefficients of the function Y to be regressed. These initial beta ranges may be estimated by mathematical methods known to those skilled in the art, such as transfer learning.” Examiner interprets selecting initial ranges of values of an array of beta coefficients from the divided data set of contiguous segments bounded by discontinuities, referred to as knots, (please see fig. 2 elements 201-205) as specifying continuous ranges of the learning parameter based on discontinuity points); 
calculating, for each of the specified ranges, an estimated value of performance of trials using a trial parameter learned by machine learning per a learning time of machine learning using a learning parameter included in the range (Baughman fig. 2 element 207 & [0120], [0123], [0128] and [0147]-[0149] recites, in part, “… each iteration of the procedure of steps 203-221… will refer to the data subset being regressed by the current iteration as comprising data points collected over a range of time… [0123] In step 207,… processors … select a set of initial candidate values for each beta coefficient in the function being regressed. … [0128] Using a machine-learning approach to identifying optimal beta values begins by seeding a model of function Y with initial values… [0147] processor may compute each equation's error rate by … residual sum-of-squares equation… [0148] each performance of this equation yields a numeric error rate for one of the eight equations. [0149] … processors then performs a series of computations that translates each of the computed error rates into a percent value, such that a sum of all the error rates yields a value of 100%.” Examiner interprets the following: iterations for each subset (i.e. trials for each of the specified ranges) associated with data points (i.e. learning parameter included in ranges) collected over a range of time (i.e. learning time), using a machine-learning approach to identify optimal beta values and computing error rates (i.e. calculating by machine learning estimated values of performance) from initial beta values (i.e. trial parameters).); 
specifying a learning parameter which enables any of the estimated values selected in accordance with a magnitude of the estimated value among the calculated estimated values (Baughman [0144], [0150] and [0152] recites, in part, “… iterative procedure of steps 209-221… Each iteration of this procedure performs steps related to encoding the system of linear equations derived in steps 201-207 … using that data to further optimize beta-coefficient values… [0150] the residual sum-of-squares equation may have produced eight error rates e[1] . . . e[8], where e[1] has the greatest amplitude of the eight error rates. The processors divide each of e[1] . . . e[8] by the magnitude of e[1] to ensure that all eight error rates fall into a range between of 0 and 1… [0152] The resulting normalized, relative error rates determine relative proportions in which each the data comprised by each equation (1)-(8) will be encoded … that will serve as input for a DNA-computing operation that will perform the linear-regression.” Examiner interprets steps 209-221 using previous steps 201-207 to further optimize the beta coefficient values (i.e. enable the estimated values) by using error rates adjusted by a magnitude of the greatest amplitude error rate (i.e. in accordance with a magnitude of the estimated value among calculated estimated values), and normalized, relative error rate (i.e. a learning parameter)).
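The error-rate handling quoted from Baughman [0147]-[0150] (a residual sum of squares per equation, division by the magnitude of the greatest error rate, and translation into percent values summing to 100%) can be illustrated with a short sketch. This is not Baughman's implementation; the function names and sample values below are hypothetical and serve only to show the arithmetic.

```python
from typing import List


def residual_sum_of_squares(observed: List[float], predicted: List[float]) -> float:
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def normalize_by_largest(errors: List[float]) -> List[float]:
    """Divide every error rate by the magnitude of the largest one, so that
    non-negative error rates fall in the range 0 to 1 inclusive."""
    largest = max(abs(e) for e in errors)
    if largest == 0:
        return [0.0 for _ in errors]
    return [e / largest for e in errors]


def as_percentages(errors: List[float]) -> List[float]:
    """Rescale error rates so that they sum to 100 (percent values)."""
    total = sum(errors)
    if total == 0:
        return [0.0 for _ in errors]
    return [100.0 * e / total for e in errors]


# Hypothetical error rates for three candidate equations.
errors = [4.0, 2.0, 1.0]
relative = normalize_by_largest(errors)
percent = as_percentages(errors)
```

With the sample values, the largest error rate maps to 1.0 and the others to proportionally smaller values, which corresponds to the "normalized, relative error rates" described in the quoted passage.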
Baughman and Drevo are both directed to machine learning. In view of the teachings of Baughman, it would have been obvious to one of ordinary skill in the art to apply the teachings of Baughman to Drevo before the effective filing date of the claimed invention in order to accommodate the high degree of computational complexity, parallelism, or recursion required by many machine learning applications by using a cost-effective, energy-efficient DNA-computing platform thereby improving Drevo’s machine learning service to handle more complex problems (cf. Baughman [0011] recites “When implemented on a conventional computing platform, the high degree of parallelism, recursion, or computational complexity required by many machine-learning applications can impose a great burden on a conventional electronic, scalar, computer system. However, implementing machine-learning algorithms and models on a massively parallel, energy-efficient DNA-computing platform would extend the benefits of machine-learning applications into areas where it would not otherwise be cost-effective or computationally practical.”).


Regarding claim 2,
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 1, the process further comprising; 
receiving a plurality of pieces of performance information respectively including a learning parameter of machine learning and a learning time of the machine learning using the learning parameter (Drevo fig. 1 elements 106a-d and fig. 2 & [0063] and [0068] recites, in part, “…records many aspects of the model search process within the data hub 106, including model training times, measures of predictive power, average performance for evaluation, training time, number of features, baselines, and comparative performance among models and modeling techniques… [0068] the worker nodes 100 coordinate using the hyperpartitions and performance tables 106c, 106d to recommend, optimize, and/or train a suitable model for the dataset…” Examiner in view of fig. 1 interprets the performance data, 106d, with data such as model training times (i.e. a learning time of the machine learning using the learning parameter) and average performance of evaluation as a plurality of pieces of performance information. Examiner in view of fig. 1 and 2 interprets hyperpartitions, 106c, with parameters depicted in fig. 2 element 206 as a learning parameter. Nodes coordinating using 106c-d to train models (i.e. receiving performance information)), 
wherein the determining includes determining whether or not the discontinuity point is present by referring to the parameter and the learning time included in each of the plurality of received pieces of performance information (Drevo [0080] recites, in part, “… a number of challenges to finding the best model arise… the following challenges can be expected. [0081] Discontinuity and non-differentiability: Categorical parameters make the search space non differentiable... [0082] Varying dimensions of the search space: Hyperparameters, by definition, imply that the hyperpartitions within a methodology have different dimensions. Because choosing one categorical variable over another can imply a different set of hyperparameters, the dimensionality of a hyperpartition also varies. [0083] Non-transferability of methodology performance...” Examiner in view of fig. 2 interprets the discontinuity encountered due to categorical parameters, hyperpartitions, and non-transferability of methodology as referring to the parameter and the learning time included in the performance information. Fig. 2 element 208, Performance, includes hyperpartition information (208b, 208g) and element 206, hyperpartition, includes continuous_params (206f), discrete_params (206g), categorical_params (206h) and linking to data runs (204) which captures learning time (204p-q)).
Please see motivation for claim 1 above.

Regarding claim 3,
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 2, wherein in the determining, a determination that the discontinuity point is present is made when machine learning using the learning parameter included in the plurality of pieces of performance information includes machine learning capable of using a result of previous machine learning using the learning parameter (Drevo [0086] recites “… the system 100 represents conditional parameter spaces as a tree-based data structure referred to herein as a Conditional Parameter Tree (CPT)... that compactly expresses every parameter, hyperparameter and design choice, in general, for a modeling methodology. This representation allow system 100 to both generate parameterizations and learn from previously attempted parameterizations by correlating their performance to suggest new parameterizations and find the best predictive model. [0097] … CPTs solves challenges of searching spaces of multiple modeling methodologies, including discontinuity and non-differentiability, varying dimensions of the search space, and non-transferability of methodology performance.” Examiner interprets the above passage to mean the following: solving challenges including discontinuity and non-differentiability (i.e. a determination that the discontinuity point is present); and the conditional parameter tree (CPT) generating parameterizations and learning from previously attempted parameterizations by correlating performances (i.e. machine learning using the learning parameter includes machine learning using previous result of learning parameter)).
Please see motivation for claim 1 above.

Regarding claim 4,
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 3, wherein the result is learning data generated from a trial parameter learned by previous machine learning using the learning parameter (Drevo [0094]-[0095] recites, in part, “From the CPT 320, nine hyperpartitions can be derived by selecting (or “freezing”) values for the categorical parameters 330 and 339. An example hyperpartition for DBN is (Hidden Layers-1, Activation Function=linear, Epochs, Learn Rate, Pretrain Learn Rate, Learn Rate Decay, Layer 1 Size). Within this hyperpartition, the system 100 can optimize for the parameters “Epochs” (node 332), “Learn Rate” (node 326), “Pretrain Learn Rate” (node 328), “Learn Rate Decay” (node 324), and “Layer 1 Size” (node 334). [0095] … The CPT 340 includes four continuous parameters: intercept 344, Gamma 306, Eta 348, and Alpha 350; and three categorical parameters: Learning rate 352, Loss 354, and Penalty 356. Twenty-four hyperpartitions can be formed from the CPT 340.” Optimized parameters like epochs and learn rate in the above example (i.e. learning data) from the hyperpartitions by selecting categorical parameters like loss and penalty (i.e. trial parameter)).
Please see motivation for claim 1 above.

Regarding claim 5, 
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 1, wherein the specifying ranges of the learning parameter includes dividing the ranges of the learning parameter at each of the discontinuity points (Baughman [0116]-[0117] recites “In step 201,… processors … using means known to those skilled in the art of statistical analysis, divide an input data set into contiguous segments bounded by a set of knots… [0117] As is known in the art, a knot may be a critical point, inflection point, or discontinuity in a data set.” Examiner interprets dividing the data set into contiguous segments bounded by a set of knots (i.e. discontinuity) as dividing the ranges of the learning parameter at each discontinuity point.).
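The division of a data set into contiguous segments bounded by knots, as quoted from Baughman [0116]-[0117], can be sketched as below. This is a minimal illustration under stated assumptions: the knot positions are supplied as inputs rather than detected, the values are already sorted, and the function name `split_at_knots` is hypothetical.

```python
from bisect import bisect_left
from typing import List


def split_at_knots(values: List[float], knots: List[float]) -> List[List[float]]:
    """Split sorted `values` into contiguous segments; each knot marks the
    start of a new segment (assumed convention)."""
    segments = []
    start = 0
    for knot in sorted(knots):
        # First index at or after `start` whose value is >= the knot.
        end = bisect_left(values, knot, start)
        if end > start:
            segments.append(values[start:end])
        start = end
    if start < len(values):
        segments.append(values[start:])
    return segments


# Hypothetical parameter values with knots at 3 and 5.
segments = split_at_knots([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [3.0, 5.0])
```

Each resulting segment is free of the supplied knots in its interior, which is the sense in which the examiner maps the segments to ranges of the learning parameter divided at each discontinuity point.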
Please see motivation for claim 1 above.

Regarding claim 6, 
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 1, wherein the specifying a learning parameter includes specifying a learning parameter which causes an estimated value satisfying prescribed conditions among the estimated values to be obtained (Baughman [0205] recites, in part, “… processors repeat … steps 209-221… until a threshold condition is detected at the conclusion of an iteration that indicates a likelihood that the most recently adjusted values of the beta coefficients have sufficiently converged to an optimal value. This threshold condition might … be satisfied when differences between… updated values fall below a threshold level… or… above a threshold value.” Examiner interprets the threshold condition as prescribed conditions and a likelihood of convergence as a learning parameter which causes an estimated value (i.e. adjusted values) satisfying prescribed conditions.).
Please see motivation for claim 1 above.

Regarding claim 7,
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 6, wherein the specifying a learning parameter includes specifying a learning parameter which causes a largest estimated value among the estimated values to be obtained (Baughman [0149]-[0150] recites “The one or more processors then performs a series of computations that translates each of the computed error rates into a percent value, such that a sum of all the error rates yields a value of 100%. [0150] In one example, the residual sum-of-squares equation may have produced eight error rates e[1] . . . e[8], where e[1] has the greatest amplitude of the eight error rates. The processors divide each of e[1] . . . e[8] by the magnitude of e[1] to ensure that all eight error rates fall into a range between of 0 and 1, inclusively.” Examiner interprets the performing of steps 209-221 using the residual sum-of-squares to obtain an error rate (i.e. a learning parameter) with the greatest amplitude (i.e. largest estimated value) among the eight error rates (i.e. among the estimated values) until a threshold condition is met for convergence of optimal values).
Please see motivation for claim 1 above.

Regarding claim 8,
The Drevo/Baughman Combination teaches the non-transitory computer-readable storage medium according to claim 1, wherein the trial is a search process that uses a search parameter, and the performance is an evaluation of a search result of the search process (Drevo [0063] and [0080] recite, in part, “… the system 100 records many aspects of the model search process… including model training times, measures of predictive power, average performance for evaluation, training time, number of features, baselines, and comparative performance among models and modeling techniques. [0080] To programmatically search for the “best” model for a dataset, the system 100 must be able to enumerate parameters, generate acceptable inputs are for each parameter, and designate continuous, integer-valued, or categorical parameters.” Examiner interprets an iteration of the model search process as a trial search process, hyperpartitions that include parameters (please see fig. 2 – 206, 206f-h) as a search parameter, and the performance evaluation and comparative performance among models and modeling techniques as an evaluation of search results.).
Please see motivation for claim 1 above.

Regarding claims 9-10,
Claims 9-10 are directed to a learning apparatus comprising a processor executing a process having steps substantially identical to those recited in claims 1-2. Therefore, the rejections to claims 1-2 apply equally here.
In addition, Drevo discloses the additional limitation of an apparatus and processor (Drevo [0165] recites “Processing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special purpose logic circuitry (e.g., an FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit)).” Examiner interprets the system implemented as an FPGA and/or ASIC as the apparatus).

Regarding claims 11-12,
Claims 11-12 are directed to a learning method having steps substantially identical to those recited in claims 1-2. Therefore, the rejections to claims 1-2 apply equally here.
In addition, Drevo discloses the additional limitation of a processor (Drevo [0165] recites “Processing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system.”).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Baker (US-20200143240-A1) teaches systems and methods to improve robustness of a neural network related to changes to input data and updating of hyperparameters.
Baker (US-20200285939-A1) teaches machine learning systems and the trade-off between learning from training data and overfitting on the training data, which affects performance.
Honkala (US-20160314392-A1) teaches bidirectional recurrent neural networks and machine learning related to music. It specifies creating gaps in the data by removing portions and inserting token data for training.
Achin (US-20180060738-A1) teaches determining predictive values of features and feature engineering.
Weston (US-20110078099-A1) teaches support vector machines and identifying significant features for evaluation.
Achin (US-20150339572-A1) teaches techniques for predictive data analytics to include pre-processing steps like imputing missing values, feature engineering and feature selection; model-fitting steps like algorithm selection, parameter estimation, hyper-parameter tuning, scoring and diagnostics; and post-processing steps like calibration of predictions.
Hoffmeister (U.S. Patent No. 10332508) teaches automatic speech recognition and recurrent neural networks. 
Gorodetsky et al. ("Efficient Localization Of Discontinuities In Complex Computational Simulations", 2014 AUG 4) teaches discontinuity detection, function approximation and optimization as it relates to support vector machines.
Malik et al. ("Detecting Discontinuities in Large-Scale Systems", 2014) teaches an automated approach to detect discontinuities in data centers to help cloud providers and data center analysts.
Yuang et al. ("Intelligent Video Smoother for Multimedia Communications", 1996) teaches back propagation neural network and discontinuity in video data.
Jakeman et al. ("Characterization of Discontinuities in High-Dimensional Stochastic Problems on Adaptive Sparse Grids", 2011 FEB 22) teaches algorithms for the detection and identification of discontinuities in high dimensional space.
Roggero ("Discontinuity Detection and Removal from Data Time Series", 2014 May 29) teaches discontinuities in time series data analysis as it relates to earthquake data.
	

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEON W CHEUNG whose telephone number is (571)272-9930.  The examiner can normally be reached on 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571) 270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/LWC/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124