DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Objections 
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, “a model quality criterion” must be shown or the feature must be canceled from claims 1-20. No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.




Claim Rejections - 35 U.S.C. § 112
The following is a quotation of 35 U.S.C. 112(b): 
(B) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. 
The following is a quotation of pre-AIA  35 U.S.C. 112, second paragraph: 
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 4, 19, and 20 recite "the penalty value". There is insufficient antecedent basis for this limitation in the claims. Therefore, claims 1, 4, 19, and 20 and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. An amendment reciting "the increasing penalty value" is suggested.
Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claim 1 recites multiple operations: determining, selecting, generating, sorting, determining, and selecting. However, it is not clear from the claim who or what device performs these operations. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA  35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).

Claims 1-8 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US Patent 8,843,427 B1) (“Liu”), in view of Datta et al. (US Patent 10,909,691 B2) (“Datta”), and further in view of Mun et al. (US Patent 9,881,339 B2) (“Mun”).
Regarding claim 1, Liu meets the claim limitations as follows.
A method (i.e. a computer-implemented method) [Liu: col. 1, line 31] for automated feature selection ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21]) for linear model generation ((i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]), comprising:
determining (i.e. determine) [Liu: col. 5, line 27], for a set of data features (i.e. data comprises examples that each comprise one or more data values (or "features")) [Liu: col. 2, line 62-63] related to a plurality of data records ((i.e. a record of the stored association) [Liu: col. 15, line 50]; (i.e. Training data such as that represented by TABLE 1) [Liu: col. 3, line 10, Table 1] - Note: a table is a form of record data), a set of relevance measurements ((i.e. data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 60-62]; (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value)) [Liu: col. 2, line 62-64]), wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value) for that example) [Liu: col. 2, line 62-64; Please see examples in Tables 1, 2];
selecting (i.e. selecting) [Liu: col. 4, line 15] a subset of the set of data features (i.e. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data) [Liu: col. 8, line 23-25] based at least in part on the set of relevance measurements (i.e. An optimal filter combination can be selected for use with new training data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 59-62];
generating (i.e. generate) [Liu: col. 3, line 53] a matrix based at least in part on the selected subset of the set of data features ((i.e. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 6, line 29-31] - Note: A vector is a 1xN matrix), wherein generating the matrix comprises iteratively ((i.e. Some examples of training functions that can be used to train a static predictive model include (without limitation): regression (e.g., linear regression, logistic regression), classification and regression tree, multivariate adaptive regression spline and other machine learning training functions (e.g., Naive Bayes, k-nearest neighbors, Support Vector Machines, Perceptron). Some examples of training functions that can be used to train an updateable predictive model include (without limitation) Online Bayes, Winnow, Support Vector Machine (SVM) Analogue, Maximum Entropy (MaxEnt), Gradient based (FOBOS) and AdaBoost with Mixed Norm Regularization. The training function repository 216 can include one or more of these example training functions. In some scenarios, a recency weighted predictive model can be trained. In general, a recency weighted predictive model is a predictive model that is trained giving increased significance to more recent training data data as compared to earlier received training data. A recency weighted predictive model can be used to improve predictive output in response to a change in input data) [Liu: col. 7, line 34-53] - Note: Liu discloses several regression methods, which process data using matrix operations. It is also well known in the art that regression methods iteratively process, or scan, data; (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25]) scanning the plurality of data records (i.e. Referring to FIG. 4, training data (i.e., initial training data) is received from the client computing system (402). For example, the client computing system 202 can upload the training data to the predictive modeling server system 206 over the network 204 either incrementally or in bulk (e.g., as one or more batches). As describe above, if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 18-31], and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing ((i.e. the training function passage quoted above for "iteratively") [Liu: col. 7, line 34-53] - Note: as stated above, Liu discloses several regression methods, which process data using matrix operations and are well known in the art to iteratively process, or scan, data; (i.e. the feature induction passage quoted above) [Liu: col. 8, line 21-25]) penalty value (i.e. For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models. Therefore, in the present example, where the type of predictive model is a linear regression model, changes to an L1 penalty generate different sets of parameters) [Liu: col. 8, line 15-20];
sorting (i.e. ranked) [Liu: col. 9, line 34] the selected subset of the set of data features according to an order (i.e. ranked based on the value of their respective scores) [Liu: col. 9, line 34-35] that the feature coefficients are set to zero as the penalty value increases;
determining (i.e. determine) [Liu: col. 5, line 27] a plurality of nested linear models according to the sorting ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]); and
selecting ((i.e. selecting) [Liu: col. 4, line 15]; (i.e. determine) [Liu: col. 5, line 27]) a linear model (i.e. selection of the predictive model) [Liu: col. 5, line 65] of the plurality of nested linear models (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19] based at least in part on a model quality criterion (i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30] and the plurality of nested linear models ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]).
Liu does not explicitly disclose the following claim limitations (Emphasis added).
A method for automated feature selection for linear model generation, comprising: determining, for a set of data features related to a plurality of data records, a set of relevance measurements, wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features; selecting a subset of the set of data features based at least in part on the set of relevance measurements; generating a matrix based at least in part on the selected subset of the set of data features, wherein generating the matrix comprises iteratively scanning the plurality of data records, and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing penalty value; sorting the selected subset of the set of data features according to an order that the feature coefficients are set to zero as the penalty value increases; determining a plurality of nested linear models according to the sorting; and selecting a linear model of the plurality of nested linear models based at least in part on a model quality criterion and the plurality of nested linear models.
However, in the same field of endeavor, Datta further discloses the claim limitations and the deficient claim limitations as follows:
generating a matrix based at least in part on the selected subset of the set of data features ((i.e. the behavioral representation can be in the form of a matrix) [Datta: col. 7, line 47-48]; (i.e. In other embodiments, principal components analysis may then be applied to these vectors, in order to project the wavelet coefficients into ten dimensions, which the inventors have found still captures >95% of total variance) [Datta: col. 19, line 37-40]; (i.e. For the autoregressive parameters, a prior that included a Lasso-like penalty can be used to encourage uninformative lag indices to have their corresponding regression matrix coefficients tend to zero) [Datta: col. 23, line 46-49]) as the penalty value increases.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 
Liu and Datta do not explicitly disclose the following claim limitations (Emphasis added).
as the penalty value increases.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
as the penalty value increases (i.e. SC imposes a greater penalty for additional coefficients than the AIC) [Mun: col. 50, line 10-11].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24].
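For illustration only, the sorting and nested-model limitations recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Liu, Datta, or Mun: it assumes the matrix is a Gram matrix X^T X with a companion vector X^T y, solves the LASSO problem by plain coordinate descent, and uses hypothetical function names (lasso_from_gram, drop_order, nested_feature_sets).

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the L1 proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_from_gram(gram, xty, lam, sweeps=200):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1.

    Only gram = X^T X and xty = X^T y are needed (assumed matrix form).
    """
    p = len(xty)
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual.
            rho = xty[j] - sum(gram[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(rho, lam) / gram[j][j]
    return beta


def drop_order(gram, xty, penalties):
    """Order features by the first penalty at which their coefficient is zero.

    Features that drop out earliest come first; ties keep index order.
    """
    p = len(xty)
    grid = sorted(penalties)
    first_zero = [len(grid)] * p  # len(grid) means "never zero on this grid"
    for step, lam in enumerate(grid):
        beta = lasso_from_gram(gram, xty, lam)
        for j in range(p):
            if beta[j] == 0.0 and first_zero[j] == len(grid):
                first_zero[j] = step
    return sorted(range(p), key=lambda j: first_zero[j])


def nested_feature_sets(order):
    """Build nested subsets, adding the longest-surviving features first."""
    keep_first = order[::-1]
    return [keep_first[:k] for k in range(1, len(keep_first) + 1)]


# Toy orthogonal design (hypothetical numbers chosen for easy checking).
gram = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
xty = [5.0, 1.0, 3.0]
order = drop_order(gram, xty, [0.5, 2.0, 4.0, 6.0])
models = nested_feature_sets(order)
```

Each nested feature set could then be fit and scored with a model quality criterion; that step is omitted from the sketch.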

Regarding claim 2, Liu and Datta meet the claim limitations as set forth in claim 1. Liu and Datta further meet the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], further comprising: identifying a curve corresponding to the set of relevance measurements ((i.e. determining that the second training data set includes one or more characteristics that are similar to the one or more characteristics associated with the training data set) [Liu: col. 1, line 55-57]; (i.e. The rate at which this auto-correlogram declines in a behavior mouse is a measure of a fundamental timescale of behavior, which may be characterized as a time-constant, tau, of an exponentially-decaying curve) [Datta: col. 20, line 16-18]) sorted in descending order, wherein selecting the subset of the set of data features (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21] is further based at least in part on a shape of the curve (i.e. modifying the orientation of the subject in at least a subset of the set of frames so that the feature is oriented in the same direction with respect to the coordinate system to output a set of aligned frames) [Datta: col. 50, line 19-22; Fig. 25].
Liu and Datta do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, further comprising: identifying a curve corresponding to the set of relevance measurements sorted in descending order, wherein selecting the subset of the set of data features is further based at least in part on a shape of the curve.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
(i.e. Ranks the rows of data for the selected variable in descending order) [Mun: col. 52, line 31-32].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 

Regarding claim 3, Liu and Mun meet the claim limitations as set forth in claim 2. Liu further meets the claim limitations as follows.
The method of claim 2 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein selecting the subset of the set of data features ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21]) comprises: fitting one or more boxes to the curve based at least in part on a least squares analysis, wherein features contained within the one or more boxes correspond to the subset of the set of data features.
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 2, wherein selecting the subset of the set of data features comprises: fitting one or more boxes to the curve based at least in part on a least squares analysis, wherein features contained within the one or more boxes correspond to the subset of the set of data features.
However, in the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations, as follows:
fitting one or more boxes to the curve based at least in part on a least squares analysis ((i.e. In some embodiments, both that include model-free algorithms 320 or the model fitting 315 algorithm, the information captured in each pixel often is either highly correlated (neighboring pixels) or uninformative (pixels on the border of the image)) [Datta: col. 36, line 26-30]; (i.e. As illustrated in FIG. 3, the output of the orientation corrected images in some embodiments will be to a principle component analysis time series 310 or other statistical methods for reducing data points. In some embodiments, the data will be run through a model fitting algorithm 315 such as the AR-HMM algorithm or SLDS SVAE algorithm disclosed herein, or may be run through a model free algorithm 320 as disclosed in order to identify behavior modules 300 contained within the video data.) [Datta: col. 50, line 52-61; Fig. 3]; (i.e. In addition, by fitting an interpretable model to data, the data were 'parsed' in a manner that revealed the latent variable structure that the model posits gave rise to the data (including parameters describing the number and identities of the states as well as parameters describing the transitions between the states).) [Datta: col. 21, line 28-33]), wherein features contained within the one or more boxes correspond to the subset of the set of data features ((i.e. In some embodiments, to calculate this value for each pair (i, j) of modules, for example, a square nxn matrix, A, may be utilized where n is the number of total modules in the label sequence. Then, the systems and methods may scan through the label sequences that were saved at the last iteration of Gibbs sampling, incrementing the entry A[i, j] for every time the system identifies a syllable i directly preceding a syllable j. At the end of the label sequence, the system may divide by the number of total bigrams observed. In order to visually organize those modules that were specifically up-regulated or selectively expressed as a result of a manipulation, the system may assign a selectivity index to each module. For example, where p(condition) indicates the percent usage of a module in a condition, the system may sort modules in the circular open field versus square box comparison by (p(circle) - p(square)) / (p(circle) + p(square)).) [Datta: col. 24, line 34-50]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Mun with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu and Mun with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 
In addition, Mun also discloses the claim limitations as follows:
fitting one or more boxes to the curve (i.e. A Logit or Logistic regression is used for predicting the probability of occurrence of an event by fitting data to a logistic curve) [Mun: col. 73, line 52 - 54].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 
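For illustration only, one possible reading of the curve and box-fitting limitations of claims 2 and 3 can be sketched in Python as follows. The sketch rests on assumptions not found in the record: a "box" is treated as a constant-height segment, exactly two boxes are fit to the descending relevance curve by minimizing the summed squared error, and the features in the first (higher) box are taken as the selected subset. All function names and numbers are hypothetical.

```python
def segment_sse(values):
    """Sum of squared errors of a constant (box-height) fit to values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_two_box_split(sorted_scores):
    """Return the split index k that minimizes the two-box least-squares error."""
    best_k, best_err = 1, float("inf")
    for k in range(1, len(sorted_scores)):
        err = segment_sse(sorted_scores[:k]) + segment_sse(sorted_scores[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k


def select_features(relevance):
    """Sort relevance scores in descending order and keep the first box."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    k = best_two_box_split([score for _, score in ranked])
    return [name for name, _ in ranked[:k]]


# Hypothetical relevance measurements with a sharp drop after "c".
relevance = {"a": 0.90, "b": 0.85, "c": 0.80, "d": 0.10, "e": 0.05}
selected = select_features(relevance)
```

In this toy input the least-squares split falls at the sharp drop in the curve, so the three high-relevance features are kept.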

Regarding claim 4, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein generating (i.e. generate) [Liu: col. 3, line 53] the matrix based at least in part on the selected subset of the set of data features ((i.e. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 6, line 29-31] - Note: A vector is a 1xN matrix) and sorting the selected subset of the set of data features according to the order that the feature coefficients (i.e. ranked based on the value of their respective scores) [Liu: col. 9, line 34-35] are set to zero as the penalty value increases comprise: performing a least absolute shrinkage and selection operator (LASSO) regression procedure.
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, wherein generating the matrix based at least in part on the selected subset of the set of data features and sorting the selected subset of the set of data features according to the order that the feature coefficients are set to zero as the penalty value increases comprise: performing a least absolute shrinkage and selection operator (LASSO) regression procedure.
However, in the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations, as follows:
the feature coefficients are set to zero as the penalty value increases comprise (i.e. In other embodiments, principal components analysis may then be applied to these vectors, in order to project the wavelet coefficients into ten dimensions, which the inventors have found still captures >95% of total variance) [Datta: col. 19, line 37-40]: performing a least absolute shrinkage and selection operator (LASSO) regression procedure (i.e. For the autoregressive parameters, a prior that included a Lasso-like penalty can be used to encourage uninformative lag indices to have their corresponding regression matrix coefficients tend to zero) [Datta: col. 23, line 46-49].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 

Regarding claim 5, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein iteratively scanning the plurality of data records (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25] comprises: performing batch processing on the plurality of data records stored in a database to generate the matrix (i.e. Referring to FIG. 4, training data (i.e., initial training data) is received from the client computing system (402). For example, the client computing system 202 can upload the training data to the predictive modeling server system 206 over the network 204 either incrementally or in bulk (e.g., as one or more batches). As describe above, if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 18-31].

Regarding claim 6, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein generating (i.e. generate) [Liu: col. 3, line 53] the matrix (i.e. a sparse vector format) [Liu: col. 6, line 31] further comprises: reading a first subset of the plurality of data records (i.e. Referring to FIG. 4, training data (i.e., initial training data) is received from the client computing system (402). For example, the client computing system 202 can upload the training data to the predictive modeling server system 206 over the network 204 either incrementally or in bulk (e.g., as one or more batches)) [Liu: col. 8, line 18-23]; performing a first matrix building procedure using the first subset of the plurality of data records (i.e. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 26-31]; reading a second subset of the plurality of data records (i.e. if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated.) [Liu: col. 8, line 23-26]; and performing a second matrix building procedure using the second subset of the plurality of data records, wherein the matrix is generated based at least in part on the first matrix building procedure and the second matrix building procedure ((i.e. As describe above, if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 23-31] - Note: Liu discloses that data is uploaded incrementally and that the vector is built from the data. The first vector is built from the first set of data and the second vector is built from the incremental data; hence the second vector is also based on the first vector).
In the same field of endeavor Datta further discloses the claim limitations as follows:
wherein the matrix is generated based at least in part on the first matrix building procedure and the second matrix building procedure (i.e. FIG. 22 depicts, in accordance with various embodiments of the present invention, graphical model for the AR-HMM. The shaded nodes labeled y_t for time indices t = 1, 2, ..., T represent the preprocessed 3D data sequence. Each such data node y_t has a corresponding state node x_t which assigns that data frame to a behavioral mode. The other nodes represent the parameters which govern the transitions between modes (i.e. the transition matrix π) and the autoregressive dynamical parameters for each mode (i.e. the set of parameters θ)) [Datta: col. 14, line 18-27; Fig. 22].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract].
Mun further discloses the claim limitations as follows:
wherein the matrix is generated based at least in part on the first matrix building procedure and the second matrix building procedure (i.e. Another quick test is to create a correlation matrix between the independent variables. A high cross correlation indicates a potential for multicollinearity. The rule of thumb is that a correlation with an absolute value greater than 0.75 is indicative of severe multicollinearity) [Mun: col. 48, line 66 - col. 49, line 3].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 
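For illustration only, and not as a characterization of Liu, Datta, Mun, or the applicant's specification: one way a single matrix could be generated from two successive matrix building procedures is to accumulate a Gram matrix (the sum of outer products of the records) one subset at a time. The function name `add_outer_products` and the sample records below are hypothetical.

```python
def add_outer_products(gram, records):
    """Add x * x^T for each record x to the running Gram matrix, in place."""
    for x in records:
        for i, xi in enumerate(x):
            for j, xj in enumerate(x):
                gram[i][j] += xi * xj
    return gram


n_features = 2
gram = [[0.0] * n_features for _ in range(n_features)]

# Hypothetical records: each row is one data record with two feature values.
first_subset = [[1.0, 2.0], [3.0, 4.0]]
second_subset = [[5.0, 6.0]]

# First matrix building procedure, then the second on the same matrix.
add_outer_products(gram, first_subset)
add_outer_products(gram, second_subset)
```

Under this assumption, the final matrix is the same as the one obtained by processing all three records in a single pass, which is why the result depends on both building procedures.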

Regarding claim 7, Liu and Datta meet the claim limitations as set forth in claim 6. Liu and Datta further meet the claim limitations as follows.
The method of claim 6 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein the first subset of the plurality of data records and the second subset of the plurality of data records (i.e. the training data (e.g., all K partitions)) [Liu: col. 9, line 45] each comprise a respective single data record (i.e. producing a single number per data point) [Datta: col. 19, line 29-30].
Mun further discloses the claim limitations as follows:
wherein the first subset of the plurality of data records (i.e. The auto-fill function allows users to enter a single value on a line item) [Mun: col. 10, line 59-60] and the second subset of the plurality of data records (i.e. The auto-fill function allows users to enter a single value on a line item) [Mun: col. 10, line 59-60] each comprise a respective single data record (i.e. Users can also choose to run the input assumptions as unique inputs, group them as a line item 110 (all individual inputs on a single line item are assumed to be one variable)) [Mun: col. 11, line 41-43].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 

Regarding claim 8, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein selecting the subset of the set of data features ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21]) comprises: determining a first set of model quality criterion values for the plurality of nested linear models ((i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]) according to a first sampling interval, wherein a number of values in the first set of model quality criterion values is less than a number of models in the plurality of nested linear models; identifying a model of the plurality of nested linear models ((i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]) corresponding to a minimum value of the first set of model quality criterion values; and determining a subset of the plurality of nested linear models based at least in part on the identified model and a threshold value, wherein the subset of the plurality of nested linear models comprises the selected linear model ((i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]).
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, wherein selecting the linear model further comprises: determining a first set of model quality criterion values for the plurality of nested linear models according to a first sampling interval, wherein a number of values in the first set of model quality criterion values is less than a number of models in the plurality of nested linear models; identifying a model of the plurality of nested linear models corresponding to a minimum value of the first set of model quality criterion values; and determining a subset of the plurality of nested linear models based at least in part on the identified model and a threshold value, wherein the subset of the plurality of nested linear models comprises the selected linear model.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
(i.e. This one-variable t-test of means is appropriate when the population standard deviation is not known but the sampling distribution is assumed to be approximately normal (the t-test is used when the sample size is less than 30)) [Mun: col. 52, line 54-56], wherein a number of values in the first set of model quality criterion values is less than a number of models in the plurality of nested linear models (i.e. This t-test can be applied to three types of hypothesis tests to be examined - a two-tailed test, a right-tailed test, and a left-tailed test - based on the sample dataset if the population mean is equal to, less than, or greater than the hypothesized mean.) [Mun: col. 52, line 56-62]; identifying a model of the plurality of nested linear models corresponding to a minimum value of the first set of model quality criterion values (i.e. SC imposes a greater penalty for additional coefficients than the AIC but, generally, the model with the lowest AIC and SC values should be chosen) [Mun: col. 50, line 10-13]; and determining a subset of the plurality of nested linear models based at least in part on the identified model (i.e. SC imposes a greater penalty for additional coefficients than the AIC but, generally, the model with the lowest AIC and SC values should be chosen) [Mun: col. 50, line 10-13] and a threshold value (i.e. a critical I* threshold exists) [Mun: col. 74, line 36].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 
In the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations as follows:
(i.e. they are probabilistic because that process is defined mathematically in terms of sampling from probability distributions. In addition, by fitting an interpretable model to data, the data were 'parsed' in a manner that revealed the latent variable structure that the model posits gave rise to the data (including parameters describing the number and identities of the states as well as parameters describing the transitions between the states)) [Datta: col. 21, line 26-33], wherein a number of values in the first set of model quality criterion values is less than a number of models in the plurality of nested linear models (i.e. an algorithm may determine whether the signal has crossed some threshold) [Datta: col. 20, line 46-47]; (i.e. an algorithm may determine whether the signal has crossed some threshold) [Datta: col. 20, line 46-47]; and (i.e. an algorithm may determine whether the signal has crossed some threshold) [Datta: col. 20, line 46-47].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Mun with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu and Mun with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 
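For illustration only, and not as a characterization of any cited reference or of the applicant's specification: the limitations of claim 8 can be read as a coarse-then-narrow search, in which a criterion is evaluated for only every k-th nested model, the minimum of those sampled values is identified, and a window of models around that minimum is retained. The function name, the window interpretation of the "threshold value", and the toy criterion below are all assumptions.

```python
def coarse_then_window(criterion, n_models, interval, feature_range):
    """Sample a criterion at a fixed interval, then keep a window near the minimum.

    criterion     -- callable mapping a model index to a criterion value
    interval      -- sampling interval (only every interval-th model is scored)
    feature_range -- hypothetical threshold: half-width of the retained window
    """
    sampled_indices = range(0, n_models, interval)
    values = {i: criterion(i) for i in sampled_indices}
    best = min(values, key=values.get)
    low = max(0, best - feature_range)
    high = min(n_models - 1, best + feature_range)
    return best, list(range(low, high + 1)), values


def toy_criterion(index):
    # Hypothetical criterion curve with its true minimum at model index 7.
    return (index - 7) ** 2


best, window, values = coarse_then_window(
    toy_criterion, n_models=20, interval=5, feature_range=3
)
```

In this sketch only 4 of the 20 models are scored in the first pass, and the retained window still contains the model with the true minimum, which can then be found by scoring only the models in the window.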

Regarding claim 11, Liu and Mun meet the claim limitations as set forth in claim 8. Liu further meets the claim limitations as follows.
The method of claim 8 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein the threshold value comprises a model reduction factor for a rake sampling procedure, a feature range for the subset of the plurality of nested linear models, or a combination thereof.
Liu and Mun do not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 8, wherein the threshold value comprises a model reduction factor for a rake sampling procedure, a feature range for the subset of the plurality of nested linear models, or a combination thereof.
However, in the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations as follows:
(i.e. an algorithm may determine whether the signal has crossed some threshold) [Datta: col. 20, line 46-47] comprises a model reduction factor for (i.e. FIG. 4 provides an example of how an AR-HMM algorithm can convert input data (spine aligned depth imaging data 305 that has been dimensionally reduced 405 using PCA 310) into a fit model that describes the number of behavioral modules and the trajectories they encode through PCA space, the module-specific duration distributions that govern how long any trajectory within a given module lasts, and the transition matrix that describes how these individual modules interconnect over time) [Datta: col. 22, line 7-15; Fig. 4] a rake sampling procedure, a feature range for the subset of the plurality of nested linear models (i.e. In some embodiments, systems may utilize a discrete-time hidden Markov model 315 (HMM) to identify behavior modules. HMMs encompass a range of stochastic processes for modeling sequential and time series data) [Datta: col. 21, line 53-56; Fig. 4], or a combination thereof.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Mun with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu and Mun with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 

Regarding claim 12, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], further comprising: receiving (i.e. receive data) [Liu: col. 19, line 24], based at least in part on a user input (i.e. receiving user input from a user interacting with the client device) [Liu: col. 20, line 15-16], a set of user-selected features (i.e. Data generated at the client device (e.g., a result of the user interaction)) [Liu: col. 20, line 16-18] to remove from the set of data features; and determining (i.e. determine) [Liu: col. 5, line 27] an initial subset of the set of data features according to the set of user-selected features (i.e. The training data can include initial training data, which may be a relatively large volume of training data the client entity has accumulated) [Liu: col. 16, line 10-12] to remove, wherein the subset of the set of data features is selected from the initial subset of the set of data features (i.e. The training data can include initial training data, which may be a relatively large volume of training data the client entity has accumulated) [Liu: col. 16, line 10-12].
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, further comprising: receiving, based at least in part on a user input, a set of user-selected features to remove from the set of data features; and determining an initial subset of the set of data features according to the set of user-selected features to remove, wherein the subset of the set of data features is selected from the initial subset of the set of data features.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations as follows:
a set of user-selected features to remove from the set of data features (i.e. Users can change the range of the discount rates to show/compute 040 by entering the "From/To" percent and clicking on Update, copy the results 044, and copy the NPV Profile chart, as well as use any of the chart icons 042 to manipulate the chart's look and feel (e.g., change the chart's line/background color, chart type, chart view, or add/remove gridlines, show/hide labels, and show/hide legend). Users can also change the variable to display in the chart 041. For instance, users can change the chart from displaying the NPV Profile to the time-series charts of net cash flows, taxable income, operating cash flows, cumulative final cash flows, present value of the final cash flows, and so forth. Users can then click on the Copy Results or Copy Chart) [Mun: col. 8, line 18-30]; and determining an initial subset of the set of data features according to the set of user-selected features to remove (i.e. Users can change the range of the discount rates to show/compute 040 by entering the "From/To" percent and clicking on Update, copy the results 044, and copy the NPV Profile chart, as well as use any of the chart icons 042 to manipulate the chart's look and feel (e.g., change the chart's line/background color, chart type, chart view, or add/remove gridlines, show/hide labels, and show/hide legend). Users can also change the variable to display in the chart 041. For instance, users can change the chart from displaying the NPV Profile to the time-series charts of net cash flows, taxable income, operating cash flows, cumulative final cash flows, present value of the final cash flows, and so forth. Users can then click on the Copy Results or Copy Chart) [Mun: col. 8, line 18-30].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. In addition, in the same field of endeavor, Datta further discloses the “remove” limitation as follows:
remove (i.e. removing certain portions of the model structure. For instance, removing the discrete switching dynamics captured in the transition matrix and replacing them with a mixture model may generate an alternative model in which the distribution over each discrete state does not depend on its previous state) [Datta: col. 19, line 67 – col. 20, line 5].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Mun with those of Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu and Mun with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 


Regarding claim 13, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], further comprising: displaying (i.e. displaying) [Liu: col. 20, line 15] the selected linear model ((i.e. The operator can select one of the available predictive models, e.g., by clicking on the name or icon. In response, a second web page (e.g., a form) can be displayed that prompts the operator to upload input data that can be used by the selected trained model to provide predictive output data) [Liu: col. 12, line 1-6]), an indication of data features corresponding to the selected linear model, or both in a user interface (i.e. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device)) [Liu: col. 20, line 10-16].

Regarding claim 14, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein transmitting, to a database, a user device, or a combination thereof (i.e. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device) [Liu: col. 20, line 10-14], the selected linear model, an indication of data features corresponding to the selected linear model, or both ((i.e. selection of the predictive model) [Liu: col. 5, line 65]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]).

Regarding claim 15, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein the set of relevance measurements (i.e. data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 60-62] comprises a set of stump R-squared values.
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, wherein the set of relevance measurements comprises a set of stump R-squared values.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
the set of relevance measurements comprises a set of stump R-squared values (i.e. In order to determine the best fitting model, we apply several goodness-of-fit statistics to provide a glimpse into the accuracy and reliability of the estimated regression model. They usually take the form of a t-statistic, F-statistic, R-squared statistic, adjusted R-squared statistic, Durbin-Watson statistic, Akaike Criterion, Schwarz Criterion, and their respective probabilities. The R-squared (R²), or coefficient of determination, is an error measurement that looks at the percent variation of the dependent variable that can be explained by the variation in the independent variable for a regression analysis.) [Mun: col. 46, line 14-24].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24]. 

Regarding claim 16, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein the model quality criterion (i.e. the best model for the data when multiple predictive modules are trained.) [Liu: col. 5, line 29-30] comprises an Akaike information criterion (AIC).
Liu does not explicitly disclose the following claim limitations (Emphasis added).
The method of claim 1, wherein the model quality criterion comprises an Akaike information criterion (AIC).
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
wherein the model quality criterion comprises an Akaike information criterion (AIC) (i.e. In order to determine the best fitting model, we apply several goodness-of-fit statistics to provide a glimpse into the accuracy and reliability of the estimated regression model. They usually take the form of a t-statistic, F-statistic, R-squared statistic, adjusted R-squared statistic, Durbin-Watson statistic, Akaike Criterion, Schwarz Criterion, and their respective probabilities. The R-squared (R²), or coefficient of determination, is an error measurement that looks at the percent variation of the dependent variable that can be explained by the variation in the independent variable for a regression analysis.) [Mun: col. 46, line 14-24].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with those of Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24].
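For illustration only, and not as a characterization of Liu, Mun, or the applicant's specification: the sketch below shows how an AIC could be used to choose among nested linear models. It assumes a least-squares fit with Gaussian errors, for which the AIC can be written, up to an additive constant, as n·ln(RSS/n) + 2k, where n is the number of records, RSS is the residual sum of squares, and k is the number of fitted coefficients. The function names and the RSS values are hypothetical.

```python
import math


def aic_least_squares(rss, n_records, n_params):
    """AIC (up to an additive constant) for a Gaussian least-squares fit."""
    return n_records * math.log(rss / n_records) + 2 * n_params


def select_by_aic(rss_by_size, n_records):
    """Return the model size with the lowest AIC, plus all AIC scores."""
    scores = {
        size: aic_least_squares(rss, n_records, size)
        for size, rss in rss_by_size.items()
    }
    best_size = min(scores, key=scores.get)
    return best_size, scores


# Hypothetical residual sums of squares for nested models with 1 to 4 features.
rss_by_size = {1: 50.0, 2: 20.0, 3: 19.9, 4: 19.8}
best_size, scores = select_by_aic(rss_by_size, n_records=100)
```

With these made-up values, the third and fourth features reduce the residual error only slightly, so the 2k penalty outweighs the improvement and the two-feature model has the lowest score.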

Regarding claim 17, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein the set of data features comprises single features, compound features, or a combination thereof (i.e. data comprises examples that each comprise one or more data values (or "features")) [Liu: col. 2, line 62-63].

Regarding claim 18, Liu meets the claim limitations as set forth in claim 1. Liu further meets the claim limitations as follows.
The method of claim 1 (i.e. a computer-implemented method) [Liu: col. 1, line 31], wherein selecting the linear model ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]) comprises two passes through the plurality of data records ((i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25] – Note: Multiple iterations disclose at least two passes through the plurality of data records).

Regarding claim 19, Liu meets the claim limitations as follow.
An apparatus (i.e. processor) [Liu: col. 17, line 52] for automated feature selection ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21]) for linear model 2generation ((i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]), comprising:  3a processor (i.e. processor) [Liu: col. 17, line 52];  4memory in electronic communication with the processor  ((i.e. The server 702 can store instructions that implement operations associated with the modules described above, for example, on the computer readable medium 716 or one or more additional devices 714, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, or a tape device) [Liu: col. 17, line 62-67; Fig. 7]; (i.e. Each processor 712 is capable of processing instructions for execution) [Liu: col. 17, line 52-53]); and  5instructions stored in the memory (i.e. The server 702 can store instructions that implement operations associated with the modules described above, for example, on the computer readable medium 716 or one or more additional devices 714, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, or a tape device) [Liu: col. 17, line 62-67; Fig. 7] and executable by the processor to cause the 6apparatus (i.e. Each processor 712 is capable of processing instructions for execution) [Liu: col. 17, line 52-53] to:  3determine (i.e. determine) [Liu: col. 5, line 27], for a set of data features (i.e. data comprises examples that each comprise one or more data values (or "features")) [Liu: col. 2, line 62-63] related to a plurality of data records ((i.e. a record of the stored association) [Liu: col. 15, line 50]; (i.e. Training data such as that represented by TABLE 1) [Liu: col. 
3, line 10, Table 1] – Note: Table is a record data), a 4set of relevance measurements ((i.e. data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 60-62] (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value)) [Liu: col. 2, line 62-64]), wherein each relevance measurement of the set of relevance 5measurements corresponds to a respective feature of the set of data features (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value) for that example) [Liu: col. 2, line 62-64; Please see examples in Tables 1, 2];  6select (i.e. selecting) [Liu: col. 4, line 15]  a subset of the set of data features (i.e. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data) [Liu: col. 8, line 23-25]) based at least in part on the set of 7relevance measurements (i.e. An optimal filter combination can be selected for use with new training data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 59-62];  8generate (i.e. generate) [Liu: col. 3, line 53]  a matrix based at least in part on the selected subset of the set of 9data features ((i.e. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 6, line 29-31] – Note: A vector is a 1xN matrix), wherein generating the matrix comprises iteratively ((i.e. 
Some examples of training functions that can be used to train a static predictive model include (without limitation): regression (e.g., linear regression, logistic regression), classification and regression tree, multivariate adaptive regression spline and other machine learning training functions (e.g., Naive Bayes, k-nearest neighbors, Support Vector Machines, Perceptron). Some examples of training functions that can be used to train an updateable predictive model include (without limitation) Online Bayes, Winnow, Support Vector Machine (SVM) Analogue, Maximum Entropy (MaxEnt), Gradient based (FOBOS) and AdaBoost with Mixed Norm Regularization. The training function repository 216 can include one or more of these example training functions. In some scenarios, a recency weighted predictive model can be trained. In general, a recency weighted predictive model is a predictive model that is trained giving increased significance to more recent training data data as compared to earlier received training data. A recency weighted predictive model can be used to improve predictive output in response to a change in input data) [Liu: col. 7, line 34-53] - Note: Liu discloses several regression methods, which process data by using matrix operations.  It is also well known in the arts that the regression methods iteratively process, or scanning, data); (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25]) scanning the plurality of 10data records (i.e. Referring to FIG. 4, training data (i.e., initial training data) is received from the client computing system (402). 
For example, the client computing system 202 can upload the training data to the predictive modeling server system 206 over the network 204 either incrementally or in bulk (e.g., as one or more batches). As describe above, if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 18-31], and wherein the matrix enables computation of feature coefficients for the 11selected subset of the set of data features based at least in part on an increasing ((i.e. Some examples of training functions that can be used to train a static predictive model include (without limitation): regression (e.g., linear regression, logistic regression), classification and regression tree, multivariate adaptive regression spline and other machine learning training functions (e.g., Naive Bayes, k-nearest neighbors, Support Vector Machines, Perceptron). Some examples of training functions that can be used to train an updateable predictive model include (without limitation) Online Bayes, Winnow, Support Vector Machine (SVM) Analogue, Maximum Entropy (MaxEnt), Gradient based (FOBOS) and AdaBoost with Mixed Norm Regularization. The training function repository 216 can include one or more of these example training functions. In some scenarios, a recency weighted predictive model can be trained. In general, a recency weighted predictive model is a predictive model that is trained giving increased significance to more recent training data data as compared to earlier received training data. 
A recency weighted predictive model can be used to improve predictive output in response to a change in input data) [Liu: col. 7, line 34-53] - Note: Liu discloses several regression methods, which process data by using matrix operations. It is also well known in the art that regression methods iteratively process, or scan, data); (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25]) penalty value (i.e. For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models. Therefore, in the present example, where the type of predictive model is a linear regression model, changes to an L1 penalty generate different sets of parameters) [Liu: col. 8, line 15-20]; sort (i.e. ranked) [Liu: col. 9, line 34] the selected subset of the set of data features according to an order (i.e. ranked based on the value of their respective scores) [Liu: col. 9, line 34-35] that the feature coefficients are set to zero as the penalty value increases; determine (i.e. determine) [Liu: col. 5, line 27] a plurality of nested linear models according to the sorting ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]); and select ((i.e. selecting) [Liu: col. 4, line 15]; (i.e. determine) [Liu: col. 5, line 27]) a linear model (i.e. selection of the predictive model) [Liu: col. 5, line 65] of the plurality of nested linear models (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19] based at least in part on a model quality criterion (i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30] and the plurality of nested linear models ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]).
Liu does not explicitly disclose the following claim limitations (Emphasis added).
An apparatus for automated feature selection for linear model generation, comprising: a processor; memory in electronic communication with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: determine, for a set of data features related to a plurality of data records, a set of relevance measurements, wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features; select a subset of the set of data features based at least in part on the set of relevance measurements; generate a matrix based at least in part on the selected subset of the set of data features, wherein generating the matrix comprises iteratively scanning the plurality of data records, and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing penalty value; sort the selected subset of the set of data features according to an order that the feature coefficients are set to zero as the penalty value increases; determine a plurality of nested linear models according to the sorting; and select a linear model of the plurality of nested linear models based at least in part on a model quality criterion and the plurality of nested linear models.
However, in the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations as follows:
generate a matrix based at least in part on the selected subset of the set of data features (i.e. the behavioral representation can be in the form of a matrix) [Datta: col. 7, line 47-48], (i.e. In other embodiments, principal components analysis may then be applied to these vectors, in order to project the wavelet coefficients into ten dimensions, which the inventors have found still captures >95% of total variance) [Datta: col. 19, line 37-40] (i.e. For the autoregressive parameters, a prior that included a Lasso-like penalty can be used to encourage uninformative lag indices to have their corresponding regression matrix coefficients tend to zero) [Datta: col. 23, line 46-49] as the penalty value increases;
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 
Liu and Datta do not explicitly disclose the following claim limitations (Emphasis added).
as the penalty value increases.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
as the penalty value increases (i.e. SC imposes a greater penalty for additional coefficients than the AIC) [Mun: col. 50, line 10-11].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24].
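
For orientation only, the sequence of operations recited in the claim (relevance measurement, subset selection, matrix generation by scanning records, increasing penalty, sorting, nested models, quality criterion) can be sketched as follows. This is a minimal illustration under stated assumptions and is not taken from the application, Liu, Datta, or Mun: absolute correlation as the relevance measurement, a Gram matrix as the generated matrix, an L1 (lasso) penalty solved by coordinate descent, BIC as the model quality criterion, and centered data are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; returns exactly 0.0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_from_gram(G, c, lam, n_iter=200):
    """Coordinate descent for 0.5*b'Gb - c'b + lam*||b||_1, with G = X'X, c = X'y."""
    b = np.zeros(len(c))
    for _ in range(n_iter):
        for j in range(len(c)):
            # Correlation of feature j with the residual that ignores feature j.
            r = c[j] - G[j] @ b + G[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / G[j, j]
    return b

def select_nested_model(X, y, k=4, batch=50, n_lams=50):
    """Return original column indices of the selected model (X, y assumed centered)."""
    n, p = X.shape
    k = min(k, p)
    # Step 1: one relevance measurement per feature (assumed: |correlation with y|).
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    keep = np.argsort(-relevance)[:k]
    # Step 2: build the matrix by iteratively scanning the records in batches.
    G, c, yy = np.zeros((k, k)), np.zeros(k), 0.0
    for s in range(0, n, batch):
        Xb, yb = X[s:s + batch][:, keep], y[s:s + batch]
        G += Xb.T @ Xb
        c += Xb.T @ yb
        yy += yb @ yb
    # Step 3: increase the penalty and note where each coefficient first hits zero.
    drop_at = np.full(k, np.inf)
    for lam in np.linspace(0.0, np.max(np.abs(c)), n_lams):
        b = lasso_from_gram(G, c, lam)
        drop_at[(b == 0) & np.isinf(drop_at)] = lam
    # Step 4: sort so that features surviving the largest penalty come first.
    order = np.argsort(-drop_at, kind="stable")
    # Step 5: nested models use the first m sorted features; keep the lowest BIC.
    best_bic, best_m = np.inf, 1
    for m in range(1, k + 1):
        idx = order[:m]
        Gm, cm = G[np.ix_(idx, idx)], c[idx]
        beta = np.linalg.solve(Gm, cm)  # least-squares fit from the matrix alone
        rss = yy - 2 * cm @ beta + beta @ Gm @ beta
        bic = n * np.log(rss / n) + m * np.log(n)
        if bic < best_bic:
            best_bic, best_m = bic, m
    return keep[order[:best_m]]
```

In this sketch the records are read only while the matrix is accumulated; the penalty sweep, the sorting, and the nested fits all work from the accumulated matrix. Whether the cited references teach that arrangement is a separate question that the sketch does not address.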

Regarding claim 20, Liu meets the claim limitations as follows.
A non-transitory computer-readable medium storing code (i.e. The server 702 can store instructions that implement operations associated with the modules described above, for example, on the computer readable medium 716 or one or more additional devices 714, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, or a tape device) [Liu: col. 17, line 62-67; Fig. 7] for automated feature selection ((i.e. make a selection of a trained predictive model) [Liu: col. 4, line 36-37]; (i.e. the training and model selection can occur in an automated fashion) [Liu: col. 9, line 20-21]) for linear model generation ((i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]), the code comprising instructions executable by a processor (i.e. Each processor 712 is capable of processing instructions for execution) [Liu: col. 17, line 52-53] to: determine (i.e. determine) [Liu: col. 5, line 27], for a set of data features (i.e. data comprises examples that each comprise one or more data values (or "features")) [Liu: col. 2, line 62-63] related to a plurality of data records ((i.e. a record of the stored association) [Liu: col. 15, line 50]; (i.e. Training data such as that represented by TABLE 1) [Liu: col. 3, line 10, Table 1] – Note: a table is a form of record data), a set of relevance measurements ((i.e. data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 60-62] (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value)) [Liu: col. 2, line 62-64]), wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features (i.e. data comprises examples that each comprise one or more data values (or "features") plus an answer (a category or a value) for that example) [Liu: col. 2, line 62-64; Please see examples in Tables 1, 2]; select (i.e. selecting) [Liu: col. 4, line 15] a subset of the set of data features (i.e. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data) [Liu: col. 8, line 23-25] based at least in part on the set of relevance measurements (i.e. An optimal filter combination can be selected for use with new training data sets that have characteristics similar to the characteristics of previously analyzed training data sets) [Liu: col. 3, line 59-62]; generate (i.e. generate) [Liu: col. 3, line 53] a matrix based at least in part on the selected subset of the set of data features ((i.e. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 6, line 29-31] – Note: A vector is a 1xN matrix), wherein generating the matrix comprises iteratively ((i.e. Some examples of training functions that can be used to train a static predictive model include (without limitation): regression (e.g., linear regression, logistic regression), classification and regression tree, multivariate adaptive regression spline and other machine learning training functions (e.g., Naive Bayes, k-nearest neighbors, Support Vector Machines, Perceptron). Some examples of training functions that can be used to train an updateable predictive model include (without limitation) Online Bayes, Winnow, Support Vector Machine (SVM) Analogue, Maximum Entropy (MaxEnt), Gradient based (FOBOS) and AdaBoost with Mixed Norm Regularization. The training function repository 216 can include one or more of these example training functions. In some scenarios, a recency weighted predictive model can be trained. In general, a recency weighted predictive model is a predictive model that is trained giving increased significance to more recent training data as compared to earlier received training data. 
A recency weighted predictive model can be used to improve predictive output in response to a change in input data) [Liu: col. 7, line 34-53] - Note: Liu discloses several regression methods, which process data by using matrix operations. It is also well known in the art that regression methods iteratively process, or scan, data); (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25]) scanning the plurality of data records (i.e. Referring to FIG. 4, training data (i.e., initial training data) is received from the client computing system (402). For example, the client computing system 202 can upload the training data to the predictive modeling server system 206 over the network 204 either incrementally or in bulk (e.g., as one or more batches). As described above, if the initial training data is uploaded incrementally, the training data can accumulate until a threshold volume is received before training of predictive models is initiated. The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output. By way of example, the training data can be provided using a comma separated value format, or a sparse vector format) [Liu: col. 8, line 18-31], and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing ((i.e. 
Some examples of training functions that can be used to train a static predictive model include (without limitation): regression (e.g., linear regression, logistic regression), classification and regression tree, multivariate adaptive regression spline and other machine learning training functions (e.g., Naive Bayes, k-nearest neighbors, Support Vector Machines, Perceptron). Some examples of training functions that can be used to train an updateable predictive model include (without limitation) Online Bayes, Winnow, Support Vector Machine (SVM) Analogue, Maximum Entropy (MaxEnt), Gradient based (FOBOS) and AdaBoost with Mixed Norm Regularization. The training function repository 216 can include one or more of these example training functions. In some scenarios, a recency weighted predictive model can be trained. In general, a recency weighted predictive model is a predictive model that is trained giving increased significance to more recent training data as compared to earlier received training data. A recency weighted predictive model can be used to improve predictive output in response to a change in input data) [Liu: col. 7, line 34-53] - Note: Liu discloses several regression methods, which process data by using matrix operations. It is also well known in the art that regression methods iteratively process, or scan, data); (i.e. a predictive model can be trained with different features, again generating different trained models. The selection of features, i.e., feature induction, can occur during multiple iterations of computing the training function over the training data.) [Liu: col. 8, line 21-25]) penalty value (i.e. For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models. 
Therefore, in the present example, where the type of predictive model is a linear regression model, changes to an L1 penalty generate different sets of parameters) [Liu: col. 8, line 15-20]; sort (i.e. ranked) [Liu: col. 9, line 34] the selected subset of the set of data features according to an order (i.e. ranked based on the value of their respective scores) [Liu: col. 9, line 34-35] that the feature coefficients are set to zero as the penalty value increases; determine (i.e. determine) [Liu: col. 5, line 27] a plurality of nested linear models according to the sorting ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]); and select ((i.e. selecting) [Liu: col. 4, line 15]; (i.e. determine) [Liu: col. 5, line 27]) a linear model (i.e. selection of the predictive model) [Liu: col. 5, line 65] of the plurality of nested linear models (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19] based at least in part on a model quality criterion (i.e. determine which predictive model to use (e.g., by determining the best model for the data when multiple predictive modules are trained).) [Liu: col. 5, line 27-30] and the plurality of nested linear models ((i.e. In some examples, the predictive modeling server system can be configured to rank the predictive models and/or their associated filters, and may select a predetermined number of filters as effective filters. 
For example, the predictive modeling server system may identify the top three filters and/or filter combinations based on the level of accuracy of their associated predictive models.) [Liu: col. 17, line 8-15]; (i.e. where the type of predictive model is a linear regression model) [Liu: col. 8, line 18-19]).    
Liu does not explicitly disclose the following claim limitations (Emphasis added).
A non-transitory computer-readable medium storing code for automated feature selection for linear model generation, the code comprising instructions executable by a processor to: determine, for a set of data features related to a plurality of data records, a set of relevance measurements, wherein each relevance measurement of the set of relevance measurements corresponds to a respective feature of the set of data features; select a subset of the set of data features based at least in part on the set of relevance measurements; generate a matrix based at least in part on the selected subset of the set of data features, wherein generating the matrix comprises iteratively scanning the plurality of data records, and wherein the matrix enables computation of feature coefficients for the selected subset of the set of data features based at least in part on an increasing penalty value; sort the selected subset of the set of data features according to an order that the feature coefficients are set to zero as the penalty value increases; determine a plurality of nested linear models according to the sorting; and select a linear model of the plurality of nested linear models based at least in part on a model quality criterion and the plurality of nested linear models.
However, in the same field of endeavor Datta further discloses the claim limitations and the deficient claim limitations as follows:
generate a matrix based at least in part on the selected subset of the set of data features (i.e. the behavioral representation can be in the form of a matrix) [Datta: col. 7, line 47-48], (i.e. In other embodiments, principal components analysis may then be applied to these vectors, in order to project the wavelet coefficients into ten dimensions, which the inventors have found still captures >95% of total variance) [Datta: col. 19, line 37-40] (i.e. For the autoregressive parameters, a prior that included a Lasso-like penalty can be used to encourage uninformative lag indices to have their corresponding regression matrix coefficients tend to zero) [Datta: col. 23, line 46-49] as the penalty value increases;
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu with Datta to program the system to implement Datta’s method.
Therefore, the combination of Liu with Datta will enable the system to identify a previously-unexplored sub-second regularity that defines a timescale upon which behavior is organized, yields important information about the components and structure of behavior, offers insight into the nature of behavioral change in the subject, and enables objective discovery of subtle alterations in patterned action [Datta: Abstract]. 
Liu and Datta do not explicitly disclose the following claim limitations (Emphasis added).
as the penalty value increases.
However, in the same field of endeavor Mun further discloses the claim limitations and the deficient claim limitations, as follows:
as the penalty value increases (i.e. SC imposes a greater penalty for additional coefficients than the AIC) [Mun: col. 50, line 10-11].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu and Datta with Mun to program the system to implement Mun’s method.
Therefore, the combination of Liu and Datta with Mun will enable the system to determine the best fitting model [Mun: col. 46, line 14-24].
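
As background to the quoted Mun passage, the comparison between the two criteria can be illustrated with a short sketch. The passage as quoted gives no formulas, so the code below uses the standard textbook definitions and assumes that "SC" denotes the Schwarz criterion (also called BIC); the function names are hypothetical and nothing here is drawn from Mun's text.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: penalty of 2 per estimated coefficient."""
    return 2 * k - 2 * log_likelihood

def schwarz(log_likelihood, k, n):
    """Schwarz criterion (BIC): penalty of ln(n) per estimated coefficient."""
    return k * math.log(n) - 2 * log_likelihood

def extra_penalty(k, n):
    """Amount by which SC penalizes k coefficients more than AIC for n observations."""
    return k * (math.log(n) - 2)
```

Under these definitions the SC penalty per additional coefficient exceeds the AIC penalty whenever ln(n) > 2, that is, for eight or more observations. Note that this penalty grows with the number of coefficients in the model, which is a different quantity from a regularization penalty applied to coefficient magnitudes.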

Allowable Subject Matter
9.      Claims 9-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  This objection is given with a condition that all objections and rejections of related claims are addressed. 
10.     The above-identified claims recite several operations that are performed in an explicit way. There is no articulated reasoning to combine the prior art references to arrive at the claimed invention.

Reference Notice 
Additional prior art, included in the Notice of References Cited, made of record and not relied upon, is considered pertinent to applicant's disclosure.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang, whose telephone number is (408) 918-7529. The examiner can normally be reached Monday through Thursday between 8:30 am and 5:00 pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath Perungavoor, can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Philip P. Dang/
Primary Examiner, Art Unit 2488