DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: “Platforms for Developing Data Models with Machine Learning Model”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 10-14, 16-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vasudevan et al USPN 8,768,659 in view of Schmidtler et al USPN 8,239,335.

Regarding claim 1
Vasudevan et al teaches
a repository for storing kernel images, each kernel image (column 9, line 40, the squared exponential (SQEXP) kernel is a function of |x-x'| and is thus stationary and isotropic meaning that it is invariant to all rigid motions. The SQEXP function is also infinitely differentiable and thus tends to be smooth. The kernel is also called the Gaussian kernel. It functions similarly to a Gaussian filter being applied to an image. The point at which the elevation has to be estimated is treated as the central point/pixel. Training data at this point together with those at points around it determine the elevation. As a consequence of being a function of the Euclidean distance between the points and the fact that the correlation is inversely proportional to the distance between them, the SQEXP kernel is prone to producing smoothened (possibly overly) models. Points that are far away from the central pixel contribute less to its final estimate whereas points nearby the central pixel have maximum leverage. The correlation measures of far off points, no matter how significant they may be, are diminished. This is in accordance with the "bell-shaped" curve of the kernel);
a data model that receives input data and produces output data (column 2, line 60, according to the present invention there is provided, in a first aspect, a method for modelling data based on a dataset, including a training phase, wherein the dataset is applied to a non-stationary Gaussian process kernel in order to optimize the values of a set of hyperparameters associated with the Gaussian process kernel, and an evaluation phase in which the dataset and Gaussian process kernel with optimized hyperparameters are used to generate model data);

a development environment for running experiments to develop the data models (column 16, line 24, the experiments successfully demonstrated Gaussian process modelling of large scale terrain. Further, the SQEXP and NN kernel were compared. The NN kernel proved to be much better suited to modelling complex terrain as compared to the SQEXP kernel. The kernels performed very similarly on relatively flat data. Thus, the approach can 1. Model large scale sensory data acquired at different degrees of sparseness. 2. Provide an unbiased estimator for the data at any desired 
each experiment running one of the kernel images, according to a configuration of the development parameters for the kernel image, and using one or more of the data sets in the data store as input data (column 15, line 5, The focus of the experiments is three-fold. The first part will demonstrate GP terrain modelling and evaluate the non-
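
For illustration only, the squared exponential (SQEXP) kernel described in the passage cited above from Vasudevan (column 9, line 40) can be sketched as follows. The closed form used here is the standard textbook expression for this kernel and is an assumption, not a quotation from the reference; the function name sqexp_kernel and the parameters length_scale and signal_var are hypothetical.

```python
import math


def sqexp_kernel(x, x_prime, length_scale=1.0, signal_var=1.0):
    """Squared exponential (Gaussian) kernel between two points.

    Assumed textbook form: signal_var * exp(-|x - x'|^2 / (2 * length_scale^2)).
    The value depends only on the Euclidean distance |x - x'|, which is what
    makes the kernel stationary and isotropic.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return signal_var * math.exp(-sq_dist / (2.0 * length_scale ** 2))


# Correlation is largest at the central point and decays with distance,
# following the bell-shaped curve mentioned in the cited passage.
centre = (0.0, 0.0)
for point in [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]:
    print(point, sqexp_kernel(centre, point))
```

Under these assumptions, translating both points by the same offset leaves the kernel value unchanged, which is the stationarity property noted in the cited passage.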

Regarding claim 2
Vasudevan et al teaches 
 the data model is a machine learning model, the data set is a tagged data set, and the development of the data model is a supervised training of the machine learning model using the tagged data set (column 25, line 47, The size of the training data is obviously an important factor influencing the modelling process. However, this depends a lot on the data set under consideration. For the Tom Price laser scanner data, a mere 3000 of over 1.8 million data points were enough to produce centimeter level accuracy. This was due to the fact that this particular data set comprised of dense and accurate scanner data with a relatively lower change in elevation. The same number of training data also produced very satisfactory results in the case of the West Angelas dataset which is both large and has a significant elevation gradient--this is attributed to the uniform sampling of points across the whole dataset and adaptive power of the NN kernel. The Kimberlite Mine data set comprised of very sparse data spread over a very large area. This data set also had a lot of "features" such as roads, "crest-lines" and 

Regarding claim 3
Vasudevan et al teaches
the development parameters are exposed through a standardized API (column 28, line 49, Although not required, the embodiments described herein can be implemented as an application programming interface (API) or as a series of libraries for use by a developer, or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files which work together to perform particular functions, it will be understood that the functionality of the embodiment and of the broader invention claimed herein may be distributed across a number of routines, programs, objects components or data files, as required).

Regarding claim 4
The rejection of claim 1 is incorporated. Claim 4 recites similar limitations and is therefore rejected under the same rationale.



Regarding claim 5
Schmidtler et al teaches 
the kernel images are stored in the repository as virtual machine snapshots (column 31, line 3, FIG. 22 shows an implementation of the classification method and apparatus of the present invention used in association with document separation. Automatic document separation is used for reducing the manual effort involved in separating and identifying documents after digital scanning. One such document separation method and apparatus is described in U.S. Publication 2005/0134935 published Jun. 23, 2005 to Schmidtler et al, the substance of which is incorporated herein by reference. In the aforementioned publication, the method combined classification rules to automatically separate sequences of pages by using inference algorithms to reduce the most likely separation from all of the available information, using the classifications methods described therein. In one embodiment of the present invention as shown in FIG. 22, the classification method of transductive MED of the present invention is employed in document separation. More particularly, document pages 2200 are inserted into a digital scanner 2202 or MFP and are converted into a sequence of digital images 2204. The document pages may be pages from any type of document, e.g. publications of a patent office, data retrieved from a database, a collection of prior art, a website, etc. The sequence of digital images is input at step 2206 to dynamically adapt probabilistic classification rules using transduction. Step 2206 utilizes the sequence of images 2204 as unlabeled data and labeled data 2208. At step 2210 the weight in the probabilistic network is updated and is used for automatic document separation according to 

Regarding claim 10
Vasudevan et al teaches
a development log that records compute costs and performance metrics for the experiments that were run to develop the data model, wherein the development environment automatically tracks the compute costs and performance metrics for 

Regarding claim 11
Vasudevan et al teaches 
the development environment automatically runs experiments according to different configurations of the development parameters (column 27, line 5, cross validation experiments were used to provide statistically representative performance measures, compare the GP approaches with several other interpolation methods using both grid/elevation maps as well as TIN's and understand the strengths and applicability of different techniques. These experiments demonstrated that for dense and/or relatively flat terrain, the GP-NN would perform as well as the grid based methods using any of the standard interpolation techniques or TIN based methods using triangle based interpolation techniques. However, for sparse and/or complex terrain data, the GP-NN would significantly outperform alternative methods. Thus, the GP-NN proved to be a very versatile and effective modelling option for terrain of a range of sparseness and complexity. Experiments conducted on multiple real sensor data sets of varying 

Regarding claim 12
Vasudevan et al teaches
 the development environment optimizes the configuration of the development parameters (column 7, line 41, In particular, in the GP learning procedure, a maximum marginal likelihood estimation (MMLE) method is used in order to optimise a set of hyperparameters associated with a GP kernel. By `optimise` it is meant that the hyperparameters are set at values that are expected to result in reduced error in comparison to other values, but are not necessarily set at the most optimum values. The kernel hyperparameters provide a coarse description of the terrain model, and can be used together with the sampled sensor measurement data to generate detailed terrain model data at a desired resolution. Moreover, as a non-parametric, probabilistic process is utilized, information regarding the uncertainty can also be captured and provided. That is, a statistically sound uncertainty estimate is provided. The optimized kernel hyperparameters are stored, together with the KD-Tree sample data structure, for use by the evaluation module).


Regarding claim 13
Vasudevan et al teaches
the development environment estimates a compute cost and/or a performance metric for experiments to be run according to different configurations and automatically prioritizes the different configurations based on the estimated compute costs and/or the estimated performance metrics (column 15, line 23, in all experiments, the mean squared error (MSE) criterion was used as the performance metric (together with visual inspection) to interpret the results. Each dataset was split into three parts--training, evaluation and testing. The former-most one was used for learning the GP and the latter was used only for computing the mean squared error (MSE); the former two parts together were used to estimate the elevations for the test data. The mean squared error was computed as the mean of the squared differences between the GP elevation prediction at the test points and the ground truth specified by the test data subset).
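
For illustration only, the mean squared error (MSE) performance metric described in the passage cited above from Vasudevan (column 15, line 23) can be sketched as the mean of the squared differences between predictions and ground truth. The function name mean_squared_error and the sample elevation values are hypothetical and are not taken from the reference.

```python
def mean_squared_error(predicted, ground_truth):
    """Mean of the squared differences between predictions and ground truth."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_diffs = [(p - t) ** 2 for p, t in zip(predicted, ground_truth)]
    return sum(squared_diffs) / len(squared_diffs)


# Hypothetical elevation predictions at three test points versus the
# ground truth given by the test data subset.
predicted_elevation = [10.0, 12.5, 9.0]
true_elevation = [10.0, 12.0, 11.0]
print(mean_squared_error(predicted_elevation, true_elevation))
```

A lower value indicates that the predictions at the test points lie closer to the ground truth.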

Regarding claim 14
Vasudevan et al teaches
the development parameters define a function in the data model by input data type and output data type, and the development environment automatically searches for existing defined functions with the same input data type and output data type (column 8, line 35, The example embodiment employs Gaussian Processes (GPs) for modelling and representing terrain data. GPs provide a powerful learning framework for learning models of spatially correlated and uncertain data. Gaussian Process Regression 

Regarding claims 16-17
Vasudevan et al teaches
the development environment automatically searches a library of existing defined functions (column 10, line 55, A Bayesian procedure is used to maximise the log marginal likelihood of the training output (y) given the training input (X) for a set of hyperparameters $\theta$ which is given by $\log p(y \mid X, \theta) = -\frac{1}{2} y^{T} K_y^{-1} y - \frac{1}{2} \log |K_y| - \frac{n}{2} \log 2\pi$ where $K_y = K_f + \sigma_n^2 I$ is the covariance matrix for the noisy targets y. The log marginal likelihood has three terms--the first describes the data fit, the second term penalises model complexity and the last term is simply a normalisation coefficient. Thus, training the model will involve searching for the set of hyperparameters that enables the best data fit while avoiding overly complex models. Occam's razor is thus in-built in the system and prevention of over-fitting is guaranteed by the very formulation of the learning mechanism).
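
For illustration only, the three-term log marginal likelihood described in the passage cited above from Vasudevan (column 10, line 55) can be sketched as follows, assuming the standard Gaussian process form with a data-fit term, a complexity penalty and a normalisation term. The function names cholesky and log_marginal_likelihood, the argument names and the use of a Cholesky factorisation are assumptions made for this sketch and are not taken from the reference.

```python
import math


def cholesky(a):
    """Lower-triangular factor low with low * low^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_marginal_likelihood(k_f, y, noise_var):
    """Log marginal likelihood of targets y under covariance K_y = K_f + noise_var * I."""
    n = len(y)
    k_y = [[k_f[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(k_y)
    # Forward substitution solves low * z = y, so that z . z == y^T K_y^-1 y.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((y[i] - s) / low[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    # log|K_y| equals 2 * sum(log(diagonal of low)), so half of it is the plain sum.
    complexity_penalty = -sum(math.log(low[i][i]) for i in range(n))
    normalisation = -0.5 * n * math.log(2.0 * math.pi)
    return data_fit + complexity_penalty + normalisation


# Hypothetical two-point example: noise-free covariance K_f and noisy targets y.
k_f = [[1.0, 0.5], [0.5, 1.0]]
print(log_marginal_likelihood(k_f, [0.3, -0.1], noise_var=0.1))
```

Training in the sense of the cited passage would then amount to searching for the hyperparameters that maximise this value; the search itself is not shown here.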


Regarding claim 19
Vasudevan et al teaches
a library accessible by many users, the library containing templates of at least one of: pre-processing components and post-processing components (column 28, line 49, although not required, the embodiments described herein can be implemented as an application programming interface (API) or as a series of libraries for use by a developer, or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files which work together to perform particular functions, it will be understood that the functionality of the embodiment and of the broader invention claimed herein may be distributed across a number of routines, programs, objects components or data files, as required).

Regarding claim 20
Vasudevan et al teaches
a library accessible by many users, the library containing templates of development code (column 28, line 49, although not required, the embodiments described herein can be implemented as an application programming interface (API) or as a series of libraries for use by a developer, or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include .

Allowable Subject Matter
Claims 6-9, 15 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Relevant Prior Art
10956825 B1 Chen et al teaches Distributable Event Prediction and Machine Learning Recognition System
10915829 B1 Wani et al teaches Data Model Update for Structural-damage Predictor After an Earthquake
8510238 B1 Aradhye et al teaches Method to Predict Session Duration on Mobile Devices Using Native Machine Learning

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Anil Khatri whose telephone number is (571)272-3725. The examiner can normally be reached M-F 8:30-5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, W Zhen, can be reached at 571-272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/ANIL KHATRI/
Primary Examiner, Art Unit 2191