DETAILED ACTION
This action is responsive to Applicant’s amendment filed March 22, 2022, in application number 16/671,791, filed November 1, 2019.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed March 22, 2022 has been entered. Examiner acknowledges receipt of the following in Application 16/671,791: Amendments to the Abstract, Amendments to the Specification, Amendments to the Claims, Amendments to the Drawings, and Remarks containing Applicant’s arguments.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Claims 1-20 have been amended. Claims 1-20 remain pending in the application. 
Regarding Applicant’s Remarks and Amendments to the Abstract, Examiner acknowledges that Applicant’s Amendments to the Abstract have resolved the objection to the Abstract exceeding 150 words, and therefore this objection, previously set forth in the Non-Final Office Action mailed December 21, 2021, is withdrawn.
Regarding Applicant’s Remarks and Amendments to the Drawings, Examiner acknowledges Applicant’s Amendments to the Drawings have resolved the objection identified in Figure 1, and therefore the respective drawing objection previously set forth in the Non-Final Office Action mailed December 21, 2021 is withdrawn.
Regarding Applicant’s Remarks and Amendments to the Specification, Examiner acknowledges Applicant’s Amendments to the Specification have resolved the objection in the paragraph beginning on page 9, line 19, and therefore the respective specification objection previously set forth in the Non-Final Office Action mailed December 21, 2021 is withdrawn. Examiner also acknowledges and accepts the additional amendment provided by the Applicant in the paragraph beginning on page 12, line 27, as the amended change merely fixes the term “low and high amount of storage” to properly correspond to the preceding term “low and high priority data” (and as such, does not represent new matter).
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges that the Amendments to the Claims have resolved the majority of the claim objections identified in Claims 4, 7, and 11, and therefore those respective objections previously set forth in the Non-Final Office Action mailed December 21, 2021 are withdrawn. However, Examiner notes that one of the identified claim objections in Claim 4 has not been resolved (the missing punctuation mark at the end of Claim 4), and thus that particular objection is maintained, and is further identified in the relevant section indicated below.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges that the Amendments to the Claims have resolved the lack of antecedent basis issues identified in Claims 1, 12, and 17 (and inherited by dependent Claims 2-11, 13-16, and 18-20) and in Claims 10, 16, and 20, as well as the indefiniteness issue identified in Claims 2, 13, and 18 (and inherited by dependent Claims 3-7), and therefore the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed December 21, 2021 are withdrawn.
Regarding Applicant’s Remarks and the Information Disclosure Statement received on 3/21/2022, Examiner acknowledges that Applicant has clarified on the 3/21/2022 IDS that the October 29, 2019 date refers to the last accessed date for the NPL document/web page. Examiner has confirmed that web.archive.org recognizes October 29, 2019 as a valid retrieval date for the Ehcache, Cache Eviction Algorithms NPL document/web page (https://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html), and this same date is identified on the NPL document itself. However, Examiner notes that the latest copy of the NPL document has a blank page 4. Given that this latest version of the Ehcache NPL document and the previous version of the NPL document submitted on 11/01/2019 (which does not have a blank page 4) are extracted from the same webpage, this NPL document will be considered based on the previous version of the NPL document received on 11/01/2019.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/671,791, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant’s Remarks for Claims 9, 15, and 19 under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, Examiner acknowledges Applicant’s arguments, has considered them, and has found them not persuasive.
Regarding Applicant’s arguments:
“Claims 1- 20 have been variously amended to address the rejections under Section 112. Applicants respectfully request withdrawal of the rejections under Section 112. Claims 9, 15, and 19, for example, have been amended to recite that "the complexity of the at least one data model is based at least in part on a number of components of the at least one data model" to address the rejection under 35 U.S.C. §112(a) or 35 U.S.C. §112(pre-AIA ), first paragraph, as failing to comply with the written description requirement.
In this regard, the original specification, on page 13, lines 8-19, teaches the following:
GMM Error Rate, Lower Complexity Model and Lower Computational Cost
Gaussian Mixture Models with multiple components are powerful and will have a lower rate. In some cases, however, it might be computationally efficient to fit and use only a GMM model with two components, instead of a model with five components, for example. One or more aspects of the disclosure recognize that use of a smaller number of components for modeling will reduce the time to find a "reasonable" model. Thus, a higher error rate can optionally be traded off for a lower computational time. 
Grouping Sensor Using PDF Similarity
In a further variation, the model complexity can optionally be reduced by grouping sensors with similar PDFs into one group and using one PDF to describe the group of sensors. In this manner, storage requirements can be reduced and one sample or one subsample of the data can be used instead of N independent samples for each of the N sensors separately to generate the PDF.”
Examiner has considered this argument, and finds the argument to be not persuasive. Examiner points out the recited amended claim limitation:
“wherein the classification determines a complexity of the at least one data model for the at least one data source and the predefined retention model for the sampled data from the at least one data source, wherein the complexity of the at least one data model is based at least in part on a number of components of the at least one data model”.
While Applicant asserts that the cited paragraphs in the specification (p.13 lines 7-19) discuss model complexity, the cited paragraphs only discuss fitting a GMM with fewer components to reduce the time to find a reasonable model (trading off a higher error rate), or grouping sensors with similar PDFs to reduce storage requirements, either of which results in a model that contains fewer components (and hence represents a lower model complexity). However, the specification still fails to describe how a classification determines a complexity as recited in the claims, such that the classification would trigger a system to perform the steps described in Applicant’s specification p.13 lines 7-19 to reduce the model complexity. As indicated in the Non-Final Office Action mailed December 21, 2021, Applicant’s specification p.4 lines 28-31 states: “… the retention model classification 150 determines a complexity of the data model …”, but nothing in Applicant’s specification p.13 lines 7-19, or anywhere else in the specification, indicates that the retention model classification is involved in determining a model complexity. The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, and such that it can be determined whether the claimed invention meets all the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163. Given that the specification provides no support for this limitation, this claim limitation in Claims 9, 15, and 19 fails to comply with the written description requirement.
Regarding Applicant’s Remarks for Claims 1, 9-12, 15-17, and 19-20 under 35 U.S.C. 103 as being unpatentable over Bivens et al., U.S. PGPUB 2013/0318305, published 11/28/2013 [hereafter referred to as Bivens] in view of Smith et al., U.S. PGPUB 2015/0242133, published 8/27/2015 [hereafter referred to as Smith], and for Claims 2-8, 13-14, and 18 under 35 U.S.C. 103 as being unpatentable over Bivens in view of Smith as applied to Claims 1, 12, and 17, and in further view of Tekumalla et al., Copula-HDP-HMM: Non-parametric Modeling of Temporal Multivariate Data for I/O Efficient Bulk Cache Preloading, 2016 [hereafter referred to as Tekumalla], Examiner acknowledges Applicant’s arguments, has considered them, and has found them not persuasive. Hence, the existing 35 U.S.C. §103 rejections are maintained, and the updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
Regarding Applicant’s Remarks:
“With regard to the adapting step of representative claim 1, the Examiner asserts that FIG. 1 and paragraphs 0050 and 0063 of Bivens teaches the claimed "adapting a usage of one or more storage resources that store the data from the at least one data source based at least in part on the representation and the classification." 
The Examiner appears to acknowledge on page 13, however, that Bivens does not teach (emphasis added) "a classification of data from the at least one data source into one of a plurality of predefined retention models." 
The Examiner asserts, however, that paragraphs 0021, 026, 0032-034 of Smith teach predefined retention models. 
To the extent, if any, that Smith teaches "predefined retention models," Smith does not teach that each predefined retention model "corresponds to a different amount of retained data that is retained in one or more storage resources for at least some of the source data from the at least one data source," as currently claimed. In addition, the combination of Bivens et al. and Smith et al. does not suggest "adapting a usage of one or more storage resources that store the retained data .... based at least in part on ... the classification." 
As noted above, the Examiner acknowledges on page 13 that Bivens does not teach such a classification. Thus, Bivens cannot teach adapting a usage of storage resources using such a classification. 
In addition, Smith does not adapt a usage of storage resources using such a classification into a "predefined retention model."”
Examiner has considered this argument, and finds the argument to be not persuasive. 
Examiner identifies the Applicant’s recited claim limitations for the above argument:
	… obtaining a classification of data from the at least one data source into one of a plurality of predefined retention models …
	… adapting a usage of one or more storage resources that store the data from the at least one data source based at least in part on the representation and the classification …
Examiner notes that Applicant’s above arguments contain two sub-arguments: 1) Bivens does not teach obtaining a classification of data from the at least one data source, and adapting a usage of one or more storage resources based on the classification; and 2) Smith does not teach a classification into one of a plurality of predefined retention models. Examiner notes in both sub-arguments, Applicant does not provide further evidence other than asserting that neither the Bivens nor the Smith reference teach the recited claim limitations. Hence, Examiner provides further clarifications in each reference to address both of these sub-arguments in the following paragraphs.
Contrary to Applicant’s above assertion, Bivens teaches the following limitations: (“obtaining a classification of data from the at least one data source …” and “adapting a usage of one or more storage resources that store the data from the at least one data source based at least in part on the representation and the classification”), where the classification is performed by a workload classifier as shown in Bivens Figure 5. Under their broadest reasonable interpretation, the above recited limitations broadly recite obtaining a classification of data from an input data source and, based on the classification and a representation, adapting a usage of one or more storage resources that store retained data, where this adaptation of a storage resource usage broadly recites any series of steps or processes that modifies the storage resource usage. Bivens [0060]-[0061] describe the configuration parameters that are provided to the memory subsystem tool, where this memory subsystem tool is shown as Bivens Figure 4, element 360, and Figure 5, element 415. The configuration parameters input to the memory subsystem tool include the collected cache and memory access data and the corresponding system performance metric data, where this collected system performance metric data is sampled from a system (Bivens Figure 1, element 120; [0038]-[0039], [0040]-[0043], [0062]; Figure 4, element 390; and Figure 5, elements 410, 460, 465). As described in Bivens [0060]-[0061], the values for the configuration parameters are based on solving the explicit function for the configuration parameters, where the explicit function is based on an analysis involving the fitting of the configuration parameters and the performance metrics for different workloads using the parametric density functions as taught in Bivens [0039]-[0042], with the parametric density functions described in Bivens [0043]-[0048].
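For illustration only, the following Python sketch shows what fitting a gamma density (one of the two parametric families named in Bivens) to histogram data by maximum likelihood can look like. It is not taken from Bivens: the function names are hypothetical, the bisection solver is a simple stand-in for whatever estimator Bivens employs, and treating each bin center as a representative value weighted by its count is an approximation assumed here.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0, via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    tail = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - tail


def fit_gamma_to_histogram(centers, counts):
    """Hypothetical helper: maximum-likelihood gamma fit (shape k, scale theta).

    Each bin center stands in for every observation in its bin (an approximation;
    a binned-likelihood fit would integrate the density over each bin).
    """
    n = float(sum(counts))
    mean = sum(c * x for x, c in zip(centers, counts)) / n
    mean_log = sum(c * math.log(x) for x, c in zip(centers, counts)) / n
    s = math.log(mean) - mean_log  # non-negative by Jensen's inequality
    if s <= 0.0:
        raise ValueError("all values are identical; the gamma MLE is undefined")
    # The MLE shape k solves log(k) - digamma(k) = s. The left-hand side decreases
    # monotonically in k, so bisection on log(k) is sufficient.
    lo, hi = math.log(1e-6), math.log(1e6)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        k = math.exp(mid)
        if math.log(k) - digamma(k) > s:
            lo = mid  # k is too small
        else:
            hi = mid
    k = math.exp(0.5 * (lo + hi))
    return k, mean / k  # the scale MLE is mean / k


# Usage: synthetic samples from a gamma(shape=3, scale=2) source, one "bin" per sample.
random.seed(0)
samples = [random.gammavariate(3.0, 2.0) for _ in range(20000)]
k_hat, theta_hat = fit_gamma_to_histogram(samples, [1] * len(samples))
```

The fitted pair (k_hat, theta_hat) is the kind of compact representation of sampled data that the claim mapping above refers to; the shifted power-law family would be fitted analogously with its own likelihood.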
As described in Bivens [0029], [0038], and Table II, the different workloads of interest are represented through different applications (or benchmarks), each with different performance statistics such as run times, cache line request probe rates. As indicated in the Non-Final Office Action mailed December 21, 2021, the fitting of the input configuration parameters and performance metrics into a parametric density function represents a process that generates a representation of the input, where this representation is the identified parametric density function. Referring to Bivens Figure 5, the workload configuration parameters are received and identified by the memory subsystem tool in elements 410, 425, and sent to a workload classifier 450, where the output of the workload classifier is sent to the memory performance engine to generate the explicit function to derive the best set of configuration parameters for the desired performance goal (such that the memory performance engine obtains a classification of data from the workload classifier), as taught in Bivens [0065]-[0066]: “The memory configuration subsystem tool 415 includes a workload parameter identification element 425 for identifying the applications running on the computing system and the workloads generated by those applications. The target applications and scenarios 460 are provided as input to the workload parameter identification element. The memory configuration subsystem tool 415 also includes a workload classifier 450 in communication with the workload parameter identification element 425 and a memory performance repository 445. … Both the memory performance repository and the workload classifier are in communication with the memory performance engine 430. 
The memory performance engine uses the input from the memory performance engine, the workload classifier and the target applications and scenarios 460, i.e., workload and performance parameters, to generate the explicit function of configuration parameters on performance metrics. … This explicit function is communicated to an application level performance characterization element 440 that combines the equation with the application level performance goal inputs 465. An optimization engine 455 is used to solve the explicit function based on the identified performance goals and to derive the best set of configuration parameters for the desired performance goal. The optimization engine 455 also receives configuration parameter constraint inputs 475 that are taken into account when determining the best configuration parameters. The desired configuration parameters are output as system configuration directives 420 for a given input scenario.”. Given that the identified parametric functions are based on performance metrics for different workloads, the memory subsystem uses the workload classifier to identify and associate the input data according to the different workloads. A person having ordinary skill in the art would understand that this workload classifier identifies and performs classification based on the different workloads being analyzed by the memory subsystem tool.
Bivens [0063] further teaches that the updated system configuration parameters from the memory subsystem tool are sent to a configuration controller to implement the desired cache changes such as modifying size or line size of the cache, such that this process for determining these system configuration updates represents a way to adapt a usage of one or more storage resources based on the given input information: “The subsystem tool 360 is configured to execute methods for determining system configuration parameters in accordance with the present invention based on cache and memory access data and the performance numbers. The subsystem tool 360 communicates the desired or optimal cache and memory configuration parameters 380 for a given workload and application level performance goal to the configuration controller 340. The configuration controller 340 implements the necessary changes to the current cache and memory configuration parameters that are necessary to implement the desired cache and memory subsystem configuration parameters 380. These changes include modifying the size or line size of the cache …”. Hence, Applicant’s argument that the Bivens reference does not teach the above recited claim limitations under its broadest reasonable interpretation is not persuasive, and the existing prior art rejection is maintained.
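For illustration only, the following Python sketch shows the general shape of the optimization step quoted above from Bivens [0063] and [0065]-[0066]: given an explicit function that predicts a performance metric from configuration parameters, a configuration satisfying a constraint input is selected and emitted as a directive (here, a cache size and line size). The function names, the toy prediction function, the candidate grid, and the single cache-size constraint are hypothetical and are not taken from Bivens; exhaustive grid search is used only as a simple stand-in for whatever solver optimization engine 455 employs.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, Optional


def best_configuration(
    predict: Callable[[int, int], float],
    cache_sizes_kb: Iterable[int],
    line_sizes_bytes: Iterable[int],
    max_cache_kb: int,
) -> Optional[Dict[str, int]]:
    """Hypothetical helper: pick the (cache size, line size) pair that minimizes
    the predicted metric, subject to a maximum cache size constraint."""
    feasible = [
        (size, line)
        for size, line in product(cache_sizes_kb, line_sizes_bytes)
        if size <= max_cache_kb
    ]
    if not feasible:
        return None  # no configuration satisfies the constraint
    cache_kb, line_bytes = min(feasible, key=lambda cfg: predict(*cfg))
    # The returned dictionary plays the role of a configuration directive.
    return {"cache_kb": cache_kb, "line_bytes": line_bytes}


def toy_miss_rate(cache_kb: int, line_bytes: int) -> float:
    """Hypothetical explicit function: misses fall as the cache grows, and the
    line size is best near 64 bytes."""
    return 1.0 / cache_kb + 0.01 * abs(math.log2(line_bytes) - 6)


directive = best_configuration(
    toy_miss_rate, [256, 512, 1024, 2048], [32, 64, 128], max_cache_kb=1024
)
```

With these toy inputs the 2048 KB candidate is excluded by the constraint and the selected directive is a 1024 KB cache with 64-byte lines; a configuration controller would then apply such a directive, which corresponds to the "adapting a usage" step discussed above.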
As indicated in the Non-Final Office Action mailed December 21, 2021, while Bivens teaches classification by a workload classifier, and producing an output from that workload classifier sent to the memory performance engine (such that the memory performance engine obtains a classification of data), as well as teaching that the determined classifications by the workload classifier generate updated system configuration parameters for adapting a usage of one or more resources, Bivens does not explicitly describe classifying these different workloads or applications/benchmarks into a plurality of predefined retention models, as indicated by the following limitation: (“[classifying data] … into one of a plurality of predefined data retention models”). Under its broadest reasonable interpretation, a “predefined data retention model” broadly recites a model description with a list of characteristics related to data retention (where the term “data retention” broadly recites any properties, characteristics, configurations related to storing or retaining data), such that in the context of the recited limitation, this limitation broadly recites a process that classifies the data into a model description that contains a set of characteristics related to storing or retaining data. 
Contrary to Applicant’s above assertion, Smith teaches a control unit receiving workloads from host applications, where each workload represents a pattern of I/O memory reads and writes, and analyzing these patterns over a period of time to detect changes in workloads and to switch to the corresponding new workload profiles, as indicated in Smith [0032]-[0034]: “The storage system then begins operating … loads a data ingest application into its RAM … Control unit 122 continuously monitors I/O being processed by storage controller 120 … and tracks various parameters, including the number of read requests, number of write requests, size of read requests, size of write requests, and addresses indicated by each I/O request. Control unit 122 then categorizes the I/O processing workload into a category based on these characteristics. … In this example, the first type of I/O workload (i.e., the application messaging and responses) is associated with a "streaming ingest" type of workload characterized by large write requests directed to sequential addresses in memory. Therefore, control unit 122 loads a new profile for "streaming ingest" workloads into memory.” and [0037]: “… Control unit 122 then analyzes the new workload and loads a new profile for storage system 150 in order to tune storage system 150 for the new type of I/O workload.”. Smith further teaches a plurality of workloads organized by category/type as shown in Smith Figure 4, where one of these I/O workload categories/types is a “streaming ingest” workload (Smith Figure 4, 1st row, with the “Workload” column identifying a “classification”), and where each row identifies an instantiation/representation that is interpreted as “one of a plurality of predefined retention models”.
Hence, the control unit taught in Smith identifies predefined retention models through an I/O workload categorization process, and as such, Smith teaches a series of steps that classifies data into one of a plurality of predefined retention models (represented by the workload profiles shown in Smith Figure 4), where each workload category includes corresponding memory configuration parameters to adapt a usage of one or more storage devices. The motivation to combine is taught in Smith, as loading these workload profiles into a system according to their configurations provides an efficient way for a system to load settings and quickly adapt to changes in I/O workloads during run-time, thus not only supporting the different workloads used by various applications, but also improving efficiency by dynamically adjusting the I/O caching and queuing strategies used within the system, thereby improving the usage of memory and computational resources (including bandwidth resources) in the system (Smith [0018]: “… control unit 122 loads these profiles from persistent memory 124 into volatile memory 126 (e.g., a Random Access Memory (RAM)), and utilizes loaded profiles to alter how storage system 150 functions. In this manner, storage controller 120 may adapt storage system 150 to different types of incoming I/O workloads from host 110. Adjusting storage system settings regularly based on a profile helps storage system 150 to adapt to changing conditions. For example, in a Storage As A Service (SAAS) environment, the I/O workload from host 110 may vary from ingest, sequential workload requests (e.g., for one client) to primarily transactional requests for a website (e.g., for another client). For each of these workloads a different combination of I/O caching and queuing settings may be desired in order for storage system 150 to function efficiently.
For example, it may be beneficial to coalesce I/O requests together in bandwidth-intensive ingest sequential workloads, because this enhances the overall bandwidth of the system.”). Hence, Applicant’s argument that the combination of the Bivens and Smith references does not teach the above recited claim limitations under their broadest reasonable interpretation is not persuasive, and the existing prior art rejection is maintained.
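For illustration only, the following Python sketch outlines the categorize-then-load-profile flow described in the Smith passages quoted above: a window of monitored I/O requests is categorized from its read/write counts, request sizes, and addresses, and the matching predefined profile is returned. The “streaming ingest” test mirrors Smith’s description (large write requests directed to sequential addresses), the “transactional” category follows Smith [0018], and write coalescing for ingest follows Smith’s [0018] example. All thresholds, the IORequest and PROFILES names, the “general” fallback category, and the remaining profile settings are hypothetical placeholders, not values disclosed by Smith.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IORequest:
    op: str       # "read" or "write"
    address: int  # starting block address
    size: int     # request size in blocks


# Hypothetical predefined profiles; setting names and values are placeholders.
PROFILES: Dict[str, Dict[str, bool]] = {
    "streaming ingest": {"coalesce_writes": True, "read_ahead": False},
    "transactional": {"coalesce_writes": False, "read_ahead": False},
    "general": {"coalesce_writes": False, "read_ahead": True},
}


def categorize(window: List[IORequest], large_write: int = 64,
               seq_ratio: float = 0.8, small_io: int = 16) -> str:
    """Assign a monitoring window of I/O requests to a workload category.
    All thresholds are assumptions chosen for illustration."""
    if not window:
        return "general"
    writes = [r for r in window if r.op == "write"]
    reads = [r for r in window if r.op == "read"]
    if writes and len(writes) >= len(reads):
        avg_write = sum(r.size for r in writes) / len(writes)
        # A write counts as sequential if it starts where the previous write ended.
        sequential = sum(
            1 for prev, cur in zip(writes, writes[1:])
            if cur.address == prev.address + prev.size
        )
        seq_fraction = sequential / max(len(writes) - 1, 1)
        if avg_write >= large_write and seq_fraction >= seq_ratio:
            return "streaming ingest"
    if sum(r.size for r in window) / len(window) < small_io:
        return "transactional"  # many small requests
    return "general"


def select_profile(window: List[IORequest]) -> Tuple[str, Dict[str, bool]]:
    """Categorize the window and return the category with its predefined profile."""
    category = categorize(window)
    return category, PROFILES[category]


ingest = [IORequest("write", 1000 + 128 * i, 128) for i in range(10)]
lookups = [IORequest("read", 7000 * i, 4) for i in range(10)]
```

Here a run of large back-to-back writes selects the “streaming ingest” profile, while small scattered reads select the “transactional” profile; each profile is a fixed, predefined set of storage settings chosen by classification, which is the correspondence relied on above.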
As noted above, Applicant’s amended claim limitations necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to the Applicant’s amended claims are provided in the relevant sections indicated below.

Claim Objections
Claim 4 is objected to because of the following informality: a missing punctuation mark at the end of the claim: “wherein the fitting of the non-parametric model to at least some of the sampled data comprises representing the sampled data using one or more of a Gaussian Mixture model, a Gaussian Mixture model that captures temporal variance, and a Kernel Density Estimation model[.]”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 9, 15, and 19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding amended Claims 9, 15, and 19, 
The amended claims recite the following limitation: “wherein the classification determines a complexity of the at least one data model for the at least one data source and the predefined retention model for the sampled data from the at least one data source, wherein the complexity of the at least one data model is based at least in part on a number of components of the at least one data model”, but neither the claims nor the specification discloses the set of rules, steps, guidelines, or algorithm that describes how a classification determines a “complexity of the at least one data model” according to a number of components in a data model. While Applicant asserts that the cited paragraphs in the specification (p.13 lines 7-19) discuss model complexity, the cited paragraphs only discuss fitting a GMM with fewer components to reduce the time to find a reasonable model (trading off a higher error rate), or grouping sensors with similar PDFs to reduce storage requirements, either of which results in a model that contains fewer components (and hence represents a lower model complexity). However, the specification still fails to describe how a classification determines a complexity as recited in the claims, such that the classification would trigger a system to perform the steps described in Applicant’s specification p.13 lines 7-19 to reduce the model complexity. As indicated in the Non-Final Office Action mailed December 21, 2021, Applicant’s specification p.4 lines 28-31 states: “… the retention model classification 150 determines a complexity of the data model …”, but nothing in Applicant’s specification p.13 lines 7-19, or anywhere else in the specification, indicates that the retention model classification is involved in determining a model complexity.
The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, and such that it can be determined whether the claimed invention meets all the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163. Given that the specification provides no support for this limitation, this claim limitation in Claims 9, 15, and 19 fails to comply with the written description requirement. For purposes of examination, the classification aspect of the limitation will not carry any patentable weight, and hence this limitation will be interpreted as broadly reciting different models of different data model complexities, where the components of a model broadly recite any physical or logical properties or characteristics associated with a model.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 9-12, 15-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bivens et al., U.S. PGPUB 2013/0318305, published 11/28/2013 [hereafter referred to as Bivens] in view of Smith et al., U.S. PGPUB 2015/0242133, published 8/27/2015 [hereafter referred to as Smith].
Regarding amended Claim 1, 
Bivens teaches
(Currently Amended) A method, comprising: 
obtaining sampled data that was generated by sampling source data from at least one data source (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites obtaining sampled data from a source, where the data from the source is sampled through a sampling process. Bivens teaches collecting cache and memory access data on a computing system (where the computing system itself represents a data source), and the collected cache and memory access data represent data from various application workloads, where part of this collected data includes system performance metrics collected as histograms reflecting different time ranges and granularities, such that the generation of these histogram data representations (based on the collected data) represents a process of sampling source data from at least one data source (Bivens [0039]: “… multi-resolution and multi-scale system performance statistics or data are collected 120 using either runs from an actual computing system or a simulated system … the system performance statistics are gathered as histogram data … is generated for a plurality of runs of cache size and line size within the cache … the histogram data are collected from the computing system during runtime of the computing system … over different time ranges and at varying granularity …”; Figure 1, element 120; [0038], [0062]; and Figure 4, element 390).); 
fitting at least one data model to at least some of the sampled data to obtain a representation of the sampled data from the at least one data source (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification (p.5 lines 13-17) and dependent claims 3-6, this limitation broadly recites performing a fitting process using one of a parametric, non-parametric, descriptive statistics, or time-series model to obtain a representation of the sampled data. Bivens teaches using an empirical density representation to identify and establish an appropriate parametric density function for the collected configuration parameters, where the parametric density functions are either a gamma density function or a shifted power-law density function (represented in terms of their respective probability density functions $f_{p,w}(x)$), and where, for each different workload/benchmark application, the different parametric functions are applied to best fit the histogram data using a Maximum Likelihood Estimator. This process of fitting the received histogram data using parametric functions therefore represents a process for fitting at least one data model to at least some of the sampled data to obtain a representation of the sampled data (Bivens Figure 1, elements 130-170; [0039], [0040]-[0043]: “Having collected the histogram data, these data are processed to estimate an empirical density for each arrangement of the configuration parameters 130. … the empirical density of the histogram data expresses the probability that the value of a given system performance statistic fails between a given set of values. … the empirical density for each arrangement of the configuration is used to establish a parametric density function for the different arrangements of the configuration parameters 140. From the parametric density function, a functional dependence between the density parameters in the parametric density function and the configuration parameters is determined 150. It is this dependence that is used to determine the density function of the system performance metrics as a function of the configuration parameters 160. Then the explicit function between performance metrics and configuration parameters is determined 170. … Any cache performance metric, p, for a workload, W, is modeled as a random variable $X_{p,w}$, where p can be the cache residency time, the single residency time or the inter-hit time. The probability density function of any metric p for any workload w is denoted by the function $f_{p,w}(x)$, and a parametric form for $f_{p,w}$ is identified using the histogram data. For each benchmark, different families of parametric density functions are tried to best fit the histogram data using the Maximum Likelihood Estimator (MLE). The gamma density function and the shifted power-law density function are identified as the two candidate functions that closely model the empirical density.”; [0043]-[0045], including equations 4 and 5; and [0060]-[0061], Figure 4, element 360, and Figure 5, element 415).); 
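The fitting step described above can be sketched as follows, assuming the histogram is given as (bin midpoint, count) pairs. This is a hypothetical illustration, not Bivens’ code: the gamma density is fitted by weighted maximum likelihood (Newton iteration on the shape parameter), and, because the form of Bivens’ shifted power-law density (equation 5) is not reproduced here, a Pareto type I density, whose scale parameter has the smallest observed value as its maximum likelihood estimate, stands in as the heavy-tailed candidate. The family with the higher maximized log-likelihood is kept.

```python
import math


def digamma(x):
    """Digamma function via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Trigamma function via psi'(x) = psi'(x + 1) + 1/x^2 plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_gamma(hist):
    """Weighted MLE of a gamma density from (value, count) pairs with value > 0."""
    n = sum(c for _, c in hist)
    mean = sum(x * c for x, c in hist) / n
    mean_log = sum(math.log(x) * c for x, c in hist) / n
    s = math.log(mean) - mean_log  # > 0 unless all values are equal
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # standard initial guess
    for _ in range(100):  # Newton iteration on ln k - psi(k) = s
        step = (math.log(k) - digamma(k) - s) / (1 / k - trigamma(k))
        k -= step
        if abs(step) < 1e-12 * k:
            break
    theta = mean / k
    ll = n * ((k - 1) * mean_log - mean / theta - math.lgamma(k) - k * math.log(theta))
    return {"shape": k, "scale": theta}, ll


def fit_pareto(hist):
    """Weighted MLE of a Pareto type I density; the scale x_m is the smallest value."""
    n = sum(c for _, c in hist)
    x_m = min(x for x, _ in hist)
    mean_log = sum(math.log(x) * c for x, c in hist) / n
    alpha = 1.0 / (mean_log - math.log(x_m))
    ll = n * (math.log(alpha) + alpha * math.log(x_m) - (alpha + 1) * mean_log)
    return {"alpha": alpha, "x_m": x_m}, ll


def best_fit(hist):
    """Fit each candidate family and keep the one with the highest log-likelihood."""
    candidates = {"gamma": fit_gamma(hist), "pareto": fit_pareto(hist)}
    family = max(candidates, key=lambda name: candidates[name][1])
    params, ll = candidates[family]
    return family, params, ll
```

Because both candidates have two parameters, ranking them by raw log-likelihood gives the same result as ranking by AIC; with candidate families of different sizes, a penalized criterion would be more appropriate.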
obtaining a classification of at least some of the sampled data from the at least one data source (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites obtaining a classification of sampled data from at least one data source. As indicated earlier, Bivens determines a parametric density function that best fits the empirical density, which is used to produce optimal memory and cache configuration parameters that achieve a desired level of performance for an application’s workload, where each application’s workload is represented through system performance statistics gathered as histogram data (representing sampled data) (Bivens Figure 1, elements 130-170; and [0039], [0040]-[0043]). As described in Bivens [0029], [0038], and Table II, the different workloads of interest are represented through different applications (or benchmarks), each with different performance statistics such as run times and cache line request probe rates. Bivens further teaches solving the parametric density function using the memory configuration subsystem tool, which receives these workload parameters and provides them as input to a workload classifier module; the workload classifier generates a classification output that is sent to the memory performance engine, which generates the explicit function based on the fitting of the received performance metrics (histogram data) to a parametric function. The process in which the memory performance engine receives the classification output from the workload classifier therefore corresponds to a process that obtains a classification of the sampled data from the at least one data source. 
A person having ordinary skill in the art would understand that this workload classifier identifies and performs classification based on the collected data gathered from the different workloads being analyzed by the memory subsystem tool (Bivens Figure 1, elements 170, 180, 185; Figure 4, element 360; Figure 5, element 415; and [0060]-[0061], [0065]: “The memory configuration subsystem tool 415 includes a workload parameter identification element 425 for identifying the applications running on the computing system and the workloads generated by those applications. The target applications and scenarios 460 are provided as input to the workload parameter identification element. The memory configuration subsystem tool 415 also includes a workload classifier 450 in communication with the workload parameter identification element 425 and a memory performance repository 445. … The memory performance engine uses the input from the memory performance engine, the workload classifier and the target applications and scenarios 460, i.e., workload and performance parameters, to generate the explicit function of configuration parameters on performance metrics.”).) …
… adapting a usage of the one or more storage resources that store the retained data from the at least one data source based at least in part on the representation and the classification (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites adapting a usage of one or more storage resources that store retained data, where this adaptation broadly encompasses any series of steps or processes that modifies the storage resource usage. As indicated earlier, Bivens teaches the memory performance engine using the output from the workload classifier to determine updated system configuration parameters that achieve a desired level of performance, where these updated configuration parameters are based on the workload as well as on the system performance metrics collected from the computing system and sampled by a histogram data process (such that this data corresponds to “one or more storage resources that store the retained data from the at least one data source”) (Bivens Figure 1 and [0039], [0040]-[0043]; Figure 5, elements 410, 425, 450, 460; and [0060]-[0061]). Bivens further teaches that the updated system configuration parameters from the memory configuration subsystem tool are sent to a configuration controller (or optimization engine) to implement the desired cache changes, such as modifying the size or line size of the cache, such that this process for determining these system configuration updates represents a way to adapt a usage of one or more storage resources based on the given input information (Bivens Figure 1, element 190; Figure 5, elements 440, 455; [0063]: “The configuration controller 340 implements the necessary changes to the current cache and memory configuration parameters … These changes include modifying the size or line size of the cache. These necessary changes are implemented during runtime of the computing system.”; and [0065]-[0066]: “The memory configuration subsystem tool 415 includes a workload parameter identification element 425 for identifying the applications running on the computing system and the workloads generated by those applications. … The memory configuration subsystem tool 415 also includes a workload classifier 450 in communication with the workload parameter identification element 425 and a memory performance repository 445. … Both the memory performance repository and the workload classifier are in communication with the memory performance engine 430. The memory performance engine uses the input from the memory performance engine, the workload classifier and the target applications and scenarios 460, i.e., workload and performance parameters, to generate the explicit function of configuration parameters on performance metrics. … This explicit function is communicated to an application level performance characterization element 440 that combines the equation with the application level performance goal inputs 465. An optimization engine 455 is used to solve the explicit function based on the identified performance goals and to derive the best set of configuration parameters for the desired performance goal. … The desired configuration parameters are output as system configuration directives 420 for a given input scenario.”).), 
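As a hypothetical sketch of the optimization step quoted above (deriving configuration parameters that meet a performance goal from an explicit function of those parameters), the code below searches a small grid of cache sizes and line sizes. The predicted-metric function, the miss-rate goal, and the preference for the smallest feasible cache are assumptions for illustration; this is not Bivens’ optimization engine 455.

```python
from itertools import product


def choose_configuration(predict_metric, cache_sizes, line_sizes, target):
    """Return the smallest (cache_size, line_size) whose predicted metric meets the target.

    predict_metric stands in for the fitted explicit function of the
    configuration parameters (hypothetical). Returns None if nothing is feasible.
    """
    feasible = [
        (size, line)
        for size, line in product(cache_sizes, line_sizes)
        if predict_metric(size, line) <= target
    ]
    return min(feasible) if feasible else None  # least cache first, then smaller line


def toy_miss_rate(size_kb, line_bytes):
    """Hypothetical model: misses fall with cache size; larger lines help a little."""
    return (64.0 / line_bytes) ** 0.25 / (1.0 + size_kb / 64.0)
```

An exhaustive search is adequate for a handful of discrete settings; a larger configuration space would call for a proper solver.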
wherein the method is performed by at least one processing device comprising a processor coupled to a memory (Examiner’s note: Bivens teaches the memory configuration subsystem tool is implemented as a module located within a computing system that contains a processor and associated memory (Bivens Figure 4 and [0061]: “Referring to FIG. 4, an exemplary embodiment of a computing system with a reconfigurable memory subsystem 300 in accordance with the present invention is illustrated. The computing system includes a central processing unit 310 and a memory subsystem including a main memory portion 330 and a cache 320 in communication with the processing unit.”).).  
While Bivens teaches a workload classifier producing a classification output, Bivens does not explicitly teach
… [classifying data] … into one of a plurality of predefined retention models …
… wherein each of the predefined retention models corresponds to a different amount of retained data that is retained in one or more storage resources for at least some of the source data from the at least one data source …
Smith teaches
… [classifying data] … into one of a plurality of predefined retention models (Examiner’s note: Under its broadest reasonable interpretation, a “predefined retention model” is interpreted as a model description that lists a set of characteristics describing properties related to data retention and/or associated configuration settings that describe how to handle or apply the storage (or retention) of data. Smith teaches a control unit receiving a workload from applications on a host, where this workload represents a pattern of I/O memory reads and writes, and analyzing this pattern over a period of time to detect changes in workloads, such that the control unit identifies and classifies the I/O workload into one of the specified categories/types (Smith Figure 1, element 122; [0021], [0026], [0032]-[0034]: “The storage system then begins operating … loads a data ingest application into its RAM … Control unit 122 continuously monitors I/O being processed by storage controller 120 … and tracks various parameters, including the number of read requests, number of write requests, size of read requests, size of write requests, and addresses indicated by each I/O request. Control unit 122 then categorizes the I/O processing workload into a category based on these characteristics. … the first type of I/O workload (i.e., the application messaging and responses) is associated with a "streaming ingest" type of workload characterized by large write requests directed to sequential addresses in memory. … control unit 122 loads a new profile for "streaming ingest" workloads into memory. …”; and [0037]: “… Control unit 122 then analyzes the new workload and loads a new profile for storage system 150 in order to tune storage system 150 for the new type of I/O workload.”). 
Smith further teaches a plurality of workloads based on category/type as shown in Smith Figure 4, where one of these I/O workload categories/types is a “streaming ingest” workload (Smith Figure 4, 1st row, with the “Workload” column identifying a “classification”), and where each row identifies an instantiation/representation that is interpreted as “one of a plurality of predefined retention models”. Hence, the control unit taught in Smith identifies predefined retention models through an I/O workload categorization process, and as such, Smith teaches a series of steps that classifies data into one of a plurality of predefined retention models (represented by the workload profiles shown in Smith Figure 4), where each workload category includes corresponding memory configuration parameters to adapt a usage of one or more storage devices.) …
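The categorize-then-load-profile behavior attributed to Smith’s control unit 122 can be pictured with the following sketch. It is illustrative only: the request fields, the thresholds, the profile values, and every category name other than “streaming ingest” are hypothetical. Only the general inputs (read/write counts, request sizes, and addresses) and the ingest traits (large sequential writes, a larger write cache, coalesced I/O) come from the Smith passages quoted above.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    op: str       # "read" or "write"
    address: int  # starting block address
    blocks: int   # request length in blocks


def classify_workload(requests, large_blocks=256, seq_threshold=0.8):
    """Assign a workload category from read/write mix, request size, and address pattern."""
    if not requests:
        return "idle"
    writes = [r for r in requests if r.op == "write"]
    write_frac = len(writes) / len(requests)
    # A write is sequential when it starts where the previous write ended.
    sequential = sum(
        1 for prev, cur in zip(writes, writes[1:])
        if cur.address == prev.address + prev.blocks
    )
    seq_frac = sequential / max(len(writes) - 1, 1)
    mean_write = sum(r.blocks for r in writes) / len(writes) if writes else 0.0
    if write_frac > 0.9 and mean_write >= large_blocks and seq_frac >= seq_threshold:
        return "streaming ingest"
    if write_frac < 0.5:
        return "transactional"
    return "mixed"


# Hypothetical profile settings keyed by workload category.
PROFILES = {
    "streaming ingest": {"write_cache_mb": 1024, "coalesce_io": True},
    "transactional": {"write_cache_mb": 128, "coalesce_io": False},
    "mixed": {"write_cache_mb": 256, "coalesce_io": False},
    "idle": {"write_cache_mb": 64, "coalesce_io": False},
}


def select_profile(requests):
    """Classify a window of requests and return (category, profile settings)."""
    category = classify_workload(requests)
    return category, PROFILES[category]
```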
… wherein each of the predefined retention models corresponds to a different amount of retained data that is retained in one or more storage resources for at least some of the source data from the at least one data source (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification p.5 lines 1-11, this limitation broadly recites that each predefined retention model performs an adaptation to support different amounts of retained data in the one or more storage resources. As indicated earlier, Smith teaches a set of I/O workload categories/types as shown in Smith Figure 4 (with each I/O workload category/type corresponding to “one of a plurality of predefined retention models”). Smith further teaches that each workload profile has settings to transfer incoming data into permanent storage and to allocate larger write cache sizes to increase the overall bandwidth of the storage system, as well as supporting a storage controller command containing a “hint” field used for indicating workload durations, the ETA of the workload, Service Level Agreement requirements, storage tiering requirements, and durability or transiency of data (e.g., whether there are temporary files to be deleted after completion of a job). 
A person having ordinary skill in the art would understand that the different workload profiles, with settings that support direct transfer to permanent storage and large write cache sizes, together with storage controller commands whose “hint” field specifies other properties related to workload durations, SLA requirements, storage tiering requirements, and durability or transiency of data, enable each I/O workload profile to support different amounts of data to be retained as well as to specify a degree of data retention (based on the usage of the “hint” field) for the one or more storage resources receiving those workload command request changes (Smith [0034]: “… Depending on the storage type, the settings may also allow for incoming data to be directly transferred to permanent storage. The new profile also includes a setting that allocates a much larger write cache size than the previous profile. These settings enhance the ability of the storage system 150 to increase its overall bandwidth.”; and [0039]: “… command 510 is a SAS OPEN Address Frame generated by host 110 and provided to storage controller 120 in order to “hint” that the type of I/O workload at storage controller 120 is about to change. The hint … explicitly indicates an upcoming type of I/O processing workload … from the host. … the hint also includes an (ETA) for the workload … the hint additionally includes an expected duration of the new workload. In further embodiments, hints provide information about Service Level Agreement (SLA) requirements, storage tiering requirements, durability or transiency of data (e.g., whether there are temporary files that will be deleted after a job completes), etc.”).) …
Both Bivens and Smith are analogous art since they both teach analyzing changes in I/O workloads based on memory/cache reads and writes, and applying memory configuration settings to adjust to I/O workload changes in a system.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the output of the workload classifier from the memory configuration subsystem tool taught in Bivens and additionally process this output to further classify the workload into different classes or categories associated with the memory configuration profile settings taught in Smith, as a way to define and represent predefined data retention models that can be used to adapt a usage of one or more storage resources. The motivation to combine is taught in Smith: providing profiles to perform storage system adaptation gives a system an efficient way to load settings and quickly adapt to changes in I/O workloads during run-time, thus not only supporting the different workloads used by various applications but also improving efficiency by dynamically adjusting the I/O caching and queuing strategies used within the system, thereby improving usage of memory and computational resources (including bandwidth resources) in the system (Smith [0018]: “… control unit 122 loads these profiles from persistent memory 124 into volatile memory 126 (e.g., a Random Access Memory (RAM)), and utilizes loaded profiles to alter how storage system 150 functions. In this manner, storage controller 120 may adapt storage system 150 to different types of incoming I/O workloads from host 110. Adjusting storage system settings regularly based on a profile helps storage system 150 to adapt to changing conditions. For example, in a Storage As A Service (SAAS) environment, the I/O workload from host 110 may vary from ingest, sequential workload requests (e.g., for one client) to primarily transactional requests for a website (e.g., for another client). For each of these workloads a different combination of I/O caching and queuing settings may be desired in order for storage system 150 to function efficiently. For example, it may be beneficial to coalesce I/O requests together in bandwidth-intensive ingest sequential workloads, because this enhances the overall bandwidth of the system.”).
Regarding amended Claim 9, 
Bivens in view of Smith teaches
(Currently Amended) The method of claim 1, wherein the classification determines a complexity of the at least one data model for the at least one data source and the predefined retention model for the sampled data from the at least one data source, wherein the complexity of the at least one data model is based on at least in part on a number of components of the at least one data model (Examiner’s note: As indicated earlier, this limitation exhibits a 112(a) lack of written description issue; hence, for purposes of examination, the classification aspect of the limitation will not carry any patentable weight, and this limitation will be interpreted as broadly reciting different models of different data model complexities, where the components of a model broadly encompass any physical or logical properties or characteristics associated with a model. As indicated earlier, Smith teaches analyzing a pattern of I/O memory reads and writes over a period of time from applications (corresponding to “sampled data from the at least one data source”) to determine a classification of the I/O workload into a specific category/type. Smith Figure 4 shows a list of predefined retention models, with each row containing corresponding I/O characteristics and desired system characteristics associated with the classification of a retention model (listed under the “I/O Characteristics” and “Desired System Characteristics” columns, respectively; Smith [0021], [0033], [0038]), where these characteristics describe and represent the different I/O and system characteristics for each model (with each of the listed I/O and system characteristics broadly corresponding to components associated with an I/O workload category/type, thus corresponding to “… a complexity of the at least one data model for the at least one data source, wherein the complexity of the at least one data model is based on at least in part on a number of components of the at least one data model”).).  
Regarding amended Claim 10, 
Bivens in view of Smith teaches
(Currently Amended) The method of claim 1, wherein the adapting the usage of the one or more storage resources comprises one or more of 
(i) varying a data retention model as a function of an age of the sampled data from the at least one data source; 
(ii) evicting data from a cache based at least in part on the representation (Examiner’s note: As indicated earlier, Bivens teaches modifying the size or line size of the cache as optimal configuration parameters derived from collected input data, where these configuration parameters represent dynamic adjustments in the memory configuration of the system determined by an optimization engine (Bivens Figure 5, element 455, with the optimized parameters being used for “adapting the usage of the one or more storage resources”). Bivens further teaches that a portion of the collected data used for determining the optimal configuration parameters is in the form of system performance metrics that measure cache evictions (e.g., cache residency time and single residency time, Bivens [0030]) and software parameters that implement cache evictions (e.g., LRU or random replacement algorithms, Bivens [0028]), such that the optimal configuration parameters will reflect the cache eviction characteristics of the collected data.); 
(iii) moving the retained data from the at least one data source to a different storage tier; and 
(iv) determining an amount of time to store the retained data from the at least one data source.  
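Bivens [0028] is cited above for LRU replacement as one of the software parameters governing cache evictions. For readers less familiar with that policy, a minimal LRU cache sketch in Python follows; it illustrates the generic policy only and is not drawn from either reference. The class name and the evicted list are hypothetical conveniences.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()
        self.evicted = []  # keys in eviction order, kept for inspection

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            old_key, _ = self._data.popitem(last=False)  # least recently used first
            self.evicted.append(old_key)
```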
Regarding amended Claim 11, 
Bivens in view of Smith teaches
(Currently Amended) The method of claim 1, wherein the plurality of predefined retention models comprises one or more of 
a lossy retention model that maintains a type of a probability density function, one or more parameters of the probability density function and one or more summary statistics (Examiner’s note: The claims do not provide details of the lossy retention model, and hence this claim limitation is being interpreted under its broadest reasonable interpretation. Smith Figure 4 shows a list of predefined retention models, with associated predefined profiles (listed under the “Profile Settings” column) that contain a list of memory configuration settings for configuring or reconfiguring the attached storage devices according to different application workloads and the changing conditions reflected in those workloads. As indicated earlier, Smith teaches that a storage controller command to change a workload profile (where this change in workload profile is triggered by loading new applications with different workloads) contains a “hint” field that is used for indicating additional properties concerning the workload, such as workload durations and the ETA of the workload, as well as Service Level Agreement requirements, storage tiering requirements, or durability or transiency of data involving the removal of files (Smith [0036], [0038]). 
As indicated earlier, this hint field represents a mechanism by which a predefined retention model can specify a degree of data retention for the model (whether it should retain all data or just a small set of data), and hence allows a predefined data retention model to be considered a “lossy retention model”, with the corresponding I/O characteristics and desired system characteristics associated with the classification (Smith [0021], [0033], [0038]) describing the different I/O and system characteristics for each model, which are represented by their respective probability density functions and parameters (as taught in the Bivens reference), thus corresponding to “wherein the plurality of predefined retention models comprises one or more of … a lossy retention model that maintains a type of a probability density function, one or more parameters of the probability density function and one or more summary statistics”.); 
a subsample retention model that maintains a type of a probability density function, one or more parameters of the probability density function, a time interval, one or more summary statistics and a subsample of the source data from the at least one data source; and 
a complete retention model that maintains a type of a probability density function, one or more parameters of the probability density function, the source data from the at least one data source, and a time interval.  
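To make the differences among the three retention models recited in Claim 11 concrete, the following hypothetical sketch builds the record that each model would keep for one block of sampled values. The record fields follow the claim language; the summary statistics chosen, the every-tenth-value subsample, and the age thresholds in model_for_age (one reading of Claim 10(i), varying the retention model with the age of the data) are assumptions, not Applicant’s disclosed implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetainedRecord:
    pdf_type: str                          # e.g. "gamma"
    pdf_params: dict                       # fitted parameters of that density
    summary_stats: Optional[dict] = None
    time_interval: Optional[tuple] = None  # (start, end) of the sampled window
    data: Optional[list] = None            # subsample or full source data


def summarize(values):
    """Hypothetical choice of summary statistics."""
    return {"count": len(values), "mean": sum(values) / len(values),
            "min": min(values), "max": max(values)}


def retain(model, values, pdf_type, pdf_params, interval, every=10):
    """Build the record kept under each retention model named in Claim 11."""
    if model == "lossy":      # density type, parameters, summary statistics
        return RetainedRecord(pdf_type, pdf_params, summarize(values))
    if model == "subsample":  # the above plus a time interval and a subsample
        return RetainedRecord(pdf_type, pdf_params, summarize(values),
                              interval, list(values[::every]))
    if model == "complete":   # density type, parameters, full data, time interval
        return RetainedRecord(pdf_type, pdf_params, None, interval, list(values))
    raise ValueError(f"unknown retention model: {model}")


def model_for_age(age_days, complete_days=7, subsample_days=90):
    """Hypothetical age thresholds: keep less detail as the data gets older."""
    if age_days < complete_days:
        return "complete"
    if age_days < subsample_days:
        return "subsample"
    return "lossy"
```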
Regarding amended Claim 12, 
Claim 12 recites an apparatus comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Bivens and Smith as indicated in Claim 1. In addition, Bivens teaches a computing system that contains a processor and associated memory, where the computing system represents the apparatus recited in this claim (Bivens Figure 4 and [0061], [0067]-[0068]).
Regarding amended Claim 15, 
Claim 15 recites the apparatus of claim 12, where the apparatus further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 9, and hence is rejected under similar rationale provided by Bivens in view of Smith as indicated in Claim 9, in view of the rejection of Claim 12.
Regarding amended Claim 16, 
Claim 16 recites the apparatus of claim 12, where the apparatus further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 10, and hence is rejected under similar rationale provided by Bivens in view of Smith as indicated in Claim 10, in view of the rejection of Claim 12.
Regarding amended Claim 17, 
Claim 17 recites a non-transitory processor-readable storage medium storing program code of one or more software programs, where the program code, when executed by at least one processing device, causes the at least one processing device to perform steps comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Bivens and Smith as indicated in Claim 1. In addition, Bivens teaches a computing system that contains a processor and associated memory, where the associated memory includes a computer-readable storage medium containing program instructions for execution on a computer system, and where this computer-readable storage medium represents the non-transitory processor-readable storage medium recited in this claim (Bivens Figure 4 and [0061], [0067]-[0068], [0074]-[0075]).
Regarding amended Claim 19, 
Claim 19 recites the non-transitory processor-readable storage medium of claim 17, where the non-transitory processor-readable storage medium further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 9, and hence is rejected under similar rationale provided by Bivens in view of Smith as indicated in Claim 9, in view of the rejection of Claim 17.
Regarding amended Claim 20, 
Claim 20 recites the non-transitory processor-readable storage medium of claim 17, where the non-transitory processor-readable storage medium further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 10, and hence is rejected under similar rationale provided by Bivens in view of Smith as indicated in Claim 10, in view of the rejection of Claim 17.
Claims 2-8, 13-14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bivens et al., U.S. PGPUB 2013/0318305, published 11/28/2013 [hereafter referred to as Bivens] in view of Smith et al., U.S. PGPUB 2015/0242133, published 8/27/2015 [hereafter referred to as Smith] as applied to Claims 1, 12, and 17; in further view of Tekumalla et al., Copula-HDP-HMM: Non-parametric Modeling of Temporal Multivariate Data for I/O Efficient Bulk Cache Preloading, 2016 [hereafter referred to as Tekumalla].
Regarding amended Claim 2, 
Bivens in view of Smith as applied to Claim 1 teaches
(Currently Amended) The method of claim 1, wherein the at least one data model comprises one or more of
a parametric model (Examiner’s note: Bivens teaches applying different parametric functions with associated probability density functions to best fit the empirical density representation for a given cache performance metric p for a workload w, where the different probability density functions $f_{p,w}(x)$ for the parametric functions include a gamma density function and a shifted power-law density function used to identify a parametric form for $f_{p,w}$, and where the parametric form derived through this fitting process using the histogram and empirical data representations corresponds to “wherein the at least one data model comprises … a parametric model” (Bivens [0044]-[0045], equations 4 and 5; and Bivens [0043]: “The empirical density for different system configurations for these system performance statistics has a heavy tail, suggesting their modeling using functions having such form. Any cache performance metric, p, for a workload, w, is modeled as a random variable X, where p can be the cache residency time, the single residency time or the inter-hit time. The probability density function of any metric p for any workload w is denoted by the function $f_{p,w}(x)$, and a parametric form for $f_{p,w}$ is identified using the histogram data.”).) … 
… a descriptive statistics model (Examiner’s note: Bivens teaches collecting performance statistics during run-time operation or via an FPGA simulator. The simulator captures system performance statistic metrics over different time ranges and at different granularities through the use of probe packets with simulated addresses sent out on the cHT bus links connected to memory modules, simulating cache accesses at different time intervals/clock cycles; the arrival times of the probe packets, measured from the timestamp included in each probe packet, are used to simulate and capture this cache-based input data (Bivens Figure 2, element 215; Figure 3, element 210; and [0032]-[0035]). Bivens further teaches using this collected data to calculate the mean values for each of the system performance statistic metrics (cache residency time, single residency time, and inter-hit time). These calculated mean values correspond to “descriptive statistics” and are used to define the empirical density representation (Bivens [0042]-[0043]) for use in fitting the different parametric functions described earlier, thus allowing the fitted parametric model to correspond to “wherein the at least one data model comprises … a descriptive statistics model”.) …
… a time series model (Examiner’s note: As indicated earlier, Bivens teaches collecting performance statistics using an FPGA-based large cache simulator. The simulator captures system performance statistic metrics, including cache miss ratios, cache residency time, single residency time, and inter-hit time, over different time ranges and at different granularities through the use of probe packets with simulated addresses sent out on the cHT bus links connected to memory modules, simulating cache accesses at different time intervals/clock cycles; the arrival times of the probe packets, measured from the timestamp included in each probe packet, are used to simulate and capture this cache-based input data (Bivens Figure 2, element 215; Figure 3, element 210; and [0034]-[0035]). These captured statistics are used to define the empirical density representation (Bivens [0042]-[0043]) for use in fitting the different parametric functions described earlier, thus allowing the fitted parametric model to correspond to “wherein the at least one data model comprises … a time series model”.) …
… decision trees …
… an ensemble of decision trees.  
However, Bivens in view of Smith does not teach
… a non-parametric model …
Tekumalla teaches
… a non-parametric model (Examiner’s note: Tekumalla teaches a process for collecting I/O trace data representing memory accesses from I/O workloads resulting from applications running in the system, where these traces are aggregated into time-slices and sampled into histogram representations (Tekumalla p.779 Section 5.2 Spatio-Temporal Aggregation: From Raw Trace to Temporal Sequence of Count Vectors, with this process corresponding to “sampling data from at least one data source”). Tekumalla further teaches using a Copula-HDP-HMM model, which is a temporal non-parametric mixture model with Gaussian copula, such that this model corresponds to “a non-parametric model” (Tekumalla p.776 col.1 Section 2 Preliminaries and p.776 col.2 Section 3. Copula-HDP-HMM: Non-parametric Model for Temporal Multivariate Data, 1st paragraph: “Non-parametric temporal mixture models for multivariate data is an important problem with wide applicability. We propose a new technique, the Copula-HDP-HMM, a temporal Dirichlet Process mixture model for multivariate data with Gaussian Copula.”).) …
Both Bivens in view of Smith and Tekumalla are analogous art, since both teach fitting collected data representing I/O memory/cache read and write accesses to data models. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the method using a parametric data model taught in Bivens in view of Smith and further include the method using a non-parametric data model taught in Tekumalla as a way to perform data fitting using a non-parametric data model. The motivation to combine is taught in Tekumalla, as performing multivariate analysis and inference using extended rank likelihood estimation (e.g., data model fitting) on collected data that exhibits complex behavior and contains non-continuous values is computationally less expensive than other traditional methods (such as marginal estimation), as it avoids defining corresponding marginal distributions to fully represent and describe the data model (Tekumalla p.775 col.2 Our Contributions; p.776 col.2 Section 2 Preliminaries; and p.777 col.1 Section 4 Inference with Extended Rank Likelihood 1st-2nd paragraphs). Furthermore, aggregating the collected data into a higher-level trace representation allows it to be easily sampled into histograms, which allows the usage of the extended rank likelihood estimation technique (Tekumalla p.779 Section 5.2 Spatio-Temporal Aggregation: From Raw Trace to Temporal Sequence of Count Vectors), thereby reducing the computational complexity of the estimation algorithm, as well as providing a simpler and more flexible algorithm to estimate continuous, discrete, and non-continuous data, thus enabling the system to be more robust in modeling various other workloads (Tekumalla p.774 col.1 Abstract, 1st-2nd paragraphs: “We address bulk preloading by analyzing high-level spatio-temporal motifs from raw and noisy I/O traces by aggregating the trace into a temporal sequence of correlated count vectors. Such temporal multi-variate data from trace aggregation arise from a diverse set of workloads leading to diverse data distributions with complex spatio-temporal dependencies. 
… Motivated by this, we propose the Copula-HDP-HMM, a new Bayesian non-parametric modeling technique based on Gaussian Copula, suitable for temporal multivariate data with arbitrary marginals, avoiding limiting assumptions on the marginal distributions. We are not aware of prior work on copula based extensions of Bayesian non-parametric modeling algorithms for discrete data. Inference with copulas is hard when data is not continuous. We propose inference based on extended rank likelihood that circumvents specifying marginals, making our inference suitable for count data and even data with a combination of discrete and continuous marginals, enabling the use of Bayesian non-parametric modeling, for several data types, without assumptions on marginals.”).
Regarding amended Claim 3, 
Bivens in view of Smith, in further view of Tekumalla teaches
(Currently Amended) The method of claim 2, wherein the fitting of the parametric model to at least some of the sampled data comprises 
representing the sampled data using a probability distribution function (Examiner’s note: As indicated earlier, Bivens teaches applying the derived empirical density (where the empirical density is based on the histogram representations generated from the collected data, thus corresponding to “sampled data”) with different probability density functions (Bivens [0044]-[0045], equations 4 and 5) to determine the best fit to the empirical density, where this process of applying the empirical density representation to different probability density functions corresponds to a process for “representing the sampled data using a probability distribution function” (Bivens [0043]: “The empirical density for different system configurations for these system performance statistics has a heavy tail, suggesting their modeling using functions having such form. Any cache performance metric, p, for a workload, w, is modeled as a random variable X, where p can be the cache residency time, the single residency time or the inter-hit time. The probability density function of any metric p for any workload w is denoted by the function $f_{p,w}(x)$, and a parametric form for $f_{p,w}$ is identified using the histogram data. For each benchmark, different families of parametric density functions are tried to best fit the histogram data … ”).) and 
determining one or more parameters of the probability distribution function (Examiner’s note: As indicated earlier, Bivens teaches applying the derived empirical density representation with different probability density functions to determine the best fit to the empirical density representation for a given cache performance metric p for a workload w. Performing a best fit using the probability density functions requires estimation of different parameters within the equations (α and β for the gamma density, and b and n for the shifted power-law density), and the estimation of these parameters corresponds to “determining one or more parameters of the probability distribution function” (Bivens [0044]-[0045], equations 4 and 5 and [0047]-[0048]: “For both the density functions, two parameters need to be estimated. For gamma density the parameters are α and β, and for the shifted power law the parameters that need to be estimated are b and n … In addition, the parameters ($\alpha_{p,w}, \beta_{p,w}$) in equation (4) and ($[b]_{p,w}, [n]_{p,w}$) in equation (5) for a given p and w depend on the system configuration parameters s and z. Results of fitting the two density functions to different cache statistics are provided.”).).  
Regarding amended Claim 4, 
Bivens in view of Smith, in further view of Tekumalla teaches
(Currently Amended) The method of claim 2, wherein the fitting of the non-parametric model to at least some of the sampled data comprises representing the sampled data using one or more of 
a Gaussian Mixture model, 
a Gaussian Mixture model that captures temporal variance (Examiner’s note: As indicated earlier, Tekumalla teaches collecting I/O trace data representing memory accesses from diverse workloads with different data distributions, aggregating the collected data into time-slices, and sampling them into histogram representations for use in a Copula-HDP-HMM data model. Tekumalla teaches that the Copula-HDP-HMM model is defined as a temporal non-parametric mixture model with Gaussian copula that is used with the extended rank likelihood estimation technique to perform inference estimation (Tekumalla p.779 col.2 Section 5.3 HULK for Bulk Cache Preloading; p.778 Algorithm 2; p.778 Figure 2; and p.777 col.1 Section 4 Inference with Extended Rank Likelihood 2nd paragraph), such that this Copula-HDP-HMM model corresponds to fitting a non-parametric data model “using one or more of … a Gaussian Mixture model that captures temporal variance”.), and 
a Kernel Density Estimation model[.]
Regarding amended Claim 5, 
Bivens in view of Smith, in further view of Tekumalla teaches
(Currently Amended) The method of claim 2, wherein the fitting of the descriptive statistics model to at least some of the sampled data comprises recording one or more of 
predefined summary statistics of the representation of the sampled data (Examiner’s note: As indicated earlier, Bivens teaches calculating the mean values for the performance statistics that are derived from the collected data produced by an FPGA-based large cache simulator, and these mean values are used as part of the fitting to the different parametric models (Bivens [0042]-[0045]), where these mean values correspond to “recording one or more of … predefined summary statistics of the representation of the sampled data”.) and 
a time stamp of the first and last recorded value in the sampled data (Examiner’s note: As indicated earlier, Bivens teaches maintaining respective timestamps of probe packet arrivals in the FPGA-based large cache simulator to collect the cache residency time based on accesses to the memory controller simulating a cache hit or miss on storage elements, where inspection of these probe packets with associated timestamps includes capturing the “first” and “last” respective timestamps of the corresponding probe packets, according to the starting time and end time representing the running duration of the FPGA-based large cache simulator (Bivens [0034]: “In the current example, a timestamp of the probe packet arrival is maintained to collect the residency time each cache entry.”). Bivens further teaches that the data captured in this manner is used to determine the memory and cache access data that are used to generate the histogram and empirical density representations used for the fitting of the parametric models (Bivens [0039]-[0045]), and as such, this time stamp monitoring process corresponds to “recording one or more of … a time stamp of the first and last recorded value in the sampled data”.).  
Regarding amended Claim 6, 
Bivens in view of Smith, in further view of Tekumalla teaches
(Currently Amended) The method of claim 2, wherein the fitting of the time series model to at least some of the sampled data comprises recording the source data and corresponding time stamps generated by the at least one data source (Examiner’s note: As indicated earlier, Bivens teaches maintaining respective timestamps of probe packet arrivals in the FPGA-based large cache simulator to collect the cache residency time based on accesses to the memory controller simulating a cache hit or miss on storage elements, which requires monitoring and collection of timestamps in the sampled data, where inspection of these probe packets with associated timestamps includes capturing the first and last respective timestamps of the corresponding probe packets (as well as all timestamps in between), according to the starting time and end time representing the running duration of the FPGA-based large cache simulator (Bivens [0034]). Bivens further teaches that the data captured in this manner is used to determine the memory and cache access data that are used to generate the histogram and empirical density representations used for the fitting of the parametric models (Bivens [0039]-[0045]), and as such, this time stamp monitoring process corresponds to “recording the source data and corresponding time stamps generated by the at least one data source”.).  
Regarding amended Claim 7, 
Bivens in view of Smith, in further view of Tekumalla teaches
(Currently Amended) The method of claim 2, further comprising 
identifying a distribution drift based at least in part on at least one density function of one or more of 
the parametric model (Examiner’s note: Under its broadest reasonable interpretation, the term “distribution drift” is interpreted as any change or adjustment that is triggered by changes from an initial representation of the distribution (such as any change in parameters from the initial input data, or even a change in the source of the initial input data). As indicated earlier, Bivens teaches modeling a cache performance metric p for a workload w using different probability density functions $f_{p,w}(x)$, and determining the parametric form for $f_{p,w}$ (where this parametric form corresponds to “the parametric model”) using the histogram and empirical data representations. As indicated earlier, Bivens further teaches that different probability density functions are used in order to determine a better fit, where for certain workload scenarios and cache line configurations, a gamma density function fits better than a shifted power-law density function (and vice versa) according to identified cache line sizes and workloads B-SE, B-DB, B-SJ (Bivens [0038], [0049]-[0051]). These cache line sizes and workloads are part of the collected data used to determine the empirical density representation, and as such, these changes in the cache line size and/or workload represent parameter changes in the initial input data as well as changes in the source of the initial input data (where the occurrence of these changes is interpreted as “a distribution drift”). Hence, the process of identifying a better fit using one of the specified parametric models according to these changes taught in Bivens [0049]-[0051] is interpreted as a process for “identifying a distribution drift based at least in part on at least one density function of one or more of … the parametric model”.) and the non-parametric model; and 
performing the fitting when a distribution drift is identified (Examiner’s note: As indicated earlier, Bivens teaches that different parametric density functions are used for providing a better fit, where for certain workload scenarios and cache line configurations, a gamma density function fits better than a shifted power-law density function (and vice versa) according to identified cache line sizes and workloads B-SE, B-DB, B-SJ (Bivens [0038], [0049]-[0051]), where these different workload scenarios and different cache line configurations represent changes in the source of the initial input data and parameter changes in the initial input data (with the occurrence of these changes corresponding to “a distribution drift”). Hence, the process of establishing and re-fitting different parametric models to provide a better fit according to these changes taught in Bivens [0049]-[0051] corresponds to a process for “performing the fitting when a distribution drift is identified”.).  
Regarding amended Claim 8, 
Bivens in view of Smith as applied to Claim 1 teaches
(Currently Amended) The method of claim 1.
However, Bivens in view of Smith does not teach
… further comprising grouping a plurality of the data sources into a group and representing the group using one data model.  
Tekumalla teaches
… further comprising grouping a plurality of the data sources into a group and representing the group using one data model (Examiner’s note: As indicated earlier, Tekumalla teaches collecting I/O trace data representing memory accesses from diverse workloads with different data distributions and aggregating the collected data into time-slices and sampling them into histogram representations, where this aggregation of collected data into time-slices and sampling them into histogram representations broadly represents a group of sampled data (Tekumalla p.779 Section 5.2 Spatio-Temporal Aggregation: From Raw Trace to Temporal Sequence of Count Vectors 1st-3rd paragraphs) for use in a Copula-HDP-HMM data model. Hence, the above method for performing spatial and temporal aggregation of traces corresponds to “grouping a plurality of the data sources into a group and representing the group using one data model”.).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the method using a parametric data model taught in Bivens in view of Smith and further include the method using a non-parametric data model taught in Tekumalla as a way to perform data fitting using a non-parametric data model. The motivation to combine is taught in Tekumalla, as provided in the prior art claim mapping of Claim 2 recited above.
Regarding amended Claim 13, 
Claim 13 recites the apparatus of claim 12, where the apparatus further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 2, and hence is rejected under similar rationale and motivations provided by Bivens in view of Smith and Tekumalla as indicated in Claim 2, in view of rejections from Claim 12.
Regarding amended Claim 14, 
Claim 14 recites the apparatus of claim 12, where the apparatus further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 8, and hence is rejected under similar rationale and motivations provided by Bivens in view of Smith and Tekumalla as indicated in Claim 8, in view of rejections from Claim 12.
Regarding amended Claim 18, 
Claim 18 recites the non-transitory processor-readable storage medium of claim 17, where the non-transitory processor-readable storage medium further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 2, and hence is rejected under similar rationale and motivations provided by Bivens in view of Smith and Tekumalla as indicated in Claim 2, in view of rejections from Claim 17.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121