DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/28/22 has been entered. 

Remarks
This action is in response to the request for continued examination received on 9/8/22.  Claims 1-11 and 13-20 are pending in the application.  Applicant’s arguments have been fully considered.
Claims 15-19 are rejected under 35 U.S.C. 101.
Claims 15-19 are rejected under 35 U.S.C. 112(a).
Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller et al. (US 2021/0326717), and further in view of Raman et al. (US 11,226,953).
Claims 6-11 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller in view of Raman, and further in view of Derryberry et al. (US 2021/0389883).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent-eligible subject matter.  Because the specification does not define the term “computer-readable storage medium,” the claimed storage medium, under its broadest reasonable interpretation, could be a signal.  As set forth in MPEP 2106, a signal per se is not a process, machine, manufacture, or composition of matter and is therefore not statutory subject matter.
Director Kappos' memorandum dated 1-27-10 states that the limitation “non-transitory” can be added to the claims to overcome the 35 U.S.C. 101 rejection without raising an issue of new matter.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


Claims 15-19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Specifically, the claims recite “a computer-readable storage medium”; however, the specification does not describe or define this term.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller et al. (US 2021/0326717), and further in view of Raman et al. (US 11,226,953).

With respect to claim 1, Mueller teaches a computer-implemented method implemented in a database system, the method comprising: 
storing in a storage system a plurality of data items received via a communication interface (Mueller, pa 0033, object storage locations 117 may be folders, buckets, etc. storing files or objects); 
creating in the database system a virtual dataset that includes a plurality of changesets each including a respective plurality of data references identifying a respective subset of the plurality of data items (Mueller, [0049] As indicated, upon a modification to the object storage location 117, the storage service 114 (or object storage location 117 itself) may emit an event notification message directly or indirectly to the ML orchestrator 115 at circle (5), as described in detail earlier herein. The event notification message may identify the modified objects (e.g., files) such as the names and/or locations (e.g., a Uniform Resource Locator (URL)) and/or other attributes of added files, changed files, deleted files, etc.); 
receiving a request to create a learning dataset, the request including one or more query parameters (Mueller, pa 0103, model training system 120 uses one or more container images included in a training request (or a container image retrieved from the container data store 1070 in response to a received training request) to create and initialize a ML training container 1030 in a virtual machine instance 1022. For example, the model training system 120 creates a ML training container 1030 that includes the container image(s) and/or a top container layer.);
identifying a changeset in the plurality of changesets that includes a characteristic that is associated with the one or more query parameters (Mueller, pa 0104, the model training system 120 retrieves training data from the location indicated in the training request. For example, the location indicated in the training request can be a location in the training data store 1060);
determining a set of data references included in the changeset to use in a query of the plurality of data items (Mueller, pa 0104, the model training system 120 retrieves the training data from the indicated location in the training data store 1060.);
creating a learning dataset including a subset of the plurality of data items that is retrieved based on the set of data references included in the changeset (Mueller, pa 0103, the model training system 120 can initially retrieve a portion of the training data and provide the retrieved portion to the virtual machine instance 1022 training the machine learning model.); and
providing access to the learning dataset (Mueller, [0087] The operations 900 include, at block 925, initiating a plurality of ML training jobs, using at least the training dataset and a ML training service of the service provider network, to generate a plurality of ML models. The initiating may include use of a model training system of the provider network and may include causing an application including (or otherwise utilizing) an AutoML library to begin a model exploration/training task. The initiating may include using an AutoML service provided by the service provider network to begin an AutoML process. The training operations may be controlled by configuration information data provided by the user, e.g., via a UI, and stored at the one or more storage locations.).
Mueller does not expressly disclose creating in the database system a virtual dataset that includes a plurality of changesets each including a respective plurality of data references identifying a respective subset of the plurality of data items.
Raman teaches creating in the database system a virtual dataset that includes a plurality of changesets each including a respective plurality of data references identifying a respective subset of the plurality of data items (Raman, Fig. 1, changesets 200 & Col. 3 Li. 63-66, changesets of repository record details of effects of modifications made to a briefcase).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller with the teachings of Raman because doing so would allow a user to understand the changes made to a repository over time (Raman, Col. 1 Li. 54-67).

With respect to claim 2, Mueller in view of Raman teaches the computer-implemented method recited in claim 1, the method further comprising: receiving via the communication interface a remote datastore query, the remote datastore query including one or more parameters for retrieving one or more data items from a remote datastore accessible via the internet (Mueller, Fig. 2 & pa 0046, external data system and curated data stores storing data for ML).

With respect to claim 3, Mueller in view of Raman teaches the computer-implemented method recited in claim 2, the method further comprising: updating the virtual dataset to include an indicated changeset, the indicated changeset identifying the remote datastore query (Mueller, pa 0046, the user 119 may drag and drop these files into a UI element (e.g., panel 222 of UI 200), and the files are sent to the CML service 103, which itself places the files into the object storage locations 117 on behalf of the user at circle (48).).

With respect to claim 4, Mueller in view of Raman teaches the computer-implemented method recited in claim 3, the method further comprising: creating a dataset view associated with the indicated changeset, the dataset view referencing the remote datastore query (Mueller, pa 0109, materializing search results as materialized views).

With respect to claim 5, Mueller in view of Raman teaches the computer-implemented method recited in claim 3, wherein the request query identifies the indicated changeset, and wherein the learning dataset includes the remote datastore query (Mueller, pa 0104, model training system 120 retrieves the training data from the indicated location in the training data store 1060).

With respect to claim 13, Mueller in view of Raman teaches the computer-implemented method recited in claim 1, wherein the storage system is located within an on-demand computing services environment configured to provide computing services to a plurality of organizations via the internet, and wherein creation of and access to the learning dataset is provided as a service via the internet (Mueller, Fig. 1 & pa 0025, A cloud can provide convenient, on-demand network access …Cloud computing can thus be considered as both the applications delivered as services over a publicly accessible network).

With respect to claim 14, Mueller in view of Raman teaches the computer-implemented method recited in claim 13, wherein the data items are stored in a multi-tenant database, each of the organizations corresponding to a respective tenant within the multi-tenant database, access to the virtual dataset being limited to a respective one of the organizations (Mueller, pa 0023, the CML service 103 is implemented within a multi-tenant provider network 100 and operates as part of a ML service 110 to offer ML-related operations described herein as a web-service to users 119.).

With respect to claims 15-19, the limitations are essentially the same as those of claims 1-5, and are rejected for the same reasons.

With respect to claim 20, the limitations are essentially the same as those of claim 1, and are rejected for the same reasons.

Claims 6-11 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller in view of Raman, and further in view of Derryberry et al. (US 2021/0389883).

With respect to claim 6, Mueller in view of Raman teaches the computer-implemented method recited in claim 1, as discussed above, but does not expressly disclose the additional limitations of claim 6.
Derryberry teaches the method further comprising: updating a manifest file associated with the virtual dataset, the manifest file a new changeset that is associated with one or more data items (Derryberry, pa 0069, a virtual machine search index may be used to identify when the file was first created (e.g., corresponding with a first version of the file) and at what times the file was modified (e.g., corresponding with subsequent versions of the file). Each version of the file may be mapped to a particular version of the virtual machine that stores that version of the file.).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller in view of Raman with the teachings of Derryberry because doing so provides access to items that are stored in the system (Derryberry, pa 0147).

With respect to claim 7, Mueller in view of Raman and Derryberry teaches the computer-implemented method recited in claim 6, the method further comprising: detecting a data type associated with the one or more data items, the new changeset having a structure that depends upon the data type (Derryberry, pa 0235, object store abstraction & pa 0244, Generic indexing framework that allows any data type to be plugged in with minimal effort).

With respect to claim 8, Mueller in view of Raman and Derryberry teaches the computer-implemented method recited in claim 7, wherein the data type is selected from the group consisting of: image data, text data, video data, and audio data (Derryberry, pa 0273).

With respect to claim 9, Mueller in view of Raman and Derryberry teaches the computer-implemented method recited in claim 6, the method further comprising: determining a respective hash value for the one or more data items, each hash value being stored in the manifest file (Derryberry, pa 0208, The index includes a mapping from segment+offset−>hash.).

With respect to claim 10, Mueller in view of Raman and Derryberry teaches the computer-implemented method recited in claim 1, wherein the second plurality of data items constitute an indicated subset of data items received via the communication interface and associated with a designated one of the changesets, the method further comprising determining a respective hash value for each of the indicated subset of data items (Derryberry, pa 0110, When newly added content has a content hash that matches that of existing content, this layer does not store the newly added data, just a reference to the existing data.).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller in view of Raman with the teachings of Derryberry because doing so decreases storage space by implementing deduplication through hashing (Derryberry, pa 0005).

With respect to claim 11, Mueller in view of Raman and Derryberry teaches the computer-implemented method recited in claim 10, the method further comprising: determining whether each of the hash values is included in a plurality of comparison hash values, each of the comparison hash values being associated with a respective one of the plurality of data items (Derryberry, pa 0110, When newly added content has a content hash that matches that of existing content, this layer does not store the newly added data, just a reference to the existing data.).

Response to Arguments
35 U.S.C. 112 Rejections
Applicant's arguments filed 7/28/22 have been fully considered and are persuasive.  The 35 U.S.C. 112 rejection of claims 7 and 13 has been withdrawn. 

35 U.S.C. 101 Rejections
With respect to claims 15-19, Applicant’s arguments, filed 3/30/22, have been fully considered and raise new issues, as discussed above.

35 U.S.C. 103 Rejections
Applicant argues that Mueller in view of Arya does not teach “identifying a changeset in the plurality of changesets that includes a characteristic that is associated with the one or more query parameters; determining a set of data references included in the changeset to use in a query of the plurality of data items; creating a learning dataset including a subset of the plurality of data items that is retrieved based on the set of data references included in the changeset.” The Examiner respectfully disagrees.  Upon further review, Mueller discusses model training at pa 0104.  There, the model training system receives a training request from a user to generate a model (Mueller, pa 0104-0106).  As mapped above, the training request allows an appropriate dataset to be retrieved and created to train the model.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRITTANY N ALLEN whose telephone number is (571)270-3566. The examiner can normally be reached M-F 9 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRITTANY N ALLEN/Primary Examiner, Art Unit 2169