DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This action is in response to the amendments received on 3/30/22.  Claims 1-11 and 13-20 are pending in the application.  
Claims 15-19 are rejected under 35 U.S.C. 101.
Claim 7 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller et al. (US 2021/0326717), and further in view of Arya et al. (US 2020/0349468).
Claims 6-11 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller in view of Arya, and further in view of Derryberry et al. (US 2021/0389883).

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 7 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claim 7 recites “the manifest file having a format that depends upon the data type”; however, there is no support for this feature in the specification.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 13 recites the limitation "the storage system".  There is insufficient antecedent basis for this limitation in the claim.  Claim 1 discusses “a storage system” twice, once in each “storing” limitation.  It is unclear which storage system claim 13 is referring to.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because claims 15-19 are directed towards systems that contain software only.  A system must contain hardware or structure to realize the claimed functionality.  Thus, the claims are not directed to a process, machine, manufacture, or composition of matter and are therefore not statutory.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller et al. (US 2021/0326717), and further in view of Arya et al. (US 2020/0349468).

With respect to claim 1, Mueller teaches a computer-implemented method implemented in a database system, the method comprising: 
storing in a storage system a plurality of data items received via a communication interface (Mueller, pa 0033, object storage locations 117 may be folders, buckets, etc. storing files or objects), each of the plurality of data items being associated with a respective label (Mueller, pa 0047, if the user 119 provides a partially or completely un-labeled training dataset, the CML service 103 may engage with the user 119 to create a job for a labeling service 143 at circle (4C), which could be an active learning type system, a human labeling service, or the like, that can provide labels for these samples.); 
creating in the database system a virtual dataset that includes a plurality of changesets each including a respective plurality of data references identifying a respective subset of the plurality of data items (Mueller, [0049] As indicated, upon a modification to the object storage location 117, the storage service 114 (or object storage location 117 itself) may emit an event notification message directly or indirectly to the ML orchestrator 115 at circle (5), as described in detail earlier herein. The event notification message may identify the modified objects (e.g., files) such as the names and/or locations (e.g., a Uniform Resource Locator (URL)) and/or other attributes of added files, changed files, deleted files, etc.); 
one or more query parameters identifying a designated one or more of the plurality of changesets (Mueller, pa [0053] The ML orchestrator 115 may also, based on the event notification message(s) received at circle (5), determine to perform a retraining of the models (e.g., when additional, different, or less training data is made available in the object storage location) & [0084] Optionally, at block 910, the operations 900 include deploying, within the multi-tenant service provider, an ML orchestrator associated with the account. The ML orchestrator may be code deployed as a function within an on-demand code execution service of the service provider network. The one or more object storage locations may be configured to, upon a modification to the storage location(s)—such as a write of a file, a deletion of a file, a modification of a file, etc.—to send an event notification to the ML orchestrator, which may identify the changed file/object.);
creating a learning dataset including a designated subset of the plurality of data items retrieved based on the respective pluralities of data references included in the designated changesets (Mueller, [0085] The operations 900 include, at block 915, detecting, by the ML orchestrator, that a training dataset has been stored at the object storage location. The detecting may comprise receiving an event notification message that was originated by the object storage locations/storage service providing the object storage locations, and determining that the event notification message includes an identifier of a file known (e.g., per a known extension, etc.) to be a training dataset. The training dataset may include one or more samples, where each sample includes one or more values corresponding to one or more attributes. [0086] At block 920, the operations 900 include determining a target variable to infer based on the training dataset. The determination may include identifying a particular “column” name or format within the training dataset, e.g., a name of “class” or a name having a matching pattern (e.g., having an asterisk or some other character(s) in the name), etc.); and
providing access to the learning dataset (Mueller, [0087] The operations 900 include, at block 925, initiating a plurality of ML training jobs, using at least the training dataset and a ML training service of the service provider network, to generate a plurality of ML models. The initiating may include use of a model training system of the provider network and may include causing an application including (or otherwise utilizing) an AutoML library to begin a model exploration/training task. The initiating may include using an AutoML service provided by the service provider network to begin an AutoML process. The training operations may be controlled by configuration information data provided by the user, e.g., via a UI, and stored at the one or more storage locations.).
Mueller doesn't expressly discuss each of the first plurality of data items being associated with a respective label and receiving a request to create a learning dataset.
Arya teaches storing in a storage system a plurality of data items, each of the plurality of data items being associated with a respective label (Arya, pa 0114, The electronic device 110 generates, utilizing a machine learning model, a set of labels corresponding to the dataset); 
receiving a request to create a learning dataset, the request including one or more query parameters identifying a designated one or more of the plurality of changesets (Arya, pa 0115, In an example, the virtual object (e.g., the package) is based at least in part on a particular query with SQL-like commands such as defining a selection of columns in the dataset and/or joining data from annotations and/or splits objects.);
creating a learning dataset including a designated subset of the plurality of data items retrieved (Arya, pa 0115, The electronic device generates a virtual object based at least in part on the subset of the dataset and the set of labels); and
providing access to the learning dataset (Arya, pa 0115, The electronic device 110 trains a second machine learning model using the virtual object and at least the subset of the dataset).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller with the teachings of Arya because doing so provides annotations for data that can be tailored to a respective ML model (Arya, pa 0028).
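
For illustration only, the arrangement recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Mueller, or Arya: the names `Changeset`, `VirtualDataset`, and `create_learning_dataset` are hypothetical, the storage system is modeled as an in-memory dictionary of labeled items, and the query parameters are reduced to a list of changeset identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model: each stored data item is a (payload, label) pair
# keyed by a data reference. None of these names come from the record.
Storage = Dict[str, Tuple[str, str]]


@dataclass
class Changeset:
    """A changeset holds data references, not the data items themselves."""
    changeset_id: str
    data_refs: List[str]


@dataclass
class VirtualDataset:
    """A virtual dataset is a collection of changesets."""
    changesets: Dict[str, Changeset] = field(default_factory=dict)

    def add_changeset(self, changeset: Changeset) -> None:
        self.changesets[changeset.changeset_id] = changeset


def create_learning_dataset(
    storage: Storage, virtual_dataset: VirtualDataset, changeset_ids: List[str]
) -> List[Tuple[str, str]]:
    """Resolve the references in the designated changesets to stored items.

    `changeset_ids` stands in for the query parameters of the request.
    An item referenced by more than one changeset is returned once.
    """
    seen = set()
    learning_dataset = []
    for changeset_id in changeset_ids:
        for ref in virtual_dataset.changesets[changeset_id].data_refs:
            if ref not in seen:
                seen.add(ref)
                learning_dataset.append(storage[ref])
    return learning_dataset


storage = {"a": ("img-a", "cat"), "b": ("img-b", "dog"), "c": ("img-c", "cat")}
dataset = VirtualDataset()
dataset.add_changeset(Changeset("cs1", ["a", "b"]))
dataset.add_changeset(Changeset("cs2", ["b", "c"]))
dataset.add_changeset(Changeset("cs3", ["c"]))
selected = create_learning_dataset(storage, dataset, ["cs1", "cs2"])
```

In this sketch the learning dataset is assembled only at request time, from whichever changesets the request designates; the changesets themselves never copy the underlying items.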

With respect to claim 2, Mueller in view of Arya teaches the computer-implemented method recited in claim 1, the method further comprising: receiving via the communication interface a remote datastore query, the remote datastore query including one or more parameters for retrieving a designated one or more data items from a remote datastore accessible via the internet (Mueller, Fig. 2 & pa 0046, external data system and curated data stores storing data for ML).

With respect to claim 3, Mueller in view of Arya teaches the computer-implemented method recited in claim 2, the method further comprising: updating the virtual dataset to include an indicated changeset, the indicated changeset identifying the remote datastore query (Arya, pa 0115, generating a virtual object based in part on a subset of the dataset, where the virtual object corresponds to a selection of data similar to a particular query of the dataset.  The virtual object is based at least in part on a particular query with SQL-like commands).

With respect to claim 4, Mueller in view of Arya teaches the computer-implemented method recited in claim 3, the method further comprising: creating a dataset view associated with the indicated changeset, the dataset view referencing the remote datastore query (Mueller, pa 0109, materializing search results as materialized views).

With respect to claim 5, Mueller in view of Arya teaches the computer-implemented method recited in claim 3, wherein the request query identifies the indicated changeset, and wherein the learning dataset includes the remote datastore query (Arya, pa 0115, generating a virtual object based in part on a subset of the dataset, where the virtual object corresponds to a selection of data similar to a particular query of the dataset.  The virtual object is based at least in part on a particular query with SQL-like commands).

With respect to claim 13, Mueller in view of Arya teaches the computer-implemented method recited in claim 1, wherein the storage system is located within an on-demand computing services environment configured to provide computing services to a plurality of organizations via the internet, and wherein creation of and access to the learning dataset is provided as a service via the internet (Mueller, Fig. 1 & pa 0025, A cloud can provide convenient, on-demand network access …Cloud computing can thus be considered as both the applications delivered as services over a publicly accessible network).

With respect to claim 14, Mueller in view of Arya teaches the computer-implemented method recited in claim 13, wherein the data items are stored in a multi-tenant database, each of the organizations corresponding to a respective tenant within the multi-tenant database, access to the virtual dataset being limited to a respective one of the organizations (Mueller, pa 0023, the CML service 103 is implemented within a multi-tenant provider network 100 and operates as part of a ML service 110 to offer ML-related operations described herein as a web-service to users 119.).

	With respect to claims 15-19, the limitations are essentially the same as those of claims 1-5, and are rejected for the same reasons.

	With respect to claim 20, the limitations are essentially the same as those of claim 1, and are rejected for the same reasons.

Claims 6-11 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller in view of Arya, and further in view of Derryberry et al. (US 2021/0389883).

With respect to claim 6, Mueller in view of Arya teaches the computer-implemented method recited in claim 1, as discussed above.
Derryberry teaches the method further comprising: updating a manifest file associated with the virtual dataset, the manifest file identifying each of the first changeset and the second changeset (Derryberry, pa 0069, a virtual machine search index may be used to identify when the file was first created (e.g., corresponding with a first version of the file) and at what times the file was modified (e.g., corresponding with subsequent versions of the file). Each version of the file may be mapped to a particular version of the virtual machine that stores that version of the file.).
	It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller in view of Arya with the teachings of Derryberry because doing so provides access to items that are stored in the system (Derryberry, pa 0147).

With respect to claim 7, Mueller in view of Arya and Derryberry teaches the computer-implemented method recited in claim 6, the method further comprising: detecting a data type associated with the first plurality of data items, the manifest file having a format that depends upon the data type (Derryberry, pa 0235, object store abstraction & pa 0244, Generic indexing framework that allows any data type to be plugged in with minimal effort).

With respect to claim 8, Mueller in view of Arya and Derryberry teaches the computer-implemented method recited in claim 7, wherein the data type is selected from the group consisting of: image data, text data, video data, and audio data (Derryberry, pa 0273).

With respect to claim 9, Mueller in view of Arya and Derryberry teaches the computer-implemented method recited in claim 8, the method further comprising: determining a respective hash value for each of the first and second plurality of data items, each hash value being stored in the manifest file (Derryberry, pa 0208, The index includes a mapping from segment+offset−>hash.).
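
For illustration only, a manifest that records a hash value for each data item of each changeset might be sketched as below. The manifest layout, the function name `build_manifest`, and the choice of SHA-256 are assumptions made for this sketch; neither the claims as quoted nor Derryberry is stated here to specify them.

```python
import hashlib
import json
from typing import Dict

# Hypothetical input: changeset id -> {data reference -> raw item bytes}.
ChangesetItems = Dict[str, Dict[str, bytes]]


def build_manifest(changesets: ChangesetItems) -> dict:
    """Return a manifest identifying each changeset and the hash of each item.

    SHA-256 is an assumed choice; any stable content hash would serve.
    """
    manifest = {"changesets": {}}
    for changeset_id, items in changesets.items():
        manifest["changesets"][changeset_id] = {
            ref: hashlib.sha256(data).hexdigest() for ref, data in items.items()
        }
    return manifest


manifest = build_manifest(
    {
        "first": {"a.txt": b"hello"},
        "second": {"b.txt": b"world"},
    }
)
# The manifest is plain data, so it can be written out as a file.
manifest_text = json.dumps(manifest, indent=2, sort_keys=True)
```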

With respect to claim 10, Mueller in view of Arya and Derryberry teaches the computer-implemented method recited in claim 1, wherein the second plurality of data items constitute an indicated subset of data items received via the communication interface and associated with a designated one of the changesets, the method further comprising determining a respective hash value for each of the indicated subset of data items (Derryberry, pa 0110, When newly added content has a content hash that matches that of existing content, this layer does not store the newly added data, just a reference to the existing data.).
	It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Mueller in view of Arya with the teachings of Derryberry because doing so decreases storage space by implementing deduplication through hashing (Derryberry, pa 0005).

With respect to claim 11, Mueller in view of Arya and Derryberry teaches the computer-implemented method recited in claim 10, the method further comprising: determining whether each of the hash values is included in a plurality of comparison hash values, each of the comparison hash values being associated with a respective one of the plurality of data items (Derryberry, pa 0110, When newly added content has a content hash that matches that of existing content, this layer does not store the newly added data, just a reference to the existing data.).
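
For illustration only, the comparison of hash values described with respect to claims 10 and 11 can be sketched as a content-hash deduplication check. This is a hedged sketch, not Derryberry's implementation: the function name `find_new_items`, the use of SHA-256, and the in-memory set of comparison hash values are all assumptions.

```python
import hashlib
from typing import Dict, List, Set


def content_hash(data: bytes) -> str:
    """Hash of an item's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def find_new_items(incoming: Dict[str, bytes], comparison_hashes: Set[str]) -> List[str]:
    """Return references of incoming items whose hash is not already known.

    Items whose hash matches a comparison hash value are treated as
    duplicates; only a reference to the existing data would be kept.
    """
    new_refs = []
    for ref, data in incoming.items():
        digest = content_hash(data)
        if digest not in comparison_hashes:
            new_refs.append(ref)
            comparison_hashes.add(digest)  # later duplicates in the same batch are skipped
    return new_refs


known = {content_hash(b"already stored")}
incoming = {"x": b"already stored", "y": b"new content", "z": b"new content"}
new_refs = find_new_items(incoming, known)
```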

Response to Arguments
35 U.S.C. 112 Rejections
Applicant's arguments filed 3/30/22 have been fully considered but they are not persuasive. Applicant stated that claim 7 was cancelled and claim 13 was amended; however, these changes were not made.

35 U.S.C. 101 Rejections
With respect to claims 15-19, Applicant's arguments filed 3/30/22 have been fully considered but they are not persuasive. Claim 15 was amended to include hardware elements; however, these elements are not positively recited as being part of the claimed system.

With respect to claim 20, Applicant’s arguments, filed 3/30/22, have been fully considered and are persuasive.  The 35 U.S.C. 101 rejection of claim 20 has been withdrawn.


35 U.S.C. 103 Rejections
	Applicant argues that Mueller does not teach that the updated training dataset is created by identifying a plurality of the event notification messages based on query parameters received in a request message and therefore does not teach creating a learning dataset by identifying a plurality of changesets in a virtual dataset based on query parameters received in a request message and then retrieving a subset of data items based on pluralities of data references included in the plurality of changesets. The Examiner respectfully disagrees.  The claims do not recite the limitations as quoted in the arguments.  The event notification messages provide data references of a subset of data items representing changesets (Mueller, pa 0049).  These modifications are then used to retrain the models, providing a new learning dataset (Mueller, pa 0053).  Arya teaches that a virtual object used to train a machine learning model is created based on query parameters (Arya, pa 0115).  Therefore, Mueller in view of Arya teaches the claim limitations.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRITTANY N ALLEN whose telephone number is (571)270-3566. The examiner can normally be reached M-F 9 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRITTANY N ALLEN/
Primary Examiner, Art Unit 2169