DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the original application filed on 7/31/2019. This action is Non-Final. Claims 1 – 20 are pending and have been examined.  
Drawings
The applicant’s drawings submitted are acceptable for examination purposes. 
Specification
The applicant’s specification submitted is acceptable for examination purposes. 
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 – 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 20 of copending Application No. 16/528,612 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. The subject matter claimed in the instant application is fully disclosed in the referenced copending application and would be covered by any patent granted on that copending application since the referenced copending application and the instant application are claiming common subject matter.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dirac et al., U.S. Patent Application Publication No. 2015/0379430 (hereinafter “Dirac”), in view of George et al., U.S. Patent Application Publication No. 2016/0334998 (hereinafter “George”).
Regarding claim 1, Dirac teaches a method for managing data, the method comprising:
obtaining data from a host (Dirac [0097]: A given provider network may include numerous data centers hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage devices, networking equipment and the like, needed to implement, configure and distribute the infrastructure and services offered by the provider.); 
applying an erasure coding procedure (Dirac [0105]: erasure coding) to the data to obtain a plurality of data chunks and at least one parity chunk (Dirac [0163]: The concatenated address space of data set 1804 may then be sub-divided into a plurality of contiguous chunks, as indicated in chunk mapping 1806.);
deduplicating (Dirac [Abstract]: At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed.  A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set.) the plurality of data chunks to obtain a plurality of deduplicated data chunks (Dirac [0204]: In various embodiments, consistent splits may be performed at the chunk level, at the observation record level, or at some combination of chunk and record levels, using consistency metadata of the kind described above.  In at least one embodiment, after a chunk-level split is performed, the records of the individual chunks in the training set or the test set may be shuffled prior to use for training/evaluating a model.); 
generating storage metadata associated with the plurality of deduplicated data chunks and the at least one parity chunk (Dirac [0202]: The plan generator may determine a set of consistency metadata 3152, e.g., metadata that may be shared among related jobs that are inserted in the MLS job queue for the requested split iterations.  The metadata 3152 may comprise the client-provided seed values 3120, for example.); 
Dirac does not clearly teach storing the storage metadata in an accelerator pool; storing, across a plurality of fault domains, the plurality of deduplicated data chunks and the at least one parity chunk; and initiating storage metadata distribution on the storage metadata across the plurality of fault domains. However, George [0017] teaches: “Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.”
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teaching of George into Dirac’s system by adding the feature of storage metadata. A person of ordinary skill in the art would have been motivated to do so in order to provide Dirac’s system with enhanced data storage (see George [Abstract], [0017], [0028]). In addition, Dirac and George are analogous art because both are directed to the same field of endeavor, namely data storage. This close relation suggests a high expectation of success when the references are combined.
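For illustration only, the sequence of steps recited in claim 1 can be sketched as below. The sketch is not taken from Dirac, George, or the instant application: the single XOR parity chunk, the SHA-256 fingerprints, the dictionary-based fault domains and accelerator pool, and all function names are assumptions chosen to keep the example short.

```python
import hashlib
from functools import reduce


def erasure_code(data: bytes, k: int):
    """Split data into k equal-size data chunks plus one XOR parity chunk.

    A single XOR parity chunk is an assumed, minimal erasure code; it
    tolerates the loss of any one chunk.
    """
    size = max(1, -(-len(data) // k))          # ceiling division
    padded = data.ljust(size * k, b"\0")       # zero-pad to a multiple of k
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks, parity


def deduplicate(chunks, known_fingerprints):
    """Keep only chunks whose fingerprint has not been seen before."""
    unique = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in known_fingerprints:
            known_fingerprints.add(fp)
            unique.append(chunk)
    return unique


def store(data, fault_domains, accelerator_pool, known_fingerprints):
    """Erasure-code, deduplicate, place chunks, and distribute metadata."""
    chunks, parity = erasure_code(data, k=len(fault_domains) - 1)
    dedup = deduplicate(chunks, known_fingerprints)

    metadata = []
    for i, chunk in enumerate(dedup + [parity]):
        domain = fault_domains[i % len(fault_domains)]
        fp = hashlib.sha256(chunk).hexdigest()
        domain["chunks"][fp] = chunk
        # Storage metadata records where each chunk was placed.
        metadata.append({
            "fingerprint": fp,
            "fault_domain": domain["name"],
            "is_parity": i == len(dedup),
        })

    # Primary copy in the accelerator pool, one copy per fault domain.
    accelerator_pool["metadata"] = metadata
    for domain in fault_domains:
        domain["metadata"] = list(metadata)
    return metadata
```

In this sketch, the copies of the storage metadata held in the fault domains are what would allow the accelerator pool's copy to be rebuilt after a failure, which corresponds to the reconstruction recited in claim 2.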
Regarding claim 2, the method of claim 1, further comprising: identifying a storage metadata failure of the storage metadata in the accelerator pool; sending a storage metadata request to at least one fault domain; obtaining a storage metadata response from the at least one fault domain; and performing a storage metadata reconstruction of storage metadata on the accelerator pool using at least the storage metadata response (George [0017]: Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.).
Regarding claim 3, the method of claim 2, wherein the storage metadata response includes at least a portion of the storage metadata (Dirac [0203]: In some implementations, a job object created by the plan generator 3180 may include a reference or pointer to the consistency metadata to be used for that job.  In another implementation, at least a portion of the consistency metadata 3152 may be included within a job object.  When a job is executed, the metadata 3152 may be used to ensure that the input data set is split consistently.). 
Regarding claim 4, the method of claim 1, wherein each deduplicated data chunk of the plurality of data chunks is stored in a data node of each fault domain of the plurality of fault domains, and wherein the storage metadata is stored in the data node of each fault domain of the plurality of fault domains (Dirac [0169]: In at least some embodiments, there need not be a 1:1 relationship between chunks and MLS servers--e.g., a given MLS server may be configurable to store multiple chunks of a data set.  In some embodiments, partial chunks or subsets of chunks may also be stored at an MLS server--e.g., the number of chunks stored in a given server's memory need not be an integer.  In various embodiments, in addition to chunk-level filtering operations, intra-chunk and/or cross-chunk filtering operations (e.g., at the observation record level) may be performed as described below in further detail, which may help to further reduce the loss of statistical quality.).
Regarding claim 5, the method of claim 1, wherein each deduplicated data chunk of the plurality of data chunks is stored in a unique fault domain of the plurality of fault domains, and wherein a copy of the storage metadata is stored in each fault domain of the plurality of fault domains (Dirac [0347]: FIG. 70 illustrates an example duplicate detector that may utilize space-efficient representations of machine learning data sets to determine whether one data set is likely to include duplicate observation records of another data set at a machine learning service, according to at least some embodiments.).
Regarding claim 6, the method of claim 1,
wherein storing the plurality of deduplicated data chunks and the at least one parity chunk comprises: storing a deduplicated data chunk of the plurality of deduplicated data chunks on a first data node in a fault domain of the plurality of fault domains (Dirac [0348]: In some embodiments, the alternate representation may be generated and stored in parallel with the training of the model, so that, for example, only a single pass through the training data set 7002 may be needed for both (a) training the model and (b) creating and storing the alternate representation 7030.  The alternate representation may require much less (e.g., orders of magnitude less) storage or memory than is occupied by the training data set itself in some implementations.),
wherein initiating storage metadata distribution on the storage metadata across the plurality of fault domains comprises: initiating storage of a copy of the storage metadata on a second data node in the fault domain (Dirac [0179]: In one embodiment, the OR extraction request 2401 may include compression metadata 2406, indicating for example the compression algorithm used for the data set, the sizes of the units or blocks in which the compressed data is stored (which may differ from the sizes of the chunks on which chunk-level in-memory filtering operations are to be performed), and other information that may be necessary to correctly de-compress the data set.  Decryption metadata 2408 such as keys, credentials, and/or an indication of the encryption algorithm used on the data set may be included in a request 2401 in some embodiments.  Authorization/authentication metadata 2410 to be used to be able to obtain read access to the data set may be provided by the client in request 2401 in some implementations and for certain types of data sources.  Such metadata may include, for example, an account name or user name and a corresponding set of credentials, or an identifier and password for a security container (similar to the security containers 390 shown in FIG. 3).).
Regarding claim 7, the method of claim 1, wherein a non-accelerator pool comprises the plurality of fault domains (Dirac [0092]: The data plane of the MLS may include, for example, at least a subset of the servers of pool(s) 185, storage devices that are used to store input data sets, intermediate results or final results (some of which may be part of the MLS artifact repository), and the network pathways used for transferring client input data and results.).
Regarding claim 8, the method of claim 1, wherein the storage metadata includes at least location information of: at least one of the plurality of deduplicated data chunks and of the at least one parity chunk (Dirac [0185]: After the first filtering operation of the sequence is performed in memory at the MLS servers, the remaining filtering operations (if any) may be performed in place in the depicted embodiment, e.g., without copying the chunks to persistent storage or re-reading the chunks for their original source locations (element 2519).).
Regarding claim 9, Dirac teaches a non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for managing data, the method comprising:
obtaining data from a host (Dirac [0097]: A given provider network may include numerous data centers hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage devices, networking equipment and the like, needed to implement, configure and distribute the infrastructure and services offered by the provider.); 
applying an erasure coding procedure (Dirac [0105]: erasure coding) to the data to obtain a plurality of data chunks and at least one parity chunk (Dirac [0163]: The concatenated address space of data set 1804 may then be sub-divided into a plurality of contiguous chunks, as indicated in chunk mapping 1806.);
deduplicating (Dirac [Abstract]: At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed.  A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set.) the plurality of data chunks to obtain a plurality of deduplicated data chunks (Dirac [0204]: In various embodiments, consistent splits may be performed at the chunk level, at the observation record level, or at some combination of chunk and record levels, using consistency metadata of the kind described above.  In at least one embodiment, after a chunk-level split is performed, the records of the individual chunks in the training set or the test set may be shuffled prior to use for training/evaluating a model.);
generating storage metadata associated with the plurality of deduplicated data chunks and the at least one parity chunk (Dirac [0202]: The plan generator may determine a set of consistency metadata 3152, e.g., metadata that may be shared among related jobs that are inserted in the MLS job queue for the requested split iterations.  The metadata 3152 may comprise the client-provided seed values 3120, for example.);
Dirac does not clearly teach storing the storage metadata in an accelerator pool; storing, across a plurality of fault domains, the plurality of deduplicated data chunks and the at least one parity chunk; and initiating storage metadata distribution on the storage metadata across the plurality of fault domains. However, George [0017] teaches: “Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.”
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teaching of George into Dirac’s system by adding the feature of storage metadata. A person of ordinary skill in the art would have been motivated to do so in order to provide Dirac’s system with enhanced data storage (see George [Abstract], [0017], [0028]). In addition, Dirac and George are analogous art because both are directed to the same field of endeavor, namely data storage. This close relation suggests a high expectation of success when the references are combined.
Regarding claim 10, the non-transitory computer readable medium of claim 9, the method further comprising:
identifying a storage metadata failure of the storage metadata in the accelerator pool; sending a storage metadata request to at least one fault domain; obtaining a storage metadata response from the at least one fault domain; and performing a storage metadata reconstruction of storage metadata on the accelerator pool using at least the storage metadata response (George [0017]: Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.).
Regarding claim 11, the non-transitory computer readable medium of claim 10, wherein the storage metadata response includes at least a portion of the storage metadata (Dirac [0203]: In some implementations, a job object created by the plan generator 3180 may include a reference or pointer to the consistency metadata to be used for that job.  In another implementation, at least a portion of the consistency metadata 3152 may be included within a job object.  When a job is executed, the metadata 3152 may be used to ensure that the input data set is split consistently.).
Regarding claim 12, the non-transitory computer readable medium of claim 9, wherein each deduplicated data chunk of the plurality of data chunks is stored in a data node of each fault domain of the plurality of fault domains, and wherein the storage metadata is stored in the data node of each fault domain of the plurality of fault domains (Dirac [0169]: In at least some embodiments, there need not be a 1:1 relationship between chunks and MLS servers--e.g., a given MLS server may be configurable to store multiple chunks of a data set.  In some embodiments, partial chunks or subsets of chunks may also be stored at an MLS server--e.g., the number of chunks stored in a given server's memory need not be an integer.  In various embodiments, in addition to chunk-level filtering operations, intra-chunk and/or cross-chunk filtering operations (e.g., at the observation record level) may be performed as described below in further detail, which may help to further reduce the loss of statistical quality.).
Regarding claim 13, the non-transitory computer readable medium of claim 9, wherein each deduplicated data chunk of the plurality of data chunks is stored in a unique fault domain of the plurality of fault domains, and wherein a copy of the storage metadata is stored in a second data node of each fault domain of the plurality of fault domains (Dirac [0347]: FIG. 70 illustrates an example duplicate detector that may utilize space-efficient representations of machine learning data sets to determine whether one data set is likely to include duplicate observation records of another data set at a machine learning service, according to at least some embodiments.).
Regarding claim 14, the non-transitory computer readable medium of claim 9, 
wherein storing the plurality of deduplicated data chunks and the at least one parity chunk comprises: storing a deduplicated data chunk of the plurality of deduplicated data chunks on a first data node in a fault domain of the plurality of fault domains (Dirac [0348]: In some embodiments, the alternate representation may be generated and stored in parallel with the training of the model, so that, for example, only a single pass through the training data set 7002 may be needed for both (a) training the model and (b) creating and storing the alternate representation 7030.  The alternate representation may require much less (e.g., orders of magnitude less) storage or memory than is occupied by the training data set itself in some implementations.), 
wherein initiating storage metadata distribution on the storage metadata across the plurality of fault domains comprises: initiating storage of a copy of the storage metadata on a second data node in the fault domain (Dirac [0179]: In one embodiment, the OR extraction request 2401 may include compression metadata 2406, indicating for example the compression algorithm used for the data set, the sizes of the units or blocks in which the compressed data is stored (which may differ from the sizes of the chunks on which chunk-level in-memory filtering operations are to be performed), and other information that may be necessary to correctly de-compress the data set.  Decryption metadata 2408 such as keys, credentials, and/or an indication of the encryption algorithm used on the data set may be included in a request 2401 in some embodiments.  Authorization/authentication metadata 2410 to be used to be able to obtain read access to the data set may be provided by the client in request 2401 in some implementations and for certain types of data sources.  Such metadata may include, for example, an account name or user name and a corresponding set of credentials, or an identifier and password for a security container (similar to the security containers 390 shown in FIG. 3).).
Regarding claim 15, the non-transitory computer readable medium of claim 9, wherein a non-accelerator pool comprises the plurality of fault domains (Dirac [0092]: The data plane of the MLS may include, for example, at least a subset of the servers of pool(s) 185, storage devices that are used to store input data sets, intermediate results or final results (some of which may be part of the MLS artifact repository), and the network pathways used for transferring client input data and results.). 
Regarding claim 16, the non-transitory computer readable medium of claim 9, wherein the storage metadata includes at least location information of: at least one of the plurality of deduplicated data chunks and of the at least one parity chunk (Dirac [0185]: After the first filtering operation of the sequence is performed in memory at the MLS servers, the remaining filtering operations (if any) may be performed in place in the depicted embodiment, e.g., without copying the chunks to persistent storage or re-reading the chunks for their original source locations (element 2519).).
Regarding claim 17, Dirac teaches a data cluster, comprising:
a host; and an accelerator pool comprising (Dirac [0097]: A given provider network may include numerous data centers hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage devices, networking equipment and the like, needed to implement, configure and distribute the infrastructure and services offered by the provider.) a plurality of data nodes (Dirac [0216]: In addition to the predicates to be evaluated at each node, a respective predictive utility metric (PUM) 3434 may also be generated for some or all of the nodes of tree 3433 in the depicted embodiment and stored in persistent storage--e.g., PUM 3434A may be computed and stored for node N1, PUM 3434B for node N2, and so on.  Generally speaking, the PUM of a given node may be indicative of the relative contribution or usefulness of that node with respect to the predictions that can be made using all the nodes.), 
wherein a data node of the plurality of data nodes comprises a processor and memory comprising instructions, which when executed by the processor perform a method, the method comprising (Dirac [0372]: In the illustrated embodiment, computing device 9000 includes one or more processors 9010 coupled to a system memory 9020 (which may comprise both non-volatile and volatile memory modules) via an input/output (I/O) interface 9030.): 
obtaining data from the host (Dirac [0097]: A given provider network may include numerous data centers hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage devices, networking equipment and the like, needed to implement, configure and distribute the infrastructure and services offered by the provider.); 
applying an erasure coding procedure (Dirac [0105]: erasure coding) to the data to obtain a plurality of data chunks and at least one parity chunk (Dirac [0163]: The concatenated address space of data set 1804 may then be sub-divided into a plurality of contiguous chunks, as indicated in chunk mapping 1806.);
deduplicating (Dirac [Abstract]: At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed.  A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set.) the plurality of data chunks to obtain a plurality of deduplicated data chunks (Dirac [0204]: In various embodiments, consistent splits may be performed at the chunk level, at the observation record level, or at some combination of chunk and record levels, using consistency metadata of the kind described above.  In at least one embodiment, after a chunk-level split is performed, the records of the individual chunks in the training set or the test set may be shuffled prior to use for training/evaluating a model.);
generating storage metadata associated with the plurality of deduplicated data chunks and the at least one parity chunk (Dirac [0202]: The plan generator may determine a set of consistency metadata 3152, e.g., metadata that may be shared among related jobs that are inserted in the MLS job queue for the requested split iterations.  The metadata 3152 may comprise the client-provided seed values 3120, for example.);
Dirac does not clearly teach storing the storage metadata in the accelerator pool; storing, across a plurality of fault domains, the plurality of deduplicated data chunks and the at least one parity chunk, wherein a non-accelerator pool comprises the plurality of fault domains; and initiating storage metadata distribution on the storage metadata across the plurality of fault domains. However, George [0017] teaches: “Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.”
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teaching of George into Dirac’s system by adding the feature of storage metadata. A person of ordinary skill in the art would have been motivated to do so in order to provide Dirac’s system with enhanced data storage (see George [Abstract], [0017], [0028]). In addition, Dirac and George are analogous art because both are directed to the same field of endeavor, namely data storage. This close relation suggests a high expectation of success when the references are combined.
Regarding claim 18, the data cluster of claim 17, the method further comprising: identifying a storage metadata failure of the storage metadata in the accelerator pool; sending a storage metadata request to at least one fault domain; obtaining a storage metadata response from the at least one fault domain; and performing a storage metadata reconstruction of storage metadata on the accelerator pool using at least the storage metadata response (George [0017]: Object storage involves storing chunks of data in an object, with each object including metadata and a unique identifier.  Distributed storage systems can also be applied to other types of data storage such as block storage and file storage, for example.  In block storage, data can be stored in blocks (or volumes) where each block acts as an individual hard drive.  File storage is generally a hierarchical way of organizing files containing data such that an individual file can be located by a path to that file.  Certain metadata describing the file and its contents is also typically stored in the file system.  In distributed storage systems, multiple replicas of data in any suitable type of structure (e.g., objects, files, blocks, etc.) can be maintained in order to provide fault tolerance and high availability.).
Regarding claim 19, the data cluster of claim 17, wherein each deduplicated data chunk of the plurality of data chunks is stored in a first data node of a fault domain of the plurality of fault domains, and wherein the storage metadata is stored in a second data node of each fault domain of the plurality of fault domains (Dirac [0169]: In at least some embodiments, there need not be a 1:1 relationship between chunks and MLS servers--e.g., a given MLS server may be configurable to store multiple chunks of a data set.  In some embodiments, partial chunks or subsets of chunks may also be stored at an MLS server--e.g., the number of chunks stored in a given server's memory need not be an integer.  In various embodiments, in addition to chunk-level filtering operations, intra-chunk and/or cross-chunk filtering operations (e.g., at the observation record level) may be performed as described below in further detail, which may help to further reduce the loss of statistical quality.).
Regarding claim 20, the data cluster of claim 17, 
wherein storing the plurality of deduplicated data chunks and the at least one parity chunk comprises: storing a deduplicated data chunk of the plurality of deduplicated data chunks on a first data node in a fault domain of the plurality of fault domains (Dirac [0348]: In some embodiments, the alternate representation may be generated and stored in parallel with the training of the model, so that, for example, only a single pass through the training data set 7002 may be needed for both (a) training the model and (b) creating and storing the alternate representation 7030.  The alternate representation may require much less (e.g., orders of magnitude less) storage or memory than is occupied by the training data set itself in some implementations.), 
wherein initiating storage metadata distribution on the storage metadata across the plurality of fault domains comprises: initiating storage of a copy of the storage metadata on a second data node in the fault domain (Dirac [0179]: In one embodiment, the OR extraction request 2401 may include compression metadata 2406, indicating for example the compression algorithm used for the data set, the sizes of the units or blocks in which the compressed data is stored (which may differ from the sizes of the chunks on which chunk-level in-memory filtering operations are to be performed), and other information that may be necessary to correctly de-compress the data set.  Decryption metadata 2408 such as keys, credentials, and/or an indication of the encryption algorithm used on the data set may be included in a request 2401 in some embodiments.  Authorization/authentication metadata 2410 to be used to be able to obtain read access to the data set may be provided by the client in request 2401 in some implementations and for certain types of data sources.  Such metadata may include, for example, an account name or user name and a corresponding set of credentials, or an identifier and password for a security container (similar to the security containers 390 shown in FIG. 3).). 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Redlich, US 2009/0254572, Digital Information Infrastructure and Method 
Becker-Szendy, US 2011/0302446, Monitoring Lost Data in a Storage System
Deenadhayalan, US 2008/0282105, Data Integrity Validation in Storage Systems
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SABA AHMED, whose telephone number is (571) 270-0236. The examiner can normally be reached Monday through Friday, 9 AM to 5 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SABA AHMED/
Examiner, Art Unit 2154

/HOSAIN T ALAM/
Supervisory Patent Examiner, Art Unit 2154