DETAILED ACTION
Claims 1, 3-6, 8-11, 13-14 and 16-21 are pending in this action.  Of those, claims 6, 14, 17-21 are objected to as depending on a rejected claim.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 23 Nov 21 has been entered.

Response to Amendment
All rejections are withdrawn.  See new grounds of rejection, presented below.

Response to Argument
The arguments were considered, but are moot in view of new grounds of rejection.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 3, 8-10 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Apache Spark Checkpoints and Metadata Checkpoints, as used in versions on or before 2.4.4 (hereinafter Spark Checkpoints)
w/evidence of features from:
Konieczny, Spark Streaming checkpointing and Write Ahead Logs
(https://www.waitingforcode.com/apache-spark-streaming/spark-streaming-checkpointing-and-write-ahead-logs/read) hereinafter Spark-Konieczny+

Konieczny, Metadata Checkpoint (https://www.waitingforcode.com/apache-spark-streaming/metadata-checkpoint/read) hereinafter Spark-Konieczny2+

Laskowski, The Internals of Spark Structured Streaming (Apache Spark 2.4.4), specifically citing the following+
Introduction (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/content/) attesting authorship to Jacek Laskowski
StreamExecution (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/content/spark-sql-streaming-StreamExecution.html) hereinafter Spark-LaskowskiStreamExecution
MetadataLogFileIndex (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/content/spark-sql-streaming-MetadataLogFileIndex.html) hereinafter Spark-LaskowskiMetadataLogFileIndex
FileStreamSinkLog (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/content/spark-sql-streaming-FileStreamSinkLog.html) hereinafter Spark-LaskowskiFileStreamSinkLog
SinkFileStatus (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/content/spark-sql-streaming-SinkFileStatus.html) hereinafter Spark-LaskowskiSinkFileStatus

Lucas Kim et al., Spark Stream DStream RDDs order (https://stackoverflow.com/questions/42729950/spark-stream-dstream-rdds-order) hereinafter Spark-LucasKim+
With respect to claim 1, Spark Checkpoints performed A method comprising: 
receiving, at a source storage system, one or more updates to one or more datasets stored within the source storage system (Spark-Konieczny, Section Checkpoint writing workflow, Metadata checkpoint may be triggered by DStream operations and batches.);
generating, based on the one or more updates to the one or more datasets, metadata describing the one or more updates to one or more datasets stored within the source storage system, wherein the metadata does not store data included in the one or more updates to the one or more datasets (Spark-Konieczny2, Section Does metadata checkpoint store RDD?  Attests that metadata checkpoint does not store data, and provides documentary evidence of this) and includes references to data within the source storage system (The following chapters from the Laskowski
generating, based on the metadata describing the one or more updates to the one or more datasets, a checkpoint comprising a portion of the metadata, wherein the checkpoint describes an ordered application of the one or more updates to the one or more datasets (Spark-Konieczny, Section Importance of checkpoints, metadata stored includes DStream operations.  Spark-LucasKim, “In Spark, operations such as map, filter, flatMap, and coalesce ensure order.” “It works the same in streaming.”  Thus, DStream operations are an ordered application of updates to the RDDs in a stream in relation to a previous checkpoint.) in relation to a previous checkpoint (implicit in the art of checkpoints); and
sending, from the source storage system to a target storage system and independent from sending the checkpoint to the target storage system, data corresponding to the checkpoint (Spark-Konieczny, Section Importance of checkpoints, a data checkpoint (as opposed to a metadata checkpoint) relates to the RDDs.  Also, a metadata checkpoint is presumed to be generated off a DStream or batch processing, so streaming presumptively occurs), wherein the source storage system and the target storage system are different storage systems (implicit to streaming).



With respect to claim 9, the claim is directed to an apparatus claim with analogous limitations to claim 1 (specifically, the “independently receiving data” aspect), and is mapped to Spark Checkpoints accordingly.

With respect to claims 3, 10, Spark Checkpoints performed generating an ordered log of metadata comprising one or more checkpoints (Spark-LaskowskiStreamExecution, Metadata Log and Metadata Checkpoint are synonymous.  A Spark metadata checkpoint inherently comprises itself.  Since some DStream operations are ordered, the log also must be ordered.).
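
For illustration only, the mapping above can be sketched in a few lines of Python. This is not Spark code and does not characterize the Spark implementation; the names SourceSystem, UpdateRef, Checkpoint, and the block locations are hypothetical. The sketch shows a checkpoint that holds references to data rather than the data itself, that is ordered in relation to a previous checkpoint, and whose data is transferred separately from the checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UpdateRef:
    """Metadata for one update: a reference to the data, not the data itself."""
    dataset: str
    location: str  # hypothetical reference into the source storage system


@dataclass(frozen=True)
class Checkpoint:
    seq: int                        # position in the ordered log
    previous: Optional[int]         # sequence number of the prior checkpoint
    updates: Tuple[UpdateRef, ...]  # updates in the order they were received


class SourceSystem:
    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}   # the stored data
        self.pending: List[UpdateRef] = []   # metadata not yet checkpointed
        self.log: List[Checkpoint] = []      # ordered log of checkpoints

    def receive_update(self, dataset: str, location: str, data: bytes) -> None:
        # Store the data, and record only a reference to it as metadata.
        self.blocks[location] = data
        self.pending.append(UpdateRef(dataset, location))

    def make_checkpoint(self) -> Checkpoint:
        # The checkpoint is defined in relation to the previous one.
        prev = self.log[-1].seq if self.log else None
        cp = Checkpoint(len(self.log), prev, tuple(self.pending))
        self.pending = []
        self.log.append(cp)
        return cp

    def data_for(self, cp: Checkpoint) -> Dict[str, bytes]:
        # Data corresponding to a checkpoint, fetched apart from the checkpoint.
        return {u.location: self.blocks[u.location] for u in cp.updates}


source = SourceSystem()
source.receive_update("ds1", "blk-0", b"alpha")
source.receive_update("ds1", "blk-1", b"beta")
first = source.make_checkpoint()
source.receive_update("ds2", "blk-2", b"gamma")
second = source.make_checkpoint()

# The target receives metadata and data through separate transfers.
target_metadata = [first, second]
target_data = {**source.data_for(first), **source.data_for(second)}
```

In this sketch the checkpoint objects contain no payload bytes, so sending target_metadata and sending target_data are independent operations.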

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4, 5, 11, 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Spark Checkpoints as applied to claims 1, 3, 8, 10, in view of Guney et al. (US 2017/0083598 A1) hereinafter Guney
With respect to claims 4, 11, respectively dependent upon 3, 10, Spark Checkpoints does not teach wherein a quantity of updates to the one or more datasets described by a given checkpoint is specified by a configurable data replication setting of the first storage system.  

Guney teaches wherein a quantity of updates to the one or more datasets described by a given checkpoint is specified by a configurable data replication setting of the first storage system (Fig. 10 and [0131]-[0143] generally describe a method of updating multiple servers using checkpoints.  [0144]-[0146] explain there is a “normal rate” of updates and a “fallen behind” state based on a “roll call” that may necessitate a “catchup rate” of data updates, which is generally described by Fig. 11.  [0149] provides that an expected number of updates per unit time for a normal state, the same for a fallen behind state, and an update interval are stored in “a configuration database or other similar data store.”  Thus, it shows an RPO with respect to updates per checkpoint defined by unit time.).

Spark is an analytics framework for large scale data processing including checkpoint support, and Guney uses data analytics on large scale data updates via checkpoints.  It would have been obvious to those of ordinary skill in the art at the time of filing to combine the teachings of Spark Checkpoints and Guney in order to maintain data coherence in a distributed environment with respect to redundant data.

With respect to claims 5, 13, respectively dependent upon claims 4, 11, Guney teaches the configurable data replication setting is a target recovery point objective ([0143]-[0146] the “normal state” is the target RPO).
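
As a rough illustration of the reading applied to Guney above, the quantity of updates per checkpoint follows from a configured update rate multiplied by a configured update interval. The sketch below is not taken from Guney; the class name, field names, and numeric values are hypothetical and chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationConfig:
    """Hypothetical stand-in for settings kept in a configuration data store."""
    normal_rate: float      # expected updates per second in the normal state
    catchup_rate: float     # expected updates per second in the fallen-behind state
    update_interval: float  # seconds covered by one checkpoint


def updates_per_checkpoint(cfg: ReplicationConfig, fallen_behind: bool) -> int:
    # Quantity of updates per checkpoint = configured rate x configured interval.
    rate = cfg.catchup_rate if fallen_behind else cfg.normal_rate
    return int(rate * cfg.update_interval)


# Illustrative values only; none of these numbers appear in the reference.
cfg = ReplicationConfig(normal_rate=50.0, catchup_rate=200.0, update_interval=2.0)
normal_quantity = updates_per_checkpoint(cfg, fallen_behind=False)
catchup_quantity = updates_per_checkpoint(cfg, fallen_behind=True)
```

Changing any of the three configured values changes the quantity of updates that a given checkpoint describes, which is the sense in which the setting is configurable.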

Claim 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Spark Checkpoints in view of Fleming et al. (US 6,023,772) hereinafter Fleming


receiving, from the second storage system, the data corresponding to the ordered log of metadata (Col 10 lines 23-34, each operation in a log is numbered, thus ordered.) responsive to a fault at the first storage system (compare Fig. 3 and Fig. 4.  Col 6 lines 23-27, Fig. 3 is directed to normal operation and Fig. 4 is failover/recovery.  Fig. 3 elements 54-56 show that operations performed on the primary are logged and replicated on the secondary.  Fig. 4 element 45, during primary failure, the secondary unit is promoted to the primary.  Abstract, the failed drive takes the role of secondary (outputting its application messages as the recover-unit output messages)); and
generating, on a source data repository of the first storage system and based on the ordered log of metadata received from the second storage system, at least a portion of the one or more datasets in accordance with the one or more updates corresponding to a specified point in time (per above, since roles have switched, the “source data repository” is now responsible for replicating per the log.).

The references are directed to checkpointing.  It would have been obvious to those of ordinary skill in the art at the time of filing to combine the teachings of the references in order to permit the distributed database as a whole to continue to operate despite an unexpected failure of one source.
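
The recovery step discussed above, regenerating a dataset from a numbered log up to a specified point in time, can be sketched as follows. This is a minimal illustration and is not drawn from Fleming or from Spark; LogEntry, rebuild, and the sample entries are hypothetical, and the sketch assumes timestamps do not decrease as sequence numbers increase.

```python
from typing import Dict, List, NamedTuple


class LogEntry(NamedTuple):
    seq: int          # operation number, which gives the log its order
    timestamp: float  # time at which the update was applied
    key: str
    value: str


def rebuild(log: List[LogEntry], point_in_time: float) -> Dict[str, str]:
    """Replay the log in sequence order, stopping at the requested time."""
    state: Dict[str, str] = {}
    for entry in sorted(log, key=lambda e: e.seq):
        if entry.timestamp > point_in_time:
            break  # later entries fall after the recovery point
        state[entry.key] = entry.value
    return state


# Entries as received from the surviving system, not necessarily in order.
received = [
    LogEntry(3, 3.0, "a", "v2"),
    LogEntry(1, 1.0, "a", "v1"),
    LogEntry(2, 2.0, "b", "v1"),
]
state_at_2 = rebuild(received, point_in_time=2.0)
state_at_3 = rebuild(received, point_in_time=3.0)
```

Because each operation is numbered, the recovering system can restore the same state regardless of the order in which the log entries arrive.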

Allowable Subject Matter
Claims 6, 14 and 21 (but see next objection with respect to claim 21) are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The reasons for allowance for the 100ms claim are still valid.  Additionally, the checkpoint interval for Spark is typically measured on the order of seconds, not milliseconds.

Claims 17-21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 17 requires that at least one location of data is available based on garbage collection on the source data repository using both a reference table maintained by a storage controller and also a list of references within the ordered log of metadata.

Lakhamraju was previously used to address the claims because Lakhamraju itself describes an implementation of garbage collection as part of a reorganization process in an object oriented database.  Garbage collection is typically an affair of memory for a single storage system, and thus does not fit the limitations of the source and target systems being different storage systems.

There are a number of references that address both “garbage collection” and “recovery” but those typically use the term “recovery” as in “recovery of allocated memory locations.”  That is not the same kind of recovery as used in the Fleming reference because the former involves “removing” (see footnote 1) whereas the failover environment involves restoring data to a consistent state (which typically involves adding data as well as removing it).

Removing the references describing source/target as part of the garbage collector itself from consideration (such as Lakhamraju), the examiner was unable to find references that determine location of data based on garbage collection in the manner claimed.

Remarks
“Multiple reference” anticipation rejections are permitted in limited circumstances.  MPEP 2131.01.  In re Epstein, 32 F.3d 1559, 1567 (Fed. Cir. 1994) (where a prior art anticipation rejection, based on the cumulative disclosure of multiple abstracts to teach the features of a claim under a theory of in use/on sale, was affirmed).  Versions of Spark were publicly available before the filing date, and the multiple documents are being used as evidence of those features.  Note that per Epstein, documents directed to different versions of software can still be used to meet the preponderance of the evidence standard if the features of the software are not the kind that would tend to change between versions.

The USPTO’s new search tool, SEARCH, is still being rolled out and developed, and does not (yet) support searching the Brief Description of Drawings (.drwd.) so EAST was also used for that limited purpose.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
StreamExecution – Base of Streaming Query Executions (http://spark.coolplayer.net/?p=3134)+
A copy of a page from Laskowski, The Internals of Spark Structured Streaming.  It is mostly identical except (1) the embedded .png files are broken links, (2) the post date is 1 Sept 15.  This establishes evidence of a publication date for Laskowski.

It was necessary to use this page to establish a date of publication because the Laskowski reference itself has no publication date, and the archive.org captures that the examiner found were dated 2020 at the earliest.

The “about the author” page does not give a name, but provides an email address of sunbiaobiaooo@gmail.com.  In absence of better evidence, the author of this web page is considered as “Sun Biaobiao.”

Rami et al., How to set checkpoint Interval for spark streaming checkpointing?  (https://stackoverflow.com/questions/37444437/how-to-set-checkpoint-interval-for-spark-streaming-checkpointing)+
The typical checkpoint interval is on the order of seconds (10-15 seconds).  Output code in the script by pangpang shows interval measurements in milliseconds, but the checkpoint interval (where premature sampling provides a null result) is 60,000 ms.  This is orders of magnitude above the claimed 100 ms or below.
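
The comparison above reduces to simple arithmetic, shown here as a short Python check. The figures are those stated in this action (10-15 seconds, 60,000 ms, and 100 ms); the variable names are illustrative only.

```python
import math

observed_interval_ms = 60_000            # checkpoint interval in the cited script
typical_interval_ms = (10_000, 15_000)   # typical 10-15 second interval
claimed_max_ms = 100                     # claimed interval of 100 ms or below

# How many times longer the observed interval is than the claimed maximum.
ratio = observed_interval_ms / claimed_max_ms

# The same comparison expressed as a base-10 order of magnitude.
orders = math.log10(ratio)

# The typical intervals compared against the claimed maximum.
typical_ratios = [t / claimed_max_ms for t in typical_interval_ms]
```

Even the low end of the typical range is two orders of magnitude longer than the claimed interval.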



Spark News (https://spark.apache.org/news/index.html)+
Provides the public availability of various versions of Apache Spark.  Per the press releases, Spark 2.4.4 is the last version that predates applicants’ claim to priority to the provisional application, as well as the actual filing date.  Note that per the press release order, version 2.3.4 (9 Sept 19) appears to have been available after 2.4.4 (1 Sept 19), but this does not set up an intervening scenario since the provisional application was filed on 13 Sept 19, which postdates any version of Spark referred to in any of the evidentiary references.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON G LIAO whose telephone number is (571)270-3775. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached on 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/JASON G LIAO/
Primary Examiner, Art Unit 2156
15 Jan 22


1 “Removing” is in quotes because from the perspective of the actual magnetic storage, this isn’t strictly true because the bits in the memory location are not usually changed.  But from the perspective of the application, that data location (and thus the data) should no longer be accessed.