DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 7-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200050594 A1; Tidwell; Kenny et al. (hereinafter Tidwell) in view of Johnson; Theodore et al.; US 20180139118 A1 (hereinafter Johnson) and US 20210037112 A1; ANKIREDDYPALLE; Ramachandra Reddy et al. (hereinafter Ank).
Regarding claim 1, Tidwell teaches A computer-implemented method comprising: … a data lake storing a plurality of data lake records partitioned across a plurality of data lake partitions each identified by a respective data lake partition identifier retrieving from a data lake update service a subset of the data lake partition identifiers (Tidwell [0036] Listener 120 is a component of the ECMS 102 that receives raw data streams and writes the raw data streams to a data lake 130. Listener 120 listens for raw data streams from many different data sources 105A-N. Listener 120 creates a separate raw data stream record in the data lake 130 for each retrieving one or more of the data lake records via a communication interface using one or more queries that each include a respective one or more of the subset of the data lake partition identifiers; (Tidwell [0037] Data lake 130 is a large object-based data store 135 accompanied by a processing engine (data store interface 135) to operate on data in the data store 135. Data lake 130 may be capable of storing and operating on any type of data, regardless of a format of that data. Data lake 130 stores data such as raw data streams in a native format of the data. Examples of data lakes include Azure Data Lake.RTM., Kafka.RTM., Rabbit MQ.RTM., and Hadoop.RTM.. Data store interface 135 receives read and write requests, and performs reads to the data store 140 and writes from the data store 140 responsive to those read and write requests. For example, data store interface 135 may receive write requests from listener 120 to write messages containing log data of a raw data stream to a raw data stream record. Data store interface 135 may also respond to read and write requests from indexer 150.  [0059] Log separator 315 retrieves raw log data 305 from raw data stream records in the data lake 130. The raw log data 305 may be log data having an original format that the log data had when it was initially created, or close thereto. 
Alternatively, the raw log data may be log data that has been minimally modified (e.g., by tagging the log data with a source ID and/or a source type). The raw log data 305 may be retrieved by issuing read commands to data store interface 135 of the data lake 130. Responsive to receiving raw log data 305, log separator 315 determines whether the source type is known for the data source object associated with the raw log data 305. In one embodiment, log separator 315 determines the data source generating a transformed one or more records by applying a transformation function to the retrieved one or more records; and transmitting the transformed one or more records to a downstream data service, the transformed one or more records being associated with the designated period of time (Ank [0301] After another period of time (e.g., as defined by the storage policy), the media agent 144 can convert the file F1 from the native format into a secondary copy format. The media agent 144 may stored the converted file F1 in the same location on the low speed drive(s) 320 or on a different location on the low speed drive(s) 320. Thus, the file F1 can be stored in a secondary copy format on the low speed drive(s) 320 at (4). At some later time (e.g., as defined by the storage policy), the media agent 144 can move the file F1 in the secondary copy format to one or more of the secondary storage devices 108, which can be local to the system 300 or located remotely from the receiving at a computing device having a processor and memory a transform request identifying a temporal checkpoint associated with a data lake storing a plurality of data lake records partitioned across a plurality of data lake partitions each identified by a respective data lake partition identifier; (Johnson [0003] In one example, the present disclosure discloses a device, method and computer-readable medium for recovering a replica in an operator in a data streaming processing system. 
A method may obtain a checkpoint in an input data stream, determine a maximum-timestamp at the checkpoint in the input data stream, calculate a completeness point that is greater than the maximum-timestamp for an output data stream and process data records from the checkpoint onwards that have a respective timestamp that is greater than or equal to the completeness point that was calculated to generate a new replica to replace a failed replica.[0062] In one embodiment, the method may include obtaining a checkpoint in an input data stream, determining a maximum-timestamp at the checkpoint in the input data stream, each associated with a respective timestamp value later than the temporal checkpoint, each of the subset of the data lake partition identifiers associated with a respective data lake partition including a respective data lake record updated after the temporal checkpoint (Johnson [0003] In one example, the present disclosure discloses a device, method and computer-readable medium for recovering a replica in an operator in a data streaming processing system. 
A method may obtain a checkpoint in an input data stream, determine a maximum-timestamp at the checkpoint in the input data stream, calculate a completeness point that is greater than the maximum-timestamp for an output data stream and process data records from the checkpoint onwards that have a respective timestamp that is greater than or equal to the completeness point that was calculated to generate a new replica to replace a failed replica. [0062] In one embodiment, the method may include obtaining a checkpoint in an input data stream, determining a maximum-timestamp at the checkpoint in the input data stream, calculating a completeness point that is greater than the maximum-timestamp for an output data stream and processing data records from the checkpoint onwards that have a respective timestamp that is greater than or equal to the completeness point [0075-0079] further elaborate on the checkpoint/maximum threshold point related corresponding records)
Therefore, it would have been obvious to one of ordinary skill in the art before the 
Corresponding system claim 12 is rejected similarly as claim 1 above. Additional Limitations: Device with processor(s) and memory (Tidwell [FIG. 13] Device with processor(s) and memory)
Corresponding product claim 18 is rejected similarly as claim 1 above. Additional Limitations: computer readable medium capable of reading and executing instructions (Tidwell [FIG. 13] computer readable medium capable of reading and executing instructions)
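For illustration only, the sequence of steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the names `Record`, `updated_partition_ids`, `query_records` and `run_transform`, the in-memory dictionaries standing in for the data lake and the data lake update service, and the use of integer timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    partition_id: str
    updated_at: int  # hypothetical integer timestamp
    payload: str


def updated_partition_ids(last_updated: Dict[str, int], checkpoint: int) -> List[str]:
    """Stand-in for the update service: partitions updated after the checkpoint."""
    return sorted(pid for pid, ts in last_updated.items() if ts > checkpoint)


def query_records(lake: Dict[str, List[Record]], partition_ids: List[str],
                  checkpoint: int) -> List[Record]:
    """Query only the listed partitions, keeping records newer than the checkpoint."""
    found: List[Record] = []
    for pid in partition_ids:
        found.extend(r for r in lake.get(pid, []) if r.updated_at > checkpoint)
    return found


def run_transform(lake: Dict[str, List[Record]], last_updated: Dict[str, int],
                  checkpoint: int, transform: Callable[[Record], str],
                  sink: List[str]) -> int:
    """Retrieve, transform, and transmit; return a candidate new checkpoint."""
    ids = updated_partition_ids(last_updated, checkpoint)
    records = query_records(lake, ids, checkpoint)
    sink.extend(transform(r) for r in records)  # "downstream data service" stand-in
    return max((r.updated_at for r in records), default=checkpoint)
```

In this sketch the partition identifiers act as a coarse filter, so partitions with no update after the checkpoint are never queried.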
Regarding claim 2, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, wherein a designated one of the data lake partition identifiers is associated with a timestamp that identifies a time at which the respective partition was most recently updated. ( Ank [0308] The snapshot manager 
Corresponding system claim 13 is rejected similarly as claim 2 above.
Corresponding product claim 19 is rejected similarly as claim 2 above.
Regarding claim 3, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, the method comprising: wherein a designated one of the data lake partition identifiers is associated with a time window identifier that identifies a period of time during which the data lake partition was most recently updated. (Ank [0085] Metadata can include, without limitation, one or more of the following: the data owner (e.g., the client or user that generates the data), the last modified time (e.g., the time of the most recent modification of the data object), a data object name (e.g., a file name), a data object size (e.g., a number of bytes of data), information about the content (e.g., an indication as to the existence of a particular search term), user-supplied tags, to/from information for email (e.g., an email sender, recipient, etc.), creation date, file type (e.g., format or application type), last accessed time, application type (e.g., type of application that generated the data object), location/network (e.g., a current, past or future location of the data object and network pathways to/from the data object), geographic location (e.g., GPS coordinates), frequency of change (e.g., a period in which the data object is modified), business unit 
Corresponding system claim 14 is rejected similarly as claim 3 above
Corresponding product claim 20 is rejected similarly as claim 3 above.
Regarding claim 4, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, wherein a designated one of the data lake partition identifiers is associated with a pointer to a file in the data lake. (Ank [0173] Some types of snapshots do not actually create another physical copy of all the data as it existed at the particular point in time, but may simply create pointers that map files and directories to specific memory locations (e.g., to specific disk blocks) where the data resides as it existed at the particular point in time. For example, a snapshot copy may include a set of pointers derived from the file system or from an application. In some other cases, the snapshot may be created at the block-level, such that creation of the snapshot occurs without awareness of the file system. Each pointer points to a respective stored data block, so that collectively, the set of pointers reflect the storage location and state of the data object (e.g., file(s) or volume(s) or data set(s)) at the point in time when the snapshot copy was created. [0174] An initial snapshot may use only a small amount of disk space needed to record a mapping or other data structure representing or otherwise tracking the blocks that correspond to the current state of the file system. Additional disk space is usually required only when files and directories change later on. Furthermore, when files change, typically only the pointers which map to blocks are copied, not the blocks themselves. For example for "copy-on-write" snapshots, when a block changes in primary storage, the block is copied to secondary storage or cached in primary storage before the block is overwritten in primary storage, and the pointer to that block is changed to reflect the new location of that block. The snapshot mapping of file system data may also be updated to reflect the changed block(s) at that particular point in time.)
Corresponding system claim 15 is rejected similarly as claim 4 above
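For illustration only, the pointer-based copy-on-write snapshot behavior described in the cited passage (Ank [0173]-[0174]) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the implementation of the reference: the class `Volume`, its methods, and the list-based "primary" and "secondary" stores are hypothetical.

```python
class Volume:
    """Toy block store with copy-on-write snapshots (hypothetical names)."""

    def __init__(self, blocks):
        self.primary = list(blocks)  # live blocks
        self.secondary = []          # preserved copies of overwritten blocks
        self.snapshots = []          # each snapshot is a list of pointers

    def snapshot(self):
        # A snapshot copies no data; it only records where each block lives.
        snap = [("primary", i) for i in range(len(self.primary))]
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Before overwriting, preserve the old block once and repoint every
        # snapshot that still refers to the primary copy of that block.
        preserved = None
        for snap in self.snapshots:
            if snap[index] == ("primary", index):
                if preserved is None:
                    self.secondary.append(self.primary[index])
                    preserved = ("secondary", len(self.secondary) - 1)
                snap[index] = preserved
        self.primary[index] = data

    def read(self, snap, index):
        # Follow the snapshot's pointer to the block as of snapshot time.
        where, loc = snap[index]
        return self.primary[loc] if where == "primary" else self.secondary[loc]
```

Only the changed block consumes additional space in this sketch; unchanged blocks remain shared between the live volume and the snapshot.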
Regarding claim 7, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, wherein the data lake records are stored in one or more third-party cloud computing storage systems (Tidwell [0029] The various computing devices 115, 125, 145, 155, 170 may be connected via one or more networks, which may include a local area network (LAN), a wide area network (WAN) such as the Internet, and or a combination thereof. Additionally, computing devices 115 may be connected to one or more data sources 105A, 105B through 105N via one or more networks. Client computing devices 180 and/or third party computing devices 182 executing third party services 185 may be connected to computing devices 170 via one or more networks.   [0032] For some data sources 105A-N, the listener 120 periodically queries the data source 105A-N for the raw data stream containing the log data. For example, data source 105N may include an account of a third party service such Salesforce.com.RTM., DropBox.RTM., Box.RTM., and so on. In such an instance, listener 120 uses provided account credentials to log into an account of a customer and query the third party service for log data. [0035] In some instances, enterprises may be configured to collect log data for third party systems such as SIEMs. In such an embodiment, the enterprises may additionally send the log data to listener 120. Alternatively, or additionally, listener 120 may receive the log data directly from the SIEMs. Such log data may be received before and/or after the SIEMs operate on the log data. )
Regarding claim 8, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, wherein transmitting the transformed one or more records comprises writing the transformed one or more records to a database 
Regarding claim 9, the combination of Tidwell, Johnson and Ank teach The method recited in claim 1, the method comprising: updating the data lake service to identify a time checkpoint associated with the transformed one or more records (Ank [0085] Metadata can include, without limitation, one or more of the following: the data owner (e.g., the client or user that generates the data), the last modified time (e.g., the time of the most recent modification of the data object), a data object name (e.g., a file name), a data object size (e.g., a number of bytes of data), information about the content (e.g., an indication as to the existence of a particular search term), user-supplied tags, to/from information for email (e.g., an email sender, recipient, etc.), creation date, file type (e.g., format or application type), last accessed time, application type (e.g., type of application that generated the data object), location/network (e.g., a current, past or future location of the data object and network pathways to/from the data object), geographic location (e.g., GPS coordinates), frequency of change (e.g., a period in which the data object is modified), business unit (e.g., a group or department that generates, manages or is otherwise associated with the data object), aging information (e.g., a schedule, such as a time period, in which the data object is migrated to secondary or long term storage), boot sectors, partition layouts, file location within a file folder directory structure, user permissions, owners, groups, access control lists (ACLs), system metadata (e.g., registry information), combinations of the same or other similar information related to the data object. In addition to metadata generated by or related to file systems and operating systems, some applications 110 and/or other components of system 100 maintain indices of metadata for data objects [0229] frequency with which primary data 112 or a secondary 
Regarding claim 10, the combination of Tidwell, Johnson and Ank teaches The method recited in claim 1, wherein the data lake is accessible via an on-demand computing services environment providing computing services to a plurality of organizations via the internet (Tidwell [0029] The various computing devices 115, 125, 145, 155, 170 may be connected via one or more networks, which may include a local area network (LAN), a wide area network (WAN) such as the Internet, and or a combination thereof. Additionally, computing devices 115 may be connected to one or more data sources 105A, 105B through 105N via one or more networks. Client computing devices 180 and/or third party computing devices 182 executing third party services 185 may be connected to computing devices 170 via one or more networks. [0030] Data sources 105A-N are providers of raw data streams of log data. Data sources 105A-N may be devices in an enterprise environment (e.g., on a network of an enterprise) that produce log data. Examples of such devices include computing devices 
Regarding claim 11, the combination of Tidwell, Johnson and Ank teaches The method recited in claim 10, wherein the computing services environment includes a multitenant database that stores information associated with the plurality of organizations (Tidwell [0029] The various computing devices 115, 125, 145, 155, 170 may be connected via one or more networks, which may include a local area network (LAN), a wide area network (WAN) such as the Internet, and or a combination thereof. Additionally, computing devices 115 may be connected to one or more data sources 105A, 105B through 105N via one or more networks. Client computing devices 180 and/or third party computing devices 182 executing third party services 185 may be connected to computing devices 170 via one or more networks. [0030] Data sources 105A-N are providers of raw data streams of log data. Data sources 105A-N may be devices in an enterprise environment (e.g., on a network of an enterprise) that produce log data. Examples of such devices include computing devices (e.g., server computing devices) that generate system logs, firewalls, routers, identity management systems, 
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200050594 A1; Tidwell; Kenny et al. (hereinafter Tidwell) in view of Johnson; Theodore et al.; US 20180139118 A1 (hereinafter Johnson) and US 20210037112 A1; ANKIREDDYPALLE; Ramachandra Reddy et al. (hereinafter Ank) and Armbrust et al. Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores. PVLDB, 13(12): 3411-3424, 2020.DOI: https://doi.org/10.14778/3415478.3415560 (hereinafter Armbrust)
Regarding claim 5, the combination of Tidwell and Ank teach The method recited in claim 4, wherein the pointer to the file is a partition key in (Ank [0173] Some types of snapshots do not actually create another physical copy of all the data as it existed at the particular point in time, but may simply create pointers that map files and directories to specific memory locations (e.g., to specific disk blocks) where the data resides as it existed at the particular point in time. For example, a snapshot copy may include a set of pointers derived from the file system or from an application. In some other cases, the snapshot may be created at the block-level, such that creation of a Delta Lake change log table (Armbrust [AB.] In this paper, we present Delta Lake, an open source ACID table storage layer over cloud object stores initially developed at Databricks. Delta Lake uses a transaction log that is compacted into Apache Parquet format to provide ACID properties, time travel, and significantly faster metadata operations for large tabular datasets (e.g., the ability to quickly search billions of table partitions for those relevant to a query). It also leverages this design to provide high-level features such as automatic data layout optimization, upserts, caching, and audit logs. Delta Lake tables can be accessed from Apache 
Corresponding system claim 16 is rejected similarly as claim 5 above.
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200050594 A1; Tidwell; Kenny et al. (hereinafter Tidwell) in view of Johnson; Theodore et al.; US 20180139118 A1 (hereinafter Johnson), US 20210037112 A1; ANKIREDDYPALLE; Ramachandra Reddy et al. (hereinafter Ank), Armbrust et al. Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores. PVLDB, 13(12): 3411-3424, 2020. DOI: https://doi.org/10.14778/3415478.3415560 (hereinafter Armbrust) and US 20190132280 A1; Meuninck; Troy et al. (hereinafter Troy).
Regarding claim 6, the combination of Tidwell, Armbrust, Johnson and Ank teaches The method recited in claim 4, wherein the pointer to the file is
The combination lacks an explicit teaching of a URI independent of a file system underlying the data lake.
However, Troy helps teach a URI independent of a file system underlying the data lake (Troy [0031] In some embodiments, the internet protocol (IP) address of the resource(s) within a VPC (e.g., the data lake 122, the data lake 132, the collection of cloud applications 126A-N, the collection of cloud processors 136A-N, and/or the SS 140) may change (a)periodically. Thus, a VPC can include a VPC DNS recursor, such as the VPC DNS recursor 124 for VPCDP1 120 and the VPC DNS recursor 134 for the VPCDP2 130, that can receive and query for DNS zone changes within the VPC, such as by determining an IP address for a unique private resource uniform resource identifier (URI) that is associated with access to one or more of the resources within and/or accessible via the VPC, such as the VPCDP1 120. In some instances, a VPC DNS recursor can provide the unique private resource URI to the PE of the data resource community 110 (e.g., the proxy application 214 of the PE A 210A). Because the IP address associated with the unique private resource URI may change a VPC DNS recursor, such as the VPC DNS recursor 124, may not release or broadcast the IP address associated with the unique private resource URI for the particular resource of the data resource community 110 to data partner enterprise networks (e.g., DPEN1 202A and DPEN2 202B) in order to maintain a federated security policy. Instead, the provider edge (e.g., PE A 210A) of the data resource community 110 can advertise or otherwise provide a BGP update message informing the data partner enterprise 
Corresponding system claim 17 is rejected similarly as claim 6 above.
Response to Arguments
Applicant's arguments filed 12/20/2021 have been fully considered.
35 USC § 103: 
Regarding Applicant’s Argument (pages 7-9): Examiner’s response: Applicant’s arguments, filed 12/20/2021, with respect to the rejection(s) under 35 USC § 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Johnson; Theodore et al.; US 20180139118 A1 (hereinafter Johnson). The examiner recommends further elaborating on the “transformation function” and the corresponding propagation of data based on said “transformation function” in the independent claims.
Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARYAN D TOUGHIRY whose telephone number is (571)272-5212. The examiner can normally be reached Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner can be reached on (571) 270-1760. The fax phone 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ARYAN D TOUGHIRY/Examiner, Art Unit 2165
/William B Partridge/Primary Examiner, Art Unit 2183