DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Final Office Action is in response to the amendment filed on 01/26/2021. Amended claims 1-12 and newly added claims 16-20, filed on 01/26/2021, are being considered on the merits.
In response to the last Office Action: 
Claims 1-12 have been amended.
Claims 16-20 have been newly added.
Claims 1-20 remain pending in this application.

Response to Arguments
The applicant’s remarks and/or arguments, filed on 01/26/2021, have been fully considered. 
The examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1] Interpretation of Claims-Broadest Reasonable Interpretation. The applicant always has the opportunity to amend the claims during prosecution, and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541, 550-51 (CCPA 1969).

The applicant's arguments below regarding amended independent claims 1, 8, and 11, found on page 10 of the remarks filed on 01/26/2021, have been fully considered but are not persuasive.

Applicant stated: “Applicant's amended independent claims 1, 8, and 11 are distinguished from Baumgartner at least because Baumgartner fails to disclose, a bypass stage determined based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow. As indicated above, Baumgartner analyzes a particular stage by itself, and not relative to one or more remaining stages. Additionally, Baumgartner fails to disclose, at least, a bypass stage determined based on any of a completion time, an amount of data, and a processing power at a stage.”
Regarding the aforementioned claim limitations, Examiner respectfully disagrees. Examiner asserts that the aforementioned limitations of amended independent claims 1, 8, and 11, as drafted and given their broadest reasonable interpretation, are disclosed by the combination of the cited prior art references BAUMGARTNER and NAEF. In particular, BAUMGARTNER discloses determining a bypass stage of the analytics workflow in BAUMGARTNER Fig. 6, Para. [0062]: “The lower part of FIG. 6 shows a record R with multiple data columns. A set of data columns of the record R is being processed in the stage 2, denoted as non-bulk data N, and a set of data columns, denoted as Bypassing bulk columns B, has been identified for bypassing and is not processed in this stage 2. Accordingly, the record R is separated into the two sets of data columns, forming new records, which additionally contain an identification column containing information about the separation of the record R.”; and in Para. [0063]: “The set of data columns N is passed to the transformation of this stage 2 and processed into a transformed record N'. The general structure of the record N is identical to the one of the record R, only that the input data columns, which are bypassed, are removed from this record. Therefore, the record N' can be processed by the stage 2 without modifications to the stage 2. The set of data columns B bypasses the transformation. An output record R' is formed by joining the transformed record N' with the bypassed record B according to the information provided in the identification columns.”; and in 
Further details are provided in the 35 U.S.C. 103 rejection set forth below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised 

Claims 1-2, 5-8, 10-11, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2012/0154405 A1) issued to Baumgartner et al. (hereinafter “BAUMGARTNER”) in view of US Patent Application Publication (US 2009/0171990 A1) issued to Naef et al. (hereinafter “NAEF”).
Regarding claim 1 (Currently Amended), BAUMGARTNER teaches a device comprising: a processing resource; and a machine-readable storage medium encoded with instructions executable by the processing resource for analytics workflow (BAUMGARTNER Para. [0009], lines (1-5): “…, a computer program product comprising a computer-readable medium including a computer-readable program code, wherein the computer-readable program code is adapted to execute one of the above methods”; and Para. [0010], lines (1-6): “…, the present invention provides a system for executing one of the above methods comprising a storage device for storing a computer usable program code and a processor for executing the computer usable program code to execute the method according to one of the above method claims”), the machine-readable storage medium comprising instructions to: 
based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow (BAUMGARTNER Fig. 6, Para. [0062], lines (1-9): “The lower part of FIG. 6 shows a record R with multiple data columns. A set of data columns of the record R is being processed in the stage 2, denoted as non-bulk data N, and a set of data columns, denoted as Bypassing bulk columns B, has been identified for bypassing and is not processed in this stage 2. Accordingly, the record R is separated into the two sets of data columns, forming new records, which additionally contain an identification column containing information about the separation of the record R.”; and 
Para. [0063], lines (1-10): “The set of data columns N is passed to the transformation of this stage 2 and processed into a transformed record N'. The general structure of the record N is identical to the one of the record R, only that the input data columns, which are bypassed, are removed from this record. Therefore, the record N' can be processed by the stage 2 without modifications to the stage 2. The set of data columns B bypasses the transformation. An output record R' is formed by joining the transformed record N' with the bypassed record B according to the information provided in the identification columns.”); 
(BAUMGARTNER Para. [0022], lines (2-5): “…, detect bulk data by evaluating the processing of data columns in each stage of the ETL process to identify for each unused data column sets of subsequent stages in which the data column is not processed”; and
Para. [0033], lines (1-7): “…, the present invention comprises adding status information indicating a processing status of a data column to the list. The status information can be used to mark data columns, for which a processing has already been detected in a stage.”); 
based on a determination of similarity, (BAUMGARTNER Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”,
the examiner notes that upon the identification of bulk data, i.e., a determination of similarity, a bypass operation is performed on the bulk data from an input of a first stage to an output of a second stage).  

However, BAUMGARTNER does not explicitly teach 
But NAEF teaches (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”; and 
Para [0036], lines (11-15): “…, data reduction component 24 is operable to efficiently reduce a size of the data for processing by eliminating one or more redundant data components prior to the processing of the content.”, 
the examiner notes that the data reduction component's elimination of redundant data components prior to processing corresponds to interrupting the execution of the analytics workflow); 
(NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”, the examiner notes that the system compares content to be processed with known content, i.e., stored insights data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of BAUMGARTNER (disclosing stage-based data processing) to include the teachings of NAEF (disclosing processing and identification of similar content) and arrive at a method that manipulates data processing based on data identification. One of ordinary skill in the art would have been motivated to make this combination because identifying data of interest (i.e., identifying similar content) enables data reduction/interruption that efficiently reduces the size of the data for processing by eliminating one or more redundant data components prior to the processing of the content, as recognized by NAEF (Abstract; Paras. [0034]-[0036]). In addition, BAUMGARTNER and NAEF are analogous art because they are directed to the same field of endeavor, namely data analysis and processing.

Regarding claim 2 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1. Further, NAEF teaches the machine-readable storage medium further comprising instructions to: 
based on a determination of dissimilarity, (NAEF Fig. 1, Para. [0033], lines (1-21): “Data reduction component 24 identifies the same data components and/or different data components found in the subset of potentially similar content 22. …, the one or more different data components 28 found in the subset of data to be processed, when compared with the subset of known data, represent new or unknown data. As such, data reduction component 24 replaces the content to be processed 14 with a reduced data representation 30 that includes the identified one or more different data components 28, or the one or more tokens 26 representing same data components, or some combination of both. As such, reduced data representation 30 has a smaller overall data size, which may be in terms of storage space and/or network bandwidth, than the original content to be processed 14.”; and
 Para. [0035], lines (14-20): “…, because the subset of potentially similar content 22 includes potentially similar workflow processing metadata 18 and 20, similarity identifier component 12 increases the likelihood of data reduction component 24 being able to identify similarities or differences and thus reduce a size of the data for the respective content to be processed 14.”), and 
(NAEF Fig. 1, Para. [0034], lines (1-14): “Content processing component 32 obtains and processes the reduced data representation 30, thereby generating processed content 34. As such, content processing component 30 may include any type of logic, such as logic operable to perform any operation on content to be processed 14. Thus, processed content 34 may include, for example, content that has been transferred, synchronized, de-duplicated, backed-up, or any other operation performable by content processing logic and benefiting from the reduced data size of the content. Additionally, it should be noted that in some aspects, one or more of similarity identifier component 12, data reduction component 24 or content processing component 32 may be implemented within the same or by different modules or by the same or by different computing devices.”).  

Regarding claim 5 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1. Further, NAEF teaches wherein the instructions to perform the similarity analysis further comprise: 
determining a type of the insights data (NAEF Para. [0040], lines (3-8): “…, workflow processing metadata 44 may include, but is not limited to, any data that identifies or describes a workflow process associated with or application to a respective data component, an identification or description of a data component, an identification or description of a data component type, …”); and 
based on the insights data type, performing a data reduction on the insights data, and identifying a similarity algorithm (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”).  

Regarding claim 6 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1. Further, BAUMGARTNER teaches wherein the instructions to perform the bypass operation further comprise: preparing an output data associated with the raw data based on a previously generated analyzed data associated with the stored insights data (BAUMGARTNER Fig. 9/10, Para. [0021], lines (1-2): “FIG. 10 shows the comparative example of FIG. 9 with bypassing of the bulk data columns being applied.”; and
Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”).  

Regarding claim 7 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1. Further, NAEF teaches wherein the similarity analysis and the bypass operation occur at one of a storage layer and an application framework (NAEF Fig. 1, Para. [0043], lines (20-24): “…, workflow processing metadata 44 may be stored within the content, linked to the content, or stored separately from the content. In any case, system 10 includes the ability of similarity identifier component 12 (FIG. 1) to access the respective workflow processing metadata 44.”).  

Regarding claim 8 (Currently Amended), BAUMGARTNER teaches a machine-readable storage medium encoded with instructions executable by a processing resource for analytics workflow (BAUMGARTNER Para. [0009], lines (1-5): “…, a computer program product comprising a computer-readable medium including a computer-readable program code, wherein the computer-readable program code is adapted to execute one of the above methods”; and Para. [0010], lines (1-6): “…, the present invention provides a system for executing one of the above methods comprising a storage device for storing a computer usable program code and a processor for executing the computer usable program code to execute the method according to one of the above method claims”), the machine-readable storage medium comprising instructions to: 
(BAUMGARTNER Para. [0022], lines (2-5): “…, detect bulk data by evaluating the processing of data columns in each stage of the ETL process to identify for each unused data column sets of subsequent stages in which the data column is not processed”; and
Para. [0033], lines (1-7): “…, the present invention comprises adding status information indicating a processing status of a data column to the list. The status information can be used to mark data columns, for which a processing has already been detected in a stage.”); 
(BAUMGARTNER Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”,
the examiner notes that upon the identification of bulk data, i.e., a determination of similarity, a bypass operation is performed on the bulk data from an input of a first stage to an output of a second stage).
However, BAUMGARTNER does not explicitly teach , the bypass stage being determined based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow; 
But NAEF teaches , the bypass stage being determined based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”; and 
Para [0036], lines (11-15): “…, data reduction component 24 is operable to efficiently reduce a size of the data for processing by eliminating one or more redundant data components prior to the processing of the content.”, 
the examiner notes that the data reduction component's elimination of redundant data components prior to processing corresponds to interrupting the execution of the analytics workflow); 
(NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”, the examiner notes that the system compares content to be processed with known content, i.e., stored insights data); 
based on a determination of dissimilarity, (NAEF Fig. 1, Para. [0033], lines (1-21): “Data reduction component 24 identifies the same data components and/or different data components found in the subset of potentially similar content 22. …, the one or more different data components 28 found in the subset of data to be processed, when compared with the subset of known data, represent new or unknown data. As such, data reduction component 24 replaces the content to be processed 14 with a reduced data representation 30 that includes the identified one or more different data components 28, or the one or more tokens 26 representing same data components, or some combination of both. As such, reduced data representation 30 has a smaller overall data size, which may be in terms of storage space and/or network bandwidth, than the original content to be processed 14.”; and
 Para. [0035], lines (14-20): “…, because the subset of potentially similar content 22 includes potentially similar workflow processing metadata 18 and 20, similarity identifier component 12 increases the likelihood of data reduction component 24 being able to identify similarities or differences and thus reduce a size of the data for the respective content to be processed 14.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of BAUMGARTNER (disclosing stage-based data processing) to include the teachings of NAEF (disclosing processing and identification of similar content) and arrive at a method that manipulates data processing based on data identification. One of ordinary skill in the art would have been motivated to make this combination because identifying data of interest (i.e., identifying similar content) enables data reduction/interruption that efficiently reduces the size of the data for processing by eliminating one or more redundant data components prior to the processing of the content, as recognized by NAEF (Abstract; Paras. [0034]-[0036]). In addition, BAUMGARTNER and NAEF are analogous art because they are directed to the same field of endeavor, namely data analysis and processing.  

Regarding claim 10 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 8. Further, BAUMGARTNER teaches the machine-readable storage medium of claim 8, wherein the instructions to perform the bypass operation further comprise: 
preparing an output data associated with the raw data based on a previously generated analyzed data associated with the stored insights data (BAUMGARTNER Fig. 9/10, Para. [0021], lines (1-2): “FIG. 10 shows the comparative example of FIG. 9 with bypassing of the bulk data columns being applied.”; and
Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”); and 
removing the raw data (BAUMGARTNER Para. [0063], lines (1-5): “The set of data columns N is passed to the transformation of this stage 2 and processed into a transformed record N'. The general structure of the record N is identical to the one of the record R, only that the input data columns, which are bypassed, are removed from this record.”).  

Regarding claim 11 (Currently Amended), BAUMGARTNER teaches a method for analytics workflow (BAUMGARTNER Para. [0010], lines (1-6): “…, the present invention provides a system for executing one of the above methods comprising a storage device for storing a computer usable program code and a processor for executing the computer usable program code to execute the method according to one of the above method claims”), comprising: 
monitoring execution of the analytics workflow upon receipt of a raw data (BAUMGARTNER Para. [0022], lines (2-5): “…, detect bulk data by evaluating the processing of data columns in each stage of the ETL process to identify for each unused data column sets of subsequent stages in which the data column is not processed”; and
Para. [0033], lines (1-7): “…, the present invention comprises adding status information indicating a processing status of a data column to the list. The status information can be used to mark data columns, for which a processing has already been detected in a stage.”); 
based on a determination of similarity, performing a bypass operation to bypass a remainder of the analytics workflow (BAUMGARTNER Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”,
the examiner notes that upon the identification of bulk data, i.e., a determination of similarity, a bypass operation is performed on the bulk data from an input of a first stage to an output of a second stage).
However, BAUMGARTNER does not explicitly teach based on a determination of dissimilarity, storing the insights data as stored insights data in the insights data repository, identifying a threshold for the stored insights data, storing the threshold with the stored insights data, wherein each stored insights data in the insights data repository has an associated threshold, and executing the remainder of the analytics workflow to generate analyzed data; interrupting the execution of the analytics workflow at , the bypass stage determined based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow; and performing a similarity analysis to compare the insights data to a stored insights data in an insights data repository.
But NAEF teaches based on a determination of dissimilarity, storing the insights data as stored insights data in the insights data repository, identifying a threshold for the stored insights data, storing the threshold with the stored insights data, wherein each stored insights data in the insights data repository has an associated threshold, and executing the remainder of the analytics workflow to generate analyzed data (NAEF Fig. 1, Para. [0033], lines (1-21): “Data reduction component 24 identifies the same data components and/or different data components found in the subset of potentially similar content 22. …, the one or more different data components 28 found in the subset of data to be processed, when compared with the subset of known data, represent new or unknown data. As such, data reduction component 24 replaces the content to be processed 14 with a reduced data representation 30 that includes the identified one or more different data components 28, or the one or more tokens 26 representing same data components, or some combination of both. As such, reduced data representation 30 has a smaller overall data size, which may be in terms of storage space and/or network bandwidth, than the original content to be processed 14.”; and
Para. [0035], lines (14-20): “…, because the subset of potentially similar content 22 includes potentially similar workflow processing metadata 18 and 20, similarity identifier component 12 increases the likelihood of data reduction component 24 being able to identify similarities or differences and thus reduce a size of the data for the respective content to be processed 14.”);
interrupting the execution of the analytics workflow at , the bypass stage determined based on any of a completion time, an amount of data, and a processing power at a stage of the analytics workflow relative to one or more remaining stages of the analytics workflow (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”; and 
Para [0036], lines (11-15): “…, data reduction component 24 is operable to efficiently reduce a size of the data for processing by eliminating one or more redundant data components prior to the processing of the content.”, 
the examiner notes that the data reduction component's elimination of redundant data components prior to processing corresponds to interrupting the execution of the analytics workflow); 
performing a similarity analysis to compare the insights data to a stored insights data in an insights data repository (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”, the examiner notes that the system compares content to be processed with known content, i.e., stored insights data); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of BAUMGARTNER (disclosing stage-based data processing) to include the teachings of NAEF (disclosing processing and identification of similar content) and arrive at a method that manipulates data processing based on data identification. One of ordinary skill in the art would have been motivated to make this combination because identifying data of interest (i.e., identifying similar content) enables data reduction/interruption that efficiently reduces the size of the data for processing by eliminating one or more redundant data components prior to 

Regarding claim 13 (Original), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 11. Further, NAEF teaches wherein performing the similarity analysis further comprises: 
determining a type of insights data (NAEF Para. [0040], lines (3-8): “…, workflow processing metadata 44 may include, but is not limited to, any data that identifies or describes a workflow process associated with or application to a respective data component, an identification or description of a data component, an identification or description of a data component type, …”); and 
based on the insights data type, performing a data reduction on the insights data, and identifying a similarity algorithm (NAEF Fig. 1, Para. [0030], lines (1-9): “…, a system 10 of identifying potentially similar content for performing or enabling data reduction includes a similarity identifier component 12. Similarity identifier component 12 is operable to compare one or more content to be processed 14 with one or more known content 16, based on comparing respective workflow process metadata 18 and 20, to identify a subset of potentially similar content 22 for use by a data reduction component 24.”).  

Regarding claim 14 (Original), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 11.  Further, BAUMGARTNER teaches wherein performing the bypass operation further comprises: 
preparing an output data associated with the raw data based on a previously generated analyzed data associated with the stored insights data (BAUMGARTNER Fig. 9/10, Para. [0021], lines (1-2): “FIG. 10 shows the comparative example of FIG. 9 with bypassing of the bulk data columns being applied.”; and
Para. [0025], lines (15-23): “Whenever a data column is passed through a first stage to a second stage unmodified, it will be identified as a bulk column. The same happens, when the data column is passed through a second stage without being modified. Accordingly, an embodiment may identify, that the data column is not processed in both the first stage and the second stage, so that the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.”); and 
removing the raw data (BAUMGARTNER Para. [0063], lines (1-5): “The set of data columns N is passed to the transformation of this stage 2 and processed into a transformed record N'. The general structure of the record N is identical to the one of the record R, only that the input data columns, which are bypassed, are removed from this record.”).    

Regarding claim 15 (Original), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 14.  Further, BAUMGARTNER teaches wherein performing the bypass operation further comprises: removing an intermediate data after preparing the output data (BAUMGARTNER Para. [0063], lines (1-5): “The set of data columns N is passed to the transformation of this stage 2 and processed into a transformed record N'. The general structure of the record N is identical to the one of the record R, only that the input data columns, which are bypassed, are removed from this record.”).  

Regarding claim 16 (New), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1.  Further, BAUMGARTNER teaches wherein the determination of the bypass stage is based on a number of input/output operations at the stage and the one or more remaining stages of the analytics workflow (BAUMGARTNER Abstract, lines (1-8): “Reroutable data columns are identified in an ETL process by receiving an ETL process definition describing a set of processing stages and how each processing stage output data column is a result of a function that operates on a set of input data columns, representing the ETL process definition as a directed graph with nodes representing processing stages and links representing data flow between processing stages, …”).  

Regarding claim 17 (New), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 1.  Further, BAUMGARTNER teaches wherein the determination of the bypass stage is in response to a processing power or latency of the analytics workflow exceeding a threshold amount (BAUMGARTNER Para. [0034], lines (1-18): “…, each node has a set of multiple input data columns and rerouting the input data columns identified to be reroutable from the one outmost node to the other outmost node along the directed graph comprises separating the set of multiple input data columns into a set of processing data columns, which are provided to the respective stage, and a set of bulk data columns, which are involved only as input data in identity functions and bypass the respective stage, each comprising meta-information for joining the set of bulk data columns and a set of processed data columns, the latter based on the set of processing data columns. The meta-information allows the separation of the set of input data columns into the processing data columns and the bulk data columns, so that the set of processed data columns can be adjoined to the set of bulk data columns. If the stage is not the second of the outmost stages for the respective data column, the data column is joined later, when its second outmost stage has been reached.”, 
the examiner notes that the data column is joined later, i.e., a latency factor, when the stage determination reaches an outmost stage, i.e., a threshold).  

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2012/0154405 A1) issued to Baumgartner et al. (hereinafter as “BAUMGARTNER”), in view of US Patent Application Publication (US 2009/0171990 A1) issued to Naef et al. (hereinafter as “NAEF”), and in view of US Patent (US 10,089,365 B1) issued to Khanna et al. (hereinafter as “KHANNA”).
Regarding claim 4 (Currently Amended), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 2. 
However, the combination of BAUMGARTNER and NAEF does not explicitly teach wherein the instructions to store the insights data further comprise: identifying a threshold for the stored insights data; and storing the threshold with the stored insights data, wherein each stored insights data in the insights data repository has an associated threshold.
But KHANNA teaches identifying a threshold for the stored insights data (KHANNA Fig. 2, Col. 9, lines (57-59): “The insight capture module 210 may present a user interface for capturing an insight based on an events associated with an activity threshold on a topic”); and 
storing the threshold with the stored insights data, wherein each stored insights data in the insights data repository has an associated threshold (KHANNA Fig. 2, Col. 9, lines (57-65): “The insight capture module 210 may present a user interface for capturing an insight based on an events associated with an activity threshold on a topic. For example, insight capture module 210 detects an increased activity on a topic (that is identified as relevant for a user), the insight capture module 210 prompts the user with a template to add/extend the conversation by providing additional information for an existing insight or by adding a new insight.”).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of BAUMGARTNER (disclosing stage-based data processing) and NAEF (disclosing processing and identification of similar contents) to include the teachings of KHANNA (disclosing delivery of data objects associated with content items representing insights) and arrive at a method to identify a threshold for associated insights data.  One of ordinary skill .

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2012/0154405 A1) issued to Baumgartner et al. (hereinafter as “BAUMGARTNER”), in view of US Patent Application Publication (US 2009/0171990 A1) issued to Naef et al. (hereinafter as “NAEF”), and in view of US Patent (US 7,966,327 B2) issued to Li et al. (hereinafter as “LI”).
Regarding claim 19 (New), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 6.  However, the combination of BAUMGARTNER and NAEF does not explicitly teach wherein the machine-readable storage medium further comprises instructions to apply a reducing algorithm to the insights data based on a type of the insights data, the application of the reducing algorithm comprising: in response to the insights data being of a time series data type, applying data transformation functions or dimension reduction algorithms including a Haar wavelet or a Fourier transform; in response to the insights data being of an image data type, applying filtering or Locality Sensitive Hashing (LSH); and in response to the insights data being of a text data type, performing text summarization algorithms.
But, LI teaches in response to the insights data being of a time series data type, applying data transformation functions or dimension reduction algorithms including a Haar wavelet or a Fourier transform (LI Col. 2, lines (29-36): “Similarity searching on time series or sequence data have been investigated recently. Range searches and nearest neighbor searches in whole matching and subsequence matching have been the principal queries of interest for time series data. For whole matching, several techniques have been proposed to transform the time sequence to the frequency domain by using DFT (Discrete Fourier Transform) and wavelets to reduce dimensions.”); 
in response to the insights data being of an image data type, applying filtering or Locality Sensitive Hashing (LSH) (LI Col. 12, lines (42-49): “Previous filtering methods do not work well. The first kind of filtering is to index individual regions and combine the filtering results of all the regions to form the candidate image set. This approach is not effective, because it loses the information of image-level similarity. The second kind is to use a technique to embed EMD into L.sub.1 distance and then use Locality Sensitive Hashing (LSH) to find the nearest neighbor(s) in the latter space.”); and 
in response to the insights data being of a text data type, performing text summarization algorithms (LI Col. 7, lines (38-44): “The proposed system encourages users to use the content-based similarity search capability to search and manage massive amounts of feature-rich data instead of using the traditional file system interface. It may be useful to combine the similarity search capability with a search range constrained by attributes (such as time, size, data type, owner, and so on) and user-defined annotations.”).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of BAUMGARTNER (disclosing stage-based data processing) and NAEF (disclosing processing and identification of similar contents) to include the teachings of LI (disclosing similarity search methods) and arrive at a method to compute similarities based on data input and multiple similarity algorithms.  One of ordinary skill in the art would have been motivated to make this combination because, when a content-addressable and searchable storage system includes data such as images, audio, or scientific data, efficient feature extraction and data segmentation based on similarity analysis can provide the system users with the desired analytical results, as recognized by (LI, Abstract, Col. 5). In addition, the references of BAUMGARTNER, NAEF and LI .

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2012/0154405 A1) issued to Baumgartner et al. (hereinafter as “BAUMGARTNER”), in view of US Patent Application Publication (US 2009/0171990 A1) issued to Naef et al. (hereinafter as “NAEF”), and in view of US Patent (US 10,402,414 B1) issued to Kutzkov et al. (hereinafter as “KUTZKOV”).
Regarding claim 20 (New), the combination of BAUMGARTNER and NAEF teaches the limitations of claim 6.  However, the combination of BAUMGARTNER and NAEF does not explicitly teach wherein: the similarity algorithm includes a Jaccard similarity algorithm or a cosine similarity algorithm; and the removing of the raw data comprises moving the raw data from a primary storage to a secondary storage or a tertiary storage.
But, KUTZKOV teaches wherein: the similarity algorithm includes a Jaccard similarity algorithm or a cosine similarity algorithm (KUTZKOV Col. 1, lines (29-40): “There are different similarity measure definitions that have been applied. See <<http://reference.wolfram.com/language/guide/DistanceAndSimilarityMeasures.html>> (accessed Jan. 29, 2015) for an overview. Some of the similarity measure definitions like Hamming distance, Jaccard similarity, Dice similarity, etc, assume binary data as input. For many problems however, this assumption is not justified and one needs to handle weighted features. Arguably, the three most widely used similarity measures for weighted data are Euclidean distance, cosine similarity and Pearson correlation.”); and 
the removing of the raw data comprises moving the raw data from a primary storage to a secondary storage or a tertiary storage (KUTZKOV Col. 2, lines (1-9): “The problem to compute the similarity between two objects by the above definitions is trivial if it is possible to store the objects in main memory. However, for massive datasets with high-dimensional objects, it is often the case that it is not possible to store all of the objects in main memory. Therefore, one aims to efficiently compute compact sketches or summaries of the objects that will lead to considerable space savings.”; and Fig. 3, Col. 4, lines (48-54): “FIG. 3 schematically shows an exemplary system according to an embodiment of the invention. The system can be made up of a computer or computational processing unit, a server or a network of computers and/or servers, which apply the sketching algorithm to preferably streaming data from a data source 310, which is, for example, stored on or available via a network.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of BAUMGARTNER (disclosing stage-based data processing) and NAEF (disclosing processing and identification of similar contents) to include the teachings of KUTZKOV (disclosing methods of similarity estimation in massive datasets) and arrive at a method to compute similarities based on a Jaccard similarity algorithm or a cosine similarity algorithm.  One of ordinary skill in the art would have been motivated to make this combination because KUTZKOV provides a novel manner of estimating both the inner product of two vectors, i.e., stages in a workflow, and the 2-norm of a vector, and these estimates make it possible to compute an estimate of Jaccard or cosine similarity, as recognized by (KUTZKOV, Abstract, Col. 3). In addition, the references of BAUMGARTNER, NAEF and KUTZKOV teach features that are directed to analogous art and they are directed to the same field of endeavor of data analysis and processing.

Allowable Subject Matter
Claims 3, 9, 12 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter.  As detailed above, BAUMGARTNER discloses a bypass stage in an analytics workflow system, where records are processed in stages and a bypassing decision is made based on an identification containing information about the records being analyzed in a transformation process, and where data is monitored to detect bulk data by evaluating the processing of data columns in each stage of the ETL process to identify, for each unused data column, sets of subsequent stages in which the data column is not processed. Further, based on a determination of similarity, a stage bypass is performed so that the data column is not processed in both the first stage and the second stage, and the bulk data can be bypassed immediately from the input of the first stage to the output of the second stage.  Additionally, NAEF discloses a system to interrupt the execution of the analytics workflow by performing a similarity identification to identify a subset of potentially similar content and eliminating one or more redundant data components prior to the processing of content, wherein the elimination of a data component by the data reduction component prior to processing corresponds to interrupting the execution of the workflow analysis.
However, none of the above prior art references, individually or in combination, discloses that the determination of the bypass stage involves computing a time savings associated with bypassing the analytics workflow at a stage of the analytics workflow; computing a time cost associated with performing the similarity analysis at the stage; and maximizing a differential between the time savings and the time cost, wherein the time savings is based on a nonlinear relationship between a volume of data at the stage and a time cost associated with a computational process at the stage.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
SRINIVAS et al. (US-2016/0203279-A1); “Differential analysis based on a workflow policy”.
Gibbons et al. (US-2006/0259451-A1); “employing a bypass function in a workflow system”.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Zuheir Mheir whose telephone number is (571)272-4151.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
6/05/2021

/ZUHEIR A MHEIR/Patent Examiner, Art Unit 2162          


/PIERRE M VITAL/Supervisory Patent Examiner, Art Unit 2162