Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Application 16/865,662 filed 5/4/2020 has been examined.
In this Office Action, claims 1-20 are currently pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Claim 1 recites:
characterizing sources using samples of data.
The limitation of characterizing sources using samples of data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the processor language, characterizing in the context of this claim encompasses the user manually determining “characterizations” using generic “samples” of data. Similarly, the limitation of estimating a time of completion, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the processor language, characterizing and estimating in the context of this claim encompasses the user

Further, these concepts also recite “Certain Methods of Organizing Human Activity” (such as
commercial or legal interactions, including agreements in the form of contracts; legal
obligations; advertising, marketing or sales activities or behaviors; business relations), where
characterizing sources using generic samples is a method of human activity in commercial or legal interactions.

Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only
recites one additional element – using a processor to perform both the characterizing and estimating steps. The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of “characterizing”) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more
than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor to perform
both the characterizing and estimating steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an 

Dependent claims 2-7 merely add further details of the abstract steps/elements recited in
claim 1 without integrating the idea into a practical application; or including an improvement to
another technology or technical field, an improvement to the functioning of the computer itself,
or meaningful limitations beyond generally linking the use of an abstract idea to a particular
technological environment. Therefore, dependent claims 2-7 are also directed towards
non-statutory subject matter.

Independent claims 8 and 14 are also rejected as ineligible subject matter under 35
U.S.C. 101 for substantially the same reasons as claim 1. The components (i.e.,
medium/method) described in independent claims 8 and 14 do not provide for integrating the
abstract idea into a practical application. At best, the claim(s) are merely providing alternate
environments to implement the abstract idea.

Dependent claims 9-13 and 15-20 merely add further details of the abstract steps/elements
recited in claim 1 without integrating the idea into a practical application; or including an
improvement to another technology or technical field, an improvement to the functioning of the
computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to
a particular technological environment. Therefore, dependent claims 9-13 and 15-20 are also
directed towards non-statutory subject matter.




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 5-7, 11-14, 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over 
Dhayapule et al., US Pub. No. 2019/0079981.

As to claim 1 (and substantially similar claim 8 and claim 14), Dhayapule discloses:  
an apparatus comprising:
at least one processor;
(Dhayapule [0006])
a memory coupled to the at least one processor; 
(Dhayapule [0007]) and
a data harvester residing in the memory and executed by the at least one
processor, 
(Dhayapule teaches collecting metrics using a metrics collection agent, i.e. a “data harvester” see [0026] a data source metrics collection agent process 122a ... 122n (which will be referred 
see also abstract: Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost)

wherein the data harvester characterizes a plurality of data sources in a target
system, 
(Dhayapule teaches [0048] With embodiments, the DMA process collects data source metrics for each data source through all the participating ETL servers, including the local ETL server itself.)

samples a subset of data in the plurality of data sources, 
(Dhayapule teaches collecting metrics, i.e. samples, see [0077] These estimated unit cost and speed are periodically recalculated based on the metrics collected over the network.; see also [0048] With embodiments, the DMA process collects data source metrics for each data source through all the participating ETL servers, including the local ETL server itself. The collected data source metrics include information such as: data source mappings and data table statistics.)

and estimates a time of completion based on the samples
(Dhayapule teaches determining estimated total execution time using various metrics see [0041] The total job execution cost and execution time may be calculated using an estimated size of the source data and intermediate data, unit cost and speed of data extraction/ loading from/to the data sources, unit cost and speed of the projected data transmission on the communication 
See also [0063] For minimum time, one of the distributed job execution plans with minimum estimated total execution time will be selected. For minimum cost, one of the distributed job execution plans with minimum estimated total cost will be selected. For minimum time within a set maximum cost, one of the distributed job execution plans with minimum estimated total execution time among all the distributed job execution plans with a cost that is less than or equal to the set maximum cost will be selected.;)
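
For illustration only, the plan selection quoted from Dhayapule [0063] can be sketched as below. This is a minimal sketch, not the reference's implementation: the `Plan` record, the `select_plan` function, and the sample figures are hypothetical names and values introduced here, and only the three criteria strings come from the quoted passages.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    """Hypothetical record for one candidate distributed job execution plan."""
    name: str
    est_time: float  # estimated total execution time (any consistent unit)
    est_cost: float  # estimated total cost (any consistent unit)


def select_plan(plans: Sequence[Plan], criterion: str,
                max_cost: Optional[float] = None) -> Optional[Plan]:
    """Pick one plan under a criterion named in the quoted text of [0063]."""
    if criterion == "minimum time":
        candidates = list(plans)
        key = lambda p: p.est_time
    elif criterion == "minimum cost":
        candidates = list(plans)
        key = lambda p: p.est_cost
    elif criterion == "minimum time within a set maximum cost":
        if max_cost is None:
            raise ValueError("max_cost is required for this criterion")
        # Keep only plans whose cost is less than or equal to the cap.
        candidates = [p for p in plans if p.est_cost <= max_cost]
        key = lambda p: p.est_time
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    # Return None when no plan satisfies the constraint.
    return min(candidates, key=key) if candidates else None


# Made-up example figures (assumptions, not data from the reference).
plans = [
    Plan("A", est_time=10.0, est_cost=50.0),
    Plan("B", est_time=14.0, est_cost=30.0),
    Plan("C", est_time=12.0, est_cost=40.0),
]
fastest = select_plan(plans, "minimum time")
cheapest = select_plan(plans, "minimum cost")
bounded = select_plan(plans, "minimum time within a set maximum cost",
                      max_cost=45.0)
```

With these example figures, plan A is the fastest overall, plan B is the cheapest, and plan C is the fastest among the plans costing no more than 45.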

It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to collect metrics for completion time estimates as taught by Dhayapule, since it was known in the art that collecting metrics for an optimized distributed job execution plan facilitates the automatic real-time execution of the distributed job segments, allows efficient and optimal utilization of infrastructure and resources (remote and local), avoids inconsistent data join or consolidation across remote networks due to the lack of an automatic and holistic data integration process, minimizes data shipments and recalculation based on different optimization criteria, and reduces network congestion compared to the ad-hoc way of shipping arbitrary intermediate data to/from remote systems for a similar purpose (Dhayapule [0106-0109]).

As to claim 5, Dhayapule discloses the apparatus of claim 1 wherein the data harvester generates based on machine learning at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous run of the data harvester (Dhayapule [0110] Embodiments provide a technique balancing ETL jobs across distributed systems based on optimization criteria by: determining an optimization plan for an ETL job to meet an optimization criteria across a plurality of distributed systems, wherein the optimization plan 
predictions to balance workloads across the plurality of systems.).

As to claim 6, Dhayapule discloses the apparatus of claim 5 wherein a user selects at least one adjustment based on the at least one recommendation, and in response, the data harvester continues harvesting data according to the user-selected at least one adjustment (Dhayapule [0030] If desired, the automatically generated data source/workload mapping tables may be manually updated by users to take into consideration environment changes and/or operational preferences.;
See also [0063] In certain embodiments, the DBO engine 120a ... 120n finds multiple distributed job execution plans and selects a distributed job execution plan from these that satisfies selected optimization criteria, such as "minimum time", "minimum cost" or something in between: "minimum time within a set maximum cost" and "minimum cost with a set maximum time".).

As to claim 7, Dhayapule discloses the apparatus of claim 6 wherein the data harvester selects a next sample, estimates time for processing the selected next sample, and performs data harvesting on the selected next sample (Dhayapule teaches determining estimated total execution time using various metrics see [0041] The total job execution cost and execution time may be calculated using an estimated size of the source data and intermediate data, unit cost and speed of data extraction/ loading from/to the data sources, unit cost and speed of the projected data transmission on the communication channels in between ETL servers, unit cost and speed of the computational stages on each ETL server, and the accessibility of a data source from each ETL server.;


Referring to claim 11, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 11.

Referring to claim 12, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 12.

Referring to claim 13, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 13.

Referring to claim 18, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 18.

Referring to claim 19, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 19.

Referring to claim 20, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 20.



Claims 2-4, 9-10, 15-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over 
Dhayapule et al., US Pub. No. 2019/0079981, in view of Bishop et al., US Pub. No. 2015/0052158.

As to claim 2, Dhayapule discloses the apparatus of claim 1 wherein the data harvester characterizes a selected data source according to a plurality of the following:
total size of the selected data source;
(Dhayapule [0078] The data source metrics table 600 of FIG. 6 shows an estimated number of rows in the table, an estimated row size for the table, and an estimated data extraction and
loading speed. In certain embodiments, the estimated data size for each table is the multiplication of the estimated row size and the estimated number of rows for that table; see also [0077] cost function of a local link is tied to the unit cost of running a job in the local ETL server, the size of the data to move);
and
network characteristics of the selected data source
(Dhayapule [0110] Embodiments maintain statistical metadata about data sources, tables, and data extraction speed, network speed, data transaction costs [over the network], data processing speed, and data processing costs.).
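
As a worked illustration of the size estimate quoted from Dhayapule [0078] (estimated data size as the product of estimated row size and estimated number of rows), a minimal sketch follows. The function names, units, and example figures are hypothetical. The second function, which divides size by a speed to get a duration, is an assumption added here for illustration and is not stated in the quoted passages.

```python
def estimated_table_size(row_size_bytes: float, row_count: int) -> float:
    """Estimated data size = estimated row size x estimated number of rows."""
    if row_size_bytes < 0 or row_count < 0:
        raise ValueError("row size and row count must be non-negative")
    return row_size_bytes * row_count


def estimated_transfer_seconds(size_bytes: float,
                               speed_bytes_per_sec: float) -> float:
    """Assumption (not from the reference): duration = size / speed."""
    if speed_bytes_per_sec <= 0:
        raise ValueError("speed must be positive")
    return size_bytes / speed_bytes_per_sec


# Made-up figures: 200-byte rows, one million rows, 50 MB/s extraction speed.
size = estimated_table_size(200.0, 1_000_000)
seconds = estimated_transfer_seconds(size, 50_000_000.0)
```

With these example figures the estimated size is 200,000,000 bytes and the assumed extraction time is 4 seconds.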

Dhayapule does not disclose:
total number of documents in the selected data source;
types of documents in the selected data source; 

however, Bishop discloses:
total number of documents in the selected data source;
(Bishop [0038] User interface screen 300 may further include summary information 312 (e.g., total number of objects in the information set, size of the information set in megabytes, data and time created, description, etc.), "Details" radio-button option 320 to enable a user to view details (e.g., ancestry, execution log, data objects) of the selected information set, and other information and/or controls. For example, the user may review ancestry 330 of an information set named "Word docs only" to see that the information set was created from a system-provided information set containing all data objects by selecting files with a ".doc", ".docx", or any other suitable extension indicating a Word document from that system-provided information set.)

types of documents in the selected data source; 
(Bishop [0038] User interface screen 300 may further include summary information 312 (e.g., total number of objects in the information set, size of the information set in megabytes, data and time created, description, etc.), "Details" radio-button option 320 to enable a user to view details (e.g., ancestry, execution log, data objects) of the selected information set, and other information and/or controls. For example, the user may review ancestry 330 of an information set named "Word docs only" to see that the information set was created from a system-provided information set containing all data objects by selecting files with a ".doc", ".docx", or any other suitable extension indicating a Word document from that system-provided information set.;
See also [0031] The gateway system further holds additional, and in some cases, more detailed metadata and status information about application-level objects, and also maintains certain types of data that are aggregated from the data server systems.)

It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to collect summary information/metadata as taught by Bishop since it was

As to claim 3, Bishop discloses under the rationale above, the apparatus of claim 2 wherein, when the selected data source includes a plurality of mailboxes, the data harvester further characterizes the selected data source according to organization, mailbox sizes and attachment percentages (Bishop [0035] A user interface (UI) may allow a user to define rules for criteria (e.g., specifying an identity, department, organization, vendor, product, custodian, object properties, attributes, etc.) to encapsulate indexed data, create an information set of the indexed data meeting the criteria, adjust the criteria to form a new information set, perform set operations ( e.g., comparison, identifying changes, union, intersection, complement, symmetric difference, etc.) on information sets, present reports of the results of the operations, and convert the criteria to adaptors with filters to retrieve the data satisfying the criteria.).

As to claim 4, Bishop discloses under the rationale above, the apparatus of claim 1 wherein the data harvester determines from the samples a likelihood of success of harvesting data, and when the likelihood of success is below a threshold, the data harvester selects a new sample size based on adjusted parameters for the data harvester (Bishop [0073] The collected information is persisted into the database (DS DB 2570) or a file system (DS FS 2580). The data server systems can employ multiple processes to carry out the data expansion operation on plural objects simultaneously. The success/failure status of the data expansion operation on
member objects of the information set is written into the database (DS DB 2570) as audit records at step 2615.).
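
To make the claim 4 limitation concrete (a likelihood of success determined from samples, compared against a threshold, followed by selection of a new sample size), one possible reading is sketched below. This is only an illustrative sketch: it is not taken from Bishop or from the application, and the function names, the doubling rule, the cap, and the example outcomes are all hypothetical assumptions.

```python
from typing import Sequence


def likelihood_of_success(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled items that were harvested successfully."""
    if not outcomes:
        raise ValueError("at least one sample outcome is required")
    return sum(outcomes) / len(outcomes)


def next_sample_size(outcomes: Sequence[bool], threshold: float,
                     growth_factor: int = 2, max_size: int = 1000) -> int:
    """Keep the sample size unless the success likelihood is below threshold.

    The growth rule (multiply by growth_factor, capped at max_size) is an
    assumed stand-in for "adjusted parameters"; the claim does not define it.
    """
    current = len(outcomes)
    if likelihood_of_success(outcomes) >= threshold:
        return current
    return min(current * growth_factor, max_size)


# Made-up outcomes: 6 successes and 4 failures in a sample of 10.
outcomes = [True] * 6 + [False] * 4
rate = likelihood_of_success(outcomes)
grown = next_sample_size(outcomes, threshold=0.8)
unchanged = next_sample_size(outcomes, threshold=0.5)
```

In this example the success rate is 0.6, so a threshold of 0.8 triggers a new sample size of 20, while a threshold of 0.5 leaves the sample size at 10.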

As to claim 9, Dhayapule discloses the article of manufacture of claim 8 wherein the data harvester characterizes a selected data source according to a plurality of the following:
total size of the selected data source;
(Dhayapule [0078] The data source metrics table 600 of FIG. 6 shows an estimated number of
rows in the table, an estimated row size for the table, and an estimated data extraction and
loading speed. In certain embodiments, the estimated data size for each table is the
multiplication of the estimated row size and the estimated number of rows for that table; see
also [0077] cost function of a local link is tied to the unit cost of running a job in the local ETL
server, the size of the data to move);
network characteristics of the selected data source;
(Dhayapule [0110] Embodiments maintain statistical metadata about
data sources, tables, and data extraction speed, network
speed, data transaction costs [ over the network], data processing
speed, and data processing costs. ).


Dhayapule does not disclose:
total number of documents in the selected data source;
types of documents in the selected data source;
and
when the selected data source includes a plurality of mailboxes, according to
organization, mailbox sizes and attachment percentages;

however, Bishop discloses:
total number of documents in the selected data source;

(Bishop [0038] User interface screen 300 may further include summary information 312 (e.g.,
total number of objects in the information set, size of the information set in megabytes, data
and time created, description, etc.), "Details" radio-button option 320 to enable a user to view
details (e.g., ancestry, execution log, data objects) of the selected information set, and other
information and/or controls. For example, the user may review ancestry 330 of an information
set named "Word docs only" to see that the information set was created from a system-provided
information set containing all data objects by selecting files with a ".doc", ".docx", or any other
suitable extension indicating a Word document from that system-provided information set.)

types of documents in the selected data source;
(Bishop [0038] User interface screen 300 may further include summary information 312 (e.g.,
total number of objects in the information set, size of the information set in megabytes, data and
time created, description, etc.), "Details" radio-button option 320 to enable a user to view details
(e.g., ancestry, execution log, data objects) of the selected information set, and other
information and/or controls. For example, the user may review ancestry 330 of an information
set named "Word docs only" to see that the information set was created from a system-provided
information set containing all data objects by selecting files with a ".doc", ".docx", or any other
suitable extension indicating a Word document from that system-provided information set.;
See also [0031] The gateway system further holds additional, and in some cases, more detailed
metadata and status information about application-level objects, and also maintains certain
types of data that are aggregated from the data server systems.)

and
when the selected data source includes a plurality of mailboxes, according to
organization, mailbox sizes and attachment percentages;

(Bishop [0035] A user interface (UI) may allow a user to define rules for criteria
(e.g., specifying an identity, department, organization, vendor, product, custodian, object
properties, attributes, etc.) to encapsulate indexed data, create an information set of the indexed
data meeting the criteria, adjust the criteria to form a new information set, perform set
operations (e.g., comparison, identifying changes, union, intersection, complement, symmetric
difference, etc.) on information sets, present reports of the results of the operations, and convert
the criteria to adaptors with filters to retrieve the data satisfying the criteria.).

It would have been obvious to one having ordinary skill in the art at the time of the
effective filing date to collect summary information/metadata as taught by Bishop, since it was
known in the art that collecting/harvesting systems collect metadata/summary information to
ensure efficiency, where costly or time-consuming operations are only performed when
necessary and only to a specified subset of the total data under management. For example,
Information Technology (IT) personnel may start by harvesting only system metadata from file,
email, collaboration, or other servers (Bishop [0025]).

Referring to claim 10, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 10.

Referring to claim 15, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 15.

Referring to claim 16, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 16.

Referring to claim 17, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 17.



CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL whose telephone number is (571)270-7723. The examiner can normally be reached Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Evan Aspinwall/
Primary Examiner, Art Unit 2152
3/7/2022