Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Application 16/584,585 filed 9/26/2019 has been examined.
In this Office Action, claims 1-20 are currently pending.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Claim 1 recites: 
extracting tabular data with a domain specific library.
The limitation of extracting tabular data with a domain specific library, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the processor language, extracting in the context of this claim encompasses the user manually extracting/converting generic table data using a generic library. Similarly, the limitations of detecting, recognizing, and mapping, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind.
Further, these concepts also recite “Certain Methods of Organizing Human Activity” (such as
commercial or legal interactions, including agreements in the form of contracts; legal
obligations; advertising, marketing or sales activities or behaviors; and business relations), where
extracting tabular data with a domain specific library is a method of human activity in commercial or legal interactions.
Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only
recites one additional element – using processors/databases to perform the detecting, recognizing, mapping, and extracting steps. The databases/processor in these steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of extracting/mapping tabular data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more
than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a databases/processor to perform


Dependent claims 2-7 merely add further details of the abstract steps/elements recited in
claim 1 without integrating the idea into a practical application; or including an improvement to
another technology or technical field, an improvement to the functioning of the computer itself,
or meaningful limitations beyond generally linking the use of an abstract idea to a particular
technological environment. Therefore, dependent claims 2-7 are also directed towards
nonstatutory subject matter.

Independent claims 8 and 15 are also rejected as ineligible subject matter under 35
U.S.C. 101 for substantially the same reasons as method claim 1. The components (i.e., the
system/apparatus) described in independent claims 8 and 15 do not provide for integrating the
abstract idea into a practical application. At best, the claim(s) are merely providing alternate
environments to implement the abstract idea.

Dependent claims 9-14, 16-20 merely add further details of the abstract steps/elements
recited in claim 1 without integrating the idea into a practical application; or including an
improvement to another technology or technical field, an improvement to the functioning of the
computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to
a particular technological environment. Therefore, dependent claims 9-14, 16-20 are also
directed towards non-statutory subject matter.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over
Ackner et al., US Pat. No. 10,204,119.

As to claim 1, Ackner discloses a method of recognizing data in a table area from unstructured data 
(Ackner col. 2 ln. 50-54: Techniques for inferring a schema for a data input file are described herein. In an embodiment, a server computer system receives a data input file. The server computer system selects a subset of the file for performing the inference.)

comprising:

detecting, through at least one processor, at least one table area from an input stream of unstructured data 

(Ackner col. 10 ln. 7-10: Identified header data may be used to identify names of columns in the data input file. For example, a header may include column names separated by column delimiters. After identifying the header data, the server computer system may search for a row of header data that includes column delimiters.;
see also col. 13 ln. 9-16: Validation may be performed against the sample excerpt as well as either the input data file or an entire dataset comprising the input data file, thereby identifying any errors that may occur using the candidate schema. Identifying errors may include identifying data that does not conform to a data format type, identifying jagged rows in the data input file, and identifying effects of changes to the candidate schema.)

received over a computer network;
(Ackner col. 5 ln. 60-65: At step 250, a data input file is received. For example, host computing device 130 may upload a data input file over network 100 to server computer system 110. Additionally or alternatively, host computing device may send data to server computer system 110 identifying one or more data input files stored in data repository 124 to be used as the data input file)

recognizing, through at least one processor, at least one table header associated with the detected at least one table area;
(Ackner teaches identifying headers for columns and rows, i.e. a table area, see col. 4 ln. 54-57: Header identification instructions 120, in an embodiment, cause the server computer system 110 to identify one or more lines of header data in the data input file.;
see also col. 10 ln. 7-10: Identified header data may be used to identify names of columns in the data input file. For example, a header may include column names separated by column delimiters.)

determining, through at least one processor, at least one column delimiter associated with each column of the detected at least one table area;
(Ackner teaches identifying column delimiters see col. 4 ln. 47-51: column delimiter identification instructions 116 causes the server computer system 110 to identify one or more symbols in the
data input file and store data identifying the one or more symbols as column delimiters.;
see also col. 10 ln. 7-10: Identified header data may be used to identify names of columns in the data input file. For example, a header may include column names separated by column delimiters. After identifying the header data, the server computer system may search for a row of header data that includes column delimiters.)

extracting, through at least one processor, at least one tabular data associated with the detected at least one table area
(Ackner teaches translating/converting input files with rows and columns, i.e. extracting tabular data areas, see col. 7 ln. 40-50: At step 275, the column delimiter, row delimiter, and plurality of data format types are used to generate a candidate schema for the data input file. The candidate schema comprises a schema for translating received files into rows and columns that may be stored in a database and/or a columnar data store. The candidate schema identifies row delimiters, column delimiters, and data format types for the columns of a data input file. The candidate schema may then be applied to the data input file to convert the data input file into a form that can be stored in the data repository of the server computer in an optimized manner.)

in association with at least one domain specific library; and
(Ackner, see also col. 6 ln. 44-48: The server computer system may attempt to match patterns in the file to character encodings stored in the library. The server computer system may select an encoding that matches the highest percentage of patterns from the data input file.)

mapping, through at least one processor, the extracted tabular data to at least one target schema to store onto a relational database
(Ackner teaches translating/converting input files using a schema, i.e. mapping extracted tabular data to a schema see col. 7 ln. 40-50: At step 275, the column delimiter, row delimiter, and plurality of data format types are used to generate a candidate schema for the data input file. The candidate schema comprises a schema for translating received files into rows and columns that may be stored in a database and/or a columnar data store. The candidate schema identifies row delimiters, column delimiters, and data format types for the columns of a data input file. The candidate schema may then be applied to the data input file to convert the data input file into a form that can be stored in the data repository of the server computer in an optimized manner;
see also col. 2 ln. 58-60: Using the identified row delimiter, column delimiter, and plurality of data types, the server computer system generates a candidate schema.).
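
For illustration only, the cited passage (Ackner col. 7 ln. 40-50) describes a candidate schema made of a row delimiter, a column delimiter, and per-column data format types, which is then applied to convert the data input file into rows and columns. The following is a minimal Python sketch of that idea under stated assumptions: the names `CandidateSchema` and `apply_schema` are hypothetical and do not appear in Ackner, and rejecting jagged rows with an error is an assumed policy, not one the reference specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CandidateSchema:
    """Hypothetical container for the three schema parts named in the citation."""
    row_delimiter: str
    column_delimiter: str
    column_types: List[Callable[[str], object]]  # one converter per column


def apply_schema(text: str, schema: CandidateSchema) -> List[Tuple[object, ...]]:
    """Split text into rows and columns and convert each cell to its column type."""
    rows = []
    for raw in text.split(schema.row_delimiter):
        if not raw.strip():
            continue  # skip empty trailing rows
        cells = raw.split(schema.column_delimiter)
        if len(cells) != len(schema.column_types):
            # Assumed policy: treat a jagged row as a validation error.
            raise ValueError(f"jagged row: {raw!r}")
        rows.append(tuple(cast(cell.strip())
                          for cast, cell in zip(schema.column_types, cells)))
    return rows


schema = CandidateSchema("\n", ",", [int, str, float])
records = apply_schema("1,alice,10.5\n2,bob,7\n", schema)
```

The resulting tuples are in a form that could be passed to a relational database insert, which is the sense in which the rejection reads the candidate schema on the claimed mapping step.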


It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply schema inference and header identification as taught by Ackner, since it was known in the art that data translation systems provide header identification, as identifying the header data in the file allows the server computer system to identify column names (Ackner col. 9 ln. 55-60).

As to claim 2, Ackner discloses the method of claim 1, wherein the at least one tabular data is the data in the detected at least one table area
(Ackner Fig. 2. See also sample excerpts, i.e. data detected in a table area, at col. 6 ln. 49-57: At step 255, a sample excerpt from the data input file is selected. The server computer system may be programmed or configured to select a particular portion of each dataset, such as by a size of file and/or a number of rows in the data input file. For example, the server computer system may initially identify the row delimiter described in step 260. Using the row delimiter, the server computer system may identify a plurality of rows of a dataset stored in the data input file.).

As to claim 3, Ackner discloses the method of claim 1, wherein the at least one domain specific library includes one or more of column vocabulary or rules to determine one or more column names
(Ackner teaches extracting the column names based on header analysis, i.e. a column “vocabulary”, see col. 10 ln. 7-21: Identified header data may be used to identify names of columns in the data input file. For example, a header may include column names separated by column delimiters. After identifying the header data, the server computer system may search for a row of header data that includes column delimiters. In an embodiment, the server computer system searches for rows that have a same number of instances of a column delimiter as the mode of instances of the column delimiter in the non-header rows. By identifying a row in the header information with the same number of column delimiters as the majority of other rows, the server computer system ensures that the column names can be matched to individual columns.;
see also col. 9 ln. 50-60: The server computer system may identify header data in the sample excerpt. For example, if the sample excerpt includes a first portion of a data input file, the likelihood is the sample excerpt includes header data, such as a name of the document, names of the columns, or additional data stored with the document. Identifying the header data in the file can serve allows the server computer system to identify header information to a host computing device, improve the server computer's ability to identify column delimiters and data format types, and allow the server computer system to identify column names.).

Referring to claim 9, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 9.

Referring to claim 10, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 10.


Referring to claim 16, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 16.

Referring to claim 17, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 17.




Claims 4-7, 11-14 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ackner et al., US Pat. No. 10,204,119, in view of Griffith et al., US Pub. No. 2019/0050445.

As to claim 4, Ackner does not disclose:
removing, through at least one processor, at least one outlier associated with the at
least one tabular data;


However, Griffith discloses the method of claim 1, further comprising:
removing, through at least one processor, at least one outlier associated with the at
least one tabular data
(Griffith teaches excluding non-compliant data, i.e. removing outliers, see [0176] A non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data. For example, a detected numeric value that is more than 4 standard deviations from a mean value for a subset of data (e.g., a column of data) may be deemed "an outlier" or "out-of-range," and, thus, deemed non-compliant with a range of valid numeric values;
see also [0092] Other patterns or groupings of data may be identified as being non-conforming to an inferred set of data, and thereby be excluded from further consideration as a portion of the set of data.;
[0160] In yet another example, data representing a property may define "a numeric outlier" as an anomaly in dataset 2305a. In this case, the value of the data attribute may define a threshold value (or range of values) specifying that a numeric value in a cell in dataset 2305a is an "outlier" or "out-of-range," and thus may not be a valid value. …data remediation interface 2302 may present a user input selection with which interface 2302 may invoke an action to …;
see also [0132] While not shown in FIG. 17, the system of layer files may be adaptive to add or remove data items).

It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply automatic actions to outliers as taught by Griffith, since it was known in the art that data analysis systems provide a dataset analyzer that may be configured to automatically detect an anomalous condition, predict which one of several actions may remediate the condition (e.g., based on confidence levels a specific anomaly is identified and that the corrective action will remediate the problem), and automatically implement the corrective action where a user need not engage in ingestion of dataset (Griffith [0169]).
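
For illustration only, the example in Griffith [0176] (a numeric value more than 4 standard deviations from the mean of a column is deemed an outlier) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `remove_outliers` is hypothetical, the population standard deviation is an assumed choice, and dropping the value is only one of the remedial actions the reference contemplates.

```python
from statistics import mean, pstdev
from typing import List


def remove_outliers(values: List[float], max_deviations: float = 4.0) -> List[float]:
    """Drop values farther than max_deviations standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # assumed: population standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) <= max_deviations * sigma]


# Thirty ordinary readings and one extreme value in the same column.
column = [10.0] * 30 + [1000.0]
cleaned = remove_outliers(column)
```

Note that in a very short column no single value can lie more than 4 population standard deviations from the mean, so a threshold of this kind is only meaningful for columns with enough rows.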

As to claim 5, Griffith discloses, under the rationale above, the method of claim 4, wherein the outlier is at least one of a document related text, watermark, and noise
(Griffith teaches detecting noise/non-compliant values in data [0182] Rows 2628 and 2631 set forth values NOISE_N and NOISE_T that may represent "noise" or gibberish. For example, a value of NOISE_N may include a likely placeholder number, such as Jenny's phone number "867-5309" from a song, and a value of NOISE_S may include likely placeholder text, such as "asdf" or "qwerty," respectively.;
see also [0182] For example, a dataset analyzer may identify a subset of data predominantly being numeric in nature, but detects a value that is non-numeric (e.g., text, other non-numbered characters, or non-N/A values).; see also [0175] as well as noise text or inadvertent text, such as "asdfasdf" or "qwerty," which may serve as placeholders.).
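
For illustration only, the two noise checks quoted from Griffith [0175] and [0182] (known placeholder values, and non-numeric values in a predominantly numeric column) can be sketched in Python as follows. This is a minimal sketch, not Griffith's implementation: the function names, the placeholder sets, and the 50% cutoff used for "predominantly numeric" are all assumptions made for the example.

```python
from typing import List

# Placeholder examples taken from the quoted passages.
PLACEHOLDER_TEXT = {"asdf", "asdfasdf", "qwerty"}
PLACEHOLDER_NUMBERS = {"867-5309"}


def is_numeric(value: str) -> bool:
    """True if the cell parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def flag_noise(column: List[str]) -> List[int]:
    """Return the indices of cells that look like noise."""
    cells = [c.strip() for c in column]
    if not cells:
        return []
    numeric_share = sum(is_numeric(c) for c in cells) / len(cells)
    flagged = []
    for i, cell in enumerate(cells):
        if cell.lower() in PLACEHOLDER_TEXT or cell in PLACEHOLDER_NUMBERS:
            flagged.append(i)  # known placeholder value
        elif numeric_share > 0.5 and not is_numeric(cell) and cell.upper() != "N/A":
            flagged.append(i)  # non-numeric, non-N/A cell in a mostly numeric column
    return flagged


mostly_numeric = flag_noise(["12", "7.5", "asdf", "N/A", "abc", "3", "4"])
text_column = flag_noise(["alice", "867-5309", "qwerty"])
```

In the first call the "N/A" cell is left alone, consistent with the quoted exception for N/A values, while "asdf" and "abc" are flagged.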


(Griffith [0049] and a dataset ingestion controller 120, which, in turn, is shown to include an inference engine 132, a format converter 134, and a layer data generator 136. In some examples, format converter 134 may be configured to receive data representing a set of data 104 having, for example, a particular data format, and may be further configured to convert dataset 104 into a collaborative data format for storage in a portion of data arrangement 142a in repository 140. Set of data 104 may be received in the following examples of data formats: CSV, XML, JSON, XLS, MySQL, binary, free-form, unstructured data formats (e.g., data extract from a PDF file using optical character recognition), etc., among others).

As to claim 7, Griffith discloses, under the rationale above, the method of claim 6, wherein the digitized document is one or more of an
image file or PDF (Griffith [0049] and a dataset ingestion controller 120, which, in turn, is shown to include an inference engine 132, a format converter 134, and a layer data generator 136. In some examples, format converter 134 may be configured to receive data representing a set of data 104 having, for example, a particular data format, and may be further configured to convert dataset 104 into a collaborative data format for storage in a portion of data arrangement 142a in repository 140. Set of data 104 may be received in the following examples of data formats: CSV, XML, JSON, XLS, MySQL, binary, free-form, unstructured data formats (e.g., data extract from a PDF file using optical character recognition), etc., among others).

Referring to claim 11, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 11.


Referring to claim 12, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 12.

Referring to claim 13, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 13.

Referring to claim 14, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 14.

Referring to claim 18, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 18.

Referring to claim 19, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 19.

As to claim 20, Griffith discloses, under the rationale above, the apparatus of claim 15, wherein:
the unstructured data comprises one or more of a free form document, digitized
document, scanned document, document with a predefined layout, or document without a
predefined layout; 
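(Griffith [0049] and a dataset ingestion controller 120, which, in turn, is shown to include an inference engine 132, a format converter 134, and a layer data generator 136. In some examples, format converter 134 may be configured to receive data representing a set of data 104 having, for example, a particular data format, and may be further configured to convert dataset 104 into a collaborative data format for storage in a portion of data arrangement 142a in repository 140. Set of data 104 may be received in the following examples of data formats: CSV, XML, JSON, XLS, MySQL, binary, free-form, unstructured data formats (e.g., data extract from a PDF file using optical character recognition), etc., among others);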
and
the digitized document comprises one or more of an image file or PDF
(Griffith [0049] and a dataset ingestion controller 120, which, in turn, is shown to include an inference engine 132, a format converter 134, and a layer data generator 136. In some examples, format converter 134 may be configured to receive data representing a set of data 104 having, for example, a particular data format, and may be further configured to convert dataset 104 into a collaborative data format for storage in a portion of data arrangement 142a in repository 140. Set of data 104 may be received in the following examples of data formats: CSV, XML, JSON, XLS, MySQL, binary, free-form, unstructured data formats (e.g., data extract from a PDF file using optical character recognition), etc., among others).


CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL whose telephone number is (571)270-7723.  The examiner can normally be reached on Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil can be reached on 571-270-0474.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/Evan Aspinwall/
Primary Examiner, Art Unit 2152
3/22/2021