DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
The information disclosure statement (IDS) was filed on 10/10/2019.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Drawings
The drawings filed 10/10/2019 were accepted. 

Claim Objections
Claims 2-5, 9-12, and 16-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, 13-15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Buisson (US 20190171704 A1; filed 12/1/2017) in view of Xin (US 20190065460 A1; filed 8/31/2017) and Singh (US 20110249905 A1; filed 1/15/2010).

With regard to claim 1, Buisson discloses a computer-implemented method comprising: … term included in each cell from a plurality of cells associated with one or more tabular structures in one or more documents; … with the one or more tabular structures from the one or more documents based on the context of each cell from the plurality of cells associated with the one or more tabular structures in the one or more documents, wherein the one or more tabular structures in the one or more documents is identified by parsing the one or more documents (Buisson, paragraph 26: “extract a potential data zone 12 associated with the detected table headers 11 by using raw text analysis and/or parsing logic to identify and classify data 12 that is part of the table”); …; and detecting one or more semantic headers associated with the one or more tabular structures from the one or more documents … (Buisson, paragraph 14: “detecting and extracting table data from input documents by using semantic analysis to identify table headers and raw text analysis to extract table data.”).
However, Buisson does not disclose receiving a custom pre-trained embeddings model, wherein the received custom pre-trained embeddings model provides a context associated with each term… computing one or more dot product values associated… generating one or more similarity feature values based on the computed one or more dot product values, wherein the computed one or more dot product values are normalized; and detecting one or more semantic pieces of content based on the one or more similarity feature values. 
Xin teaches receiving a custom pre-trained embeddings model, wherein the received custom pre-trained embeddings model provides a context associated with each term (Xin, abstract: “The character-to-word expressions are combined with pre-trained word embeddings. The combined character-to-word expressions and pre-trained word embeddings are fed into one or more bidirectional long short-term memories to learn contextual information for each of the plurality of words.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Buisson and Xin such that the headers are identified at least in part using a neural network that receives a pre-trained embeddings model. This would have enabled the invention to more accurately determine semantics without having to train it (Xin, paragraph 42: “it is difficult to train randomly initialized word embeddings to accurately capture the word semantics. Therefore, in an example embodiment, word embeddings pretrained on large-scale unlabeled corpora are leveraged.”).
Singh teaches computing one or more dot product values associated… generating one or more similarity feature values based on the computed one or more dot product values, wherein the computed one or more dot product values are normalized; and detecting one or more semantic pieces of content based on the one or more similarity feature values (Singh, paragraph 179: “The ranking system calculates the cosine distance of two vectors V1 and V2 as: cos θ=(V1·V2)/(|V1|*|V2|)”; Singh, paragraph 180: “When the cosine distance nears 0, that means the vectors are orthogonal and when it nears 1 it means the vectors are in the same direction or similar.”; Singh, paragraph 181: “The class which has the maximum cosine distance with the form is the class to which the form is classified.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Buisson and Singh such that the normalized dot product (or cosine distance) is used to determine how similar text objects are to each other, and can be used to determine a header in a table. This would have enabled the invention to correctly identify and label data (Singh, paragraph 124: “The data extraction system 452 uses OCR data extraction, non-OCR visual recognition, contextual feature matching, business intelligence and output formatting, all with machine learning elements, to accurately extract and present data from a wide range of documents.”).
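For illustration only, the normalized dot product quoted from Singh, paragraph 179, and the maximum-value selection quoted from Singh, paragraph 181, can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Singh, Buisson, or the application; the function names cosine_similarity and classify_by_max_cosine, the class labels, and the example vectors are hypothetical.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Return (V1 . V2) / (|V1| * |V2|), i.e. the normalized dot product."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


def classify_by_max_cosine(
    form: Sequence[float], classes: Mapping[str, Sequence[float]]
) -> str:
    """Return the label whose class vector has the largest cosine value with `form`."""
    return max(classes, key=lambda label: cosine_similarity(form, classes[label]))


if __name__ == "__main__":
    # Hypothetical two-dimensional class vectors, chosen only for illustration.
    classes = {"header": [1.0, 0.0], "data": [0.0, 1.0]}
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
    print(classify_by_max_cosine([0.9, 0.1], classes))
```

Dividing by the product of the vector magnitudes is what makes the dot product "normalized": the result depends only on the angle between the vectors, not on their lengths.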

With regard to claim 6, which depends on claim 1, Buisson does not disclose, yet Singh teaches, combining the generated one or more similarity feature values associated with the one or more tabular structures; transmitting, to a machine learning (ML) classifier, the combined one or more similarity feature values associated with the one or more tabular structures; and classifying a plurality of contents associated with a plurality of cells from the one or more tabular structures (Singh, paragraph 165: “A feature vector that consists of all OCR text that matches the dictionary is passed to an SVM-based classifier to determine the document class.”).
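It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Buisson and Singh such that the similarity feature values (or cosine distances) are sent to a machine learning classifier to be classified. This would have enabled the invention to correctly identify and label data (Singh, paragraph 124: “The data extraction system 452 uses OCR data extraction, non-OCR visual recognition, contextual feature matching, business intelligence and output formatting, all with machine learning elements, to accurately extract and present data from a wide range of documents.”).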

With regard to claim 7, which depends on claim 1, Buisson does not disclose, yet Singh teaches, identifying one or more groups of similar records in a clustering model based on the generated one or more similarity feature values (Singh, paragraph 168: “The K-means algorithm is applied to each block to generate 150 clusters. The mean of these clusters is taken as the representative codeword for that cluster.”); labeling the one or more tabular structures based on the identified one or more groups of similar records; and storing the labeled one or more tabular structures in the clustering model, wherein the clustering model includes a database (Singh, paragraph 169: “A test document is encoded to the feature vector form, and its Euclidean distance is computed from each of the clusters. The labels of the closest clusters are assigned to the document.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Buisson and Singh such that the similarity feature values (or cosine distances) are used to cluster the features of the tabular structures. This would have enabled the invention to correctly identify and label data (Singh, paragraph 124: “The data extraction system 452 uses OCR data extraction, non-OCR visual recognition, contextual feature matching, business intelligence and output formatting, all with machine learning elements, to accurately extract and present data from a wide range of documents.”).
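For illustration only, the labeling step quoted from Singh, paragraphs 168 and 169, can be sketched as follows: each cluster is represented by its mean, and a feature vector receives the label of the cluster whose mean is nearest in Euclidean distance. This is a minimal sketch under stated assumptions, not code from Singh or the application. The function names cluster_mean and label_by_nearest_cluster and the example data are hypothetical, the K-means step that forms the clusters is not shown, and the sketch returns only the single closest label, whereas Singh refers to the labels of the closest clusters.

```python
import math
from typing import List, Mapping, Sequence


def cluster_mean(points: Sequence[Sequence[float]]) -> List[float]:
    """Return the component-wise mean of a non-empty cluster of equal-length vectors."""
    if not points:
        raise ValueError("a cluster must contain at least one point")
    return [sum(column) / len(points) for column in zip(*points)]


def label_by_nearest_cluster(
    vector: Sequence[float], centroids: Mapping[str, Sequence[float]]
) -> str:
    """Return the label of the centroid closest to `vector` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


if __name__ == "__main__":
    # Hypothetical clusters of two-dimensional feature vectors, for illustration only.
    centroids = {
        "header": cluster_mean([[0.0, 0.0], [2.0, 2.0]]),
        "data": cluster_mean([[4.0, 6.0], [6.0, 4.0]]),
    }
    print(label_by_nearest_cluster([0.9, 1.2], centroids))
```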

Claim 8 recites substantially similar limitations to claim 1 and is thus rejected along the same rationale.


Claim 14 recites substantially similar limitations to claim 7 and is thus rejected along the same rationale.

Claim 15 recites substantially similar limitations to claim 1 and is thus rejected along the same rationale.

Claim 20 recites substantially similar limitations to claim 6 and is thus rejected along the same rationale.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Cohen (US 20170132526 A1): Extracts data from images using dot products and machine learning with pre-trained embedding models.
Srinivasan (US 20060288268 A1): Uses AI to extract tables, parse, interpret, and standardize them.
Furmaniak (US 20080107338 A1): Teaches using cosine distances to compare two blocks of text to find similarities.
Kandekar (US 20120078612 A1): Teaches determining the semantic relatedness of text segments using cosine distance.
Fang et al (Fang, Jing, et al. “Table Header Detection and Classification.” Association for the Advancement of Artificial Intelligence, 2012, pp. 599–605.): Teaches using artificial intelligence to detect table headers.
Mohammadzadeh et al (Mohammadzadeh, Hadi, Thomas Gottron, Franz Schweiggert, and Gerhard Heyer. “TitleFinder: Extracting the Headline of News Web Pages based on Cosine Similarity and Overlap Scoring Similarity.” 2012, pp. 65–72. doi:10.1145/2389936.2389950.): Teaches extracting the headline of news pages using cosine similarity.
Hu et al (US 20030097384 A1): Teaches creating a cluster tree and calculating the distance between the clusters using Euclidean distances.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRODERICK C ANDERSON whose telephone number is (313) 446-6566.  The examiner can normally be reached on Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 




/B.C.A/Examiner, Art Unit 2178

/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178