DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Response to Request for Continued Examination
The response filed on 8/9/2022 has been entered and made of record. Claims 1, 3, 10-11, 13, and 20 are amended. Claims 1-20 are pending.
The previous rejections of claims 1-20 under 35 USC 102 under Duta have been withdrawn as necessitated by amendment. New rejections have been made under 35 USC 103 under Duta in view of Wnek.



Drawings
The drawings filed 2/22/2021 were accepted.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Duta (US 20200117944 A1; filed 10/10/2018) in view of Wnek (US 20070053611 A1; filed 10/10/2006).

With regards to claim 1, Duta discloses a computer-implemented method for extracting text information from an electronic document (Duta, abstract: “Systems, methods, and computer-executable instructions for extracting key value data”), comprising: obtaining text and one or more document features of the electronic document (Duta, abstract: “Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters”); clustering text… (Duta, paragraph 15: “The tokens can then be used to cluster 150 the forms into cluster models 160. For example, the ordered tokens between forms may be used to cluster forms based on similarity”); identifying one or more text strings from the text in the… groups as one or more keys, wherein identifying the one or more text strings is based on the clustering (Duta, paragraph 16: “For each cluster, a cluster model can be generated. The cluster model can include tokens that are considered keys for that cluster. A key of a document, is a token that is identified as a key that may have a corresponding value. The value is one or more tokens that are associated with the key.” Paragraph 19: “The cluster model is used to determine keys 220 for the document 210.” Paragraph 21: “The token and the key can then be marked as paired or removed from the list of tokens of the document”); and outputting the one or more key/value pairs in a computer-readable format (Duta, paragraph 48: “At 590, the keys and combined values can be stored. For example, the generated table can be stored to a file.”).
However, Duta does not disclose clustering the text from the electronic document into multiple groups based on the one or more document features… identifying one or more text strings from the text in the multiple groups.
Wnek teaches clustering the text from the electronic document into multiple groups based on the one or more document features… identifying one or more text strings from the text in the multiple groups (Wnek, abstract: “method for extracting information;” paragraph 38: “Preferred embodiments of the present invention employ a paradigmatic structure type, i.e., the Flexrecord, to represent structural patterns typically found, for example, in documents such as credit card bills and insurance payment summaries;” the records taught by Wnek are being interpreted as the clusters of the claims; paragraph 37: “A wide variety of printed documents exhibit complex data structures including textual or numerical data organized in rows and columns, i.e., tables, along with more general structures, one of which may be broadly characterized as consisting of one or more contextually-related elements including possibly tables, i.e., records”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Duta and Wnek such that data from records (such as tables) are grouped together in clusters prior to being extracted. This would have enabled the invention to extract structured data from “complex, but mostly regular, data structures” (Wnek, paragraph 4) and to bind the elements together based on their relationships (Wnek, paragraph 37: “Contextual relationship typically binds the data elements together in such structures”).
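For illustration only, the tokenization step quoted above from Duta's abstract (rows of OCR text split into tokens based on a distance between characters) can be sketched as follows. This is a minimal sketch and not Duta's implementation: the Char record, the use of left-edge x differences as the gap measure, and the max_gap value are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: float  # left edge of the character (hypothetical field)
    y: float  # row coordinate after alignment to a common y (hypothetical field)


def tokenize_row(chars, max_gap=1.5):
    """Split one aligned row of OCR characters into tokens wherever the
    horizontal gap between neighbouring characters exceeds max_gap.
    max_gap is an assumed parameter, not a value taken from Duta."""
    tokens, current, prev_x = [], "", None
    for ch in sorted(chars, key=lambda c: c.x):
        if prev_x is not None and ch.x - prev_x > max_gap:
            tokens.append(current)  # gap too wide: close the current token
            current = ""
        current += ch.text
        prev_x = ch.x
    if current:
        tokens.append(current)
    return tokens


row = [Char("N", 0, 10), Char("o", 1, 10), Char("4", 5, 10), Char("2", 6, 10)]
print(tokenize_row(row))
```

In this sketch the characters at x = 0 and x = 1 form one token and the characters at x = 5 and x = 6 form another, because only the gap between x = 1 and x = 5 exceeds the assumed max_gap.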


With regards to claim 2, which depends on claim 1, Duta discloses the one or more document features include: locations of the text in the electronic document; and locations of one or more boxes identified in the electronic document (Duta, paragraph 12: “The OCR system 120 may also extract non-text features such as table borders. In various examples, the output from the OCR system 120 is a list of characters and as associated x,y coordinate of where each character is located within a document.”), wherein clustering is based on the locations of the text with reference to the locations of the one or more boxes (Duta, paragraph 12: “The OCR system 120 may also extract non-text features such as table borders. In various examples, the output from the OCR system 120 is a list of characters and as associated x,y coordinate of where each character is located within a document.” Duta, paragraph 15: “The tokens can then be used to cluster 150 the forms into cluster models 160. For example, the ordered tokens between forms may be used to cluster forms based on similarity”; Duta, Fig. 1: the OCR data is aligned, tokenized then formed into clusters, thus the clusters are based in part on the alignment of OCR data which includes the table borders).

With regards to claim 3, which depends on claim 2, Duta discloses wherein identifying the one or more text strings as one or more keys includes: identifying candidate text strings from the text in a group…; determining a similarity score for each candidate text string based on a content similarity of the candidate text string (Duta, paragraph 15: “For example, the ordered tokens between forms may be used to cluster forms based on similarity. The similarity may be based on forms that include tokens in similar positions. A link between two forms is found when both forms include the same token. In an example, tokens are considered to be the same when the tokens are within a certain string distance/difference between them; thus, allowing fuzzy token matching.”); and determining one or more of the candidate text strings as one or more keys based on the similarity score (Duta, paragraph 16: “A key of a document, is a token that is identified as a key that may have a corresponding value. The value is one or more tokens that are associated with the key. In an example, keys are identified from tokens when a token is found in a predetermined number of forms within the cluster.”).
Duta does not disclose, but Wnek teaches, the limitation “of the multiple groups” (Wnek, paragraph 38: “Preferred embodiments of the present invention employ a paradigmatic structure type, i.e., the Flexrecord, to represent structural patterns typically found, for example, in documents such as credit card bills and insurance payment summaries”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Duta and Wnek such that data from records (such as tables) are grouped together in clusters prior to being extracted. This would have enabled the invention to extract structured data from “complex, but mostly regular, data structures” (Wnek, paragraph 4) and to bind the elements together based on their relationships (Wnek, paragraph 37: “Contextual relationship typically binds the data elements together in such structures”).
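For illustration only, the fuzzy token matching of Duta, paragraph 15, and the key identification of Duta, paragraph 16 (a token is a key when it is found in a predetermined number of forms within the cluster) can be sketched as follows. This is a sketch under stated assumptions: Duta does not name a particular string distance, so the ratio from Python's difflib.SequenceMatcher is swapped in here, and the names same_token, identify_keys, min_ratio and min_forms are hypothetical.

```python
from difflib import SequenceMatcher


def same_token(a, b, min_ratio=0.8):
    """Treat two tokens as the same when their similarity ratio is high enough.
    SequenceMatcher is an assumed stand-in for the unspecified string distance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def identify_keys(forms, min_forms=2, min_ratio=0.8):
    """forms is a list of token lists, one per form in a cluster.
    A token becomes a key when it fuzzily appears in at least min_forms forms."""
    # Collect one representative per group of fuzzily equal tokens.
    candidates = []
    for form in forms:
        for tok in form:
            if not any(same_token(tok, c, min_ratio) for c in candidates):
                candidates.append(tok)
    # Keep representatives that occur in enough forms.
    keys = []
    for cand in candidates:
        count = sum(
            any(same_token(cand, tok, min_ratio) for tok in form) for form in forms
        )
        if count >= min_forms:
            keys.append(cand)
    return keys


forms = [
    ["Account #", "12345", "Total"],
    ["Account#", "67890", "Total"],
    ["Acount #", "555", "Date"],
]
print(identify_keys(forms, min_forms=2))
```

With the assumed thresholds, the three spellings of the account label are counted as one token appearing in all three forms, while the numeric tokens each appear in only one form and are not treated as keys.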

With regards to claim 4, which depends on claim 3, Duta discloses filtering the one or more keys after determining one or more of the candidate text strings as one or more keys, including: detecting one or more outlier keys in the one or more keys based on one or more of: one or more font properties of an outlier key in the electronic document, wherein the one or more document features includes the one or more font properties; or a proximity of the outlier key to the one or more keys in the electronic document; and removing the one or more outlier keys from the one or more keys (Duta, paragraph 33: “Some tokens, such as the token 452, do not have any corresponding values… Accordingly, the token 452 can be used during the value extraction to limit values for keys. For example, a value to a key cannot have an intermediate key between the key and the possible value”).

With regards to claim 5, which depends on claim 3, Duta discloses wherein associating one or more values to the one or more keys is based on one or more of: a proximity of a value to a key in the electronic document (Duta, paragraph 21: “In addition, the distance between the key and a token may be used to determine if a token is a value for the key.”); or a content constraint on values associated with the key.

With regards to claim 6, which depends on claim 5, Duta discloses wherein associating one or more values to the one or more keys includes: identifying a first candidate value for a key; determining a first distance between the first candidate value and the key; determining whether the first distance is within a threshold distance from the key; and associating the first candidate value to the key based on determining that the first distance is within the threshold distance (Duta, paragraph 21: “In addition, the distance between the key and a token may be used to determine if a token is a value for the key… For example, a first iteration may use an initial distance. If a token is less than the initial distance from a key, the token is considered a value for the key.”), wherein the key remains unassociated with a value when the first distance is greater than the threshold distance (Duta, paragraph 22: “In some examples, there is a limit such that the value can only include tokens within a maximum distance from the key.”).

With regards to claim 7, which depends on claim 6, Duta discloses wherein associating one or more values to the one or more keys further includes: identifying a plurality of candidate values for the key, wherein the plurality of candidate values includes the first candidate value; and determining a distance between each of the plurality of candidate values and the key, wherein the first distance is the smallest distance from the determined distances (Duta, paragraph 30: “The distance between the token 412 and the token 414 can be determine and if the distance is less than the distance parameter then the token 414 is considered the value of the key 412, Account #. The token 416 and the token 418 are also close enough such that the token 418 is considered to be the value for the key 416.” Note: the claims state that the first distance is the smallest distance, but that is not used to perform any action (such as selecting the value for the key/value pair). As such, the distance between keys/value candidates that is the shortest can just be labeled as the first distance.).
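For illustration only, the distance test quoted from Duta, paragraphs 21 and 22, in connection with claims 6 and 7 can be sketched as follows. This is a sketch under stated assumptions: the Euclidean distance between single (x, y) points, the choice of the nearest candidate when several fall inside the threshold, and the name associate_values are introduced here and are not taken from Duta or from the claims.

```python
import math


def associate_values(keys, tokens, threshold):
    """keys and tokens map text -> (x, y) position (assumed representation).
    Returns key -> value text, or None when no token lies within the threshold."""
    pairs = {}
    unused = dict(tokens)  # tokens not yet paired with a key
    for key, (kx, ky) in keys.items():
        best, best_d = None, None
        for tok, (tx, ty) in unused.items():
            d = math.hypot(tx - kx, ty - ky)
            # Candidate must be closer than the threshold; the nearest wins
            # (nearest-candidate selection is an assumption of this sketch).
            if d < threshold and (best_d is None or d < best_d):
                best, best_d = tok, d
        pairs[key] = best
        if best is not None:
            del unused[best]  # mark the token as paired
    return pairs


keys = {"Account #": (0, 0), "Date": (0, 10)}
tokens = {"12345": (3, 0), "Total": (50, 50)}
print(associate_values(keys, tokens, threshold=5))
```

In this sketch a key whose nearest remaining token is farther away than the threshold is left unassociated, which corresponds to the None entry in the returned mapping.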

With regards to claim 8, which depends on claim 5, Duta discloses further comprising generating a plurality of values including the one or more values associated to the one or more keys, wherein generating the plurality of values includes one or more of: generating a multi-tuple included in the one or more values, including: dividing the text not included in the one or more keys in the electronic document into a plurality of 1-tuples; and combining a portion of the plurality of 1-tuples into the multi-tuple based on one or more domain specific rules for the electronic document, wherein the multi-tuple is included in the plurality of values (Duta, paragraph 31: “After the token 442 is identified as a value, additional tokens below the token 440 can be identified from the list of tokens extracted from the invoice 400. The possible additional tokens are tokens that have not been selected as values and are also not keys. One such token is token 444. The token 444 is below the token 442 and overlaps along the x-axis with the token 442. The vertical distance the token 442 and the token 444 is determined and if the distance is smaller than a vertical threshold, the token is added to the value.”); or segmenting values associated with an address key based on one or more address format rules.
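For illustration only, the value extension quoted from Duta, paragraph 31 (a token below the current value that overlaps it along the x-axis is added when the vertical distance is smaller than a vertical threshold) can be sketched as follows. This is a sketch under stated assumptions: the Token record, a y coordinate that increases down the page, and the name extend_value are hypothetical and are not taken from Duta.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x0: float  # left edge (hypothetical field)
    x1: float  # right edge (hypothetical field)
    y: float   # row position, assumed to increase down the page


def overlaps_x(a, b):
    """True when the horizontal extents of two tokens overlap."""
    return a.x0 <= b.x1 and b.x0 <= a.x1


def extend_value(first, candidates, vertical_threshold):
    """Append tokens that sit directly below the current last line of the value."""
    value, last = [first], first
    for tok in sorted(candidates, key=lambda t: t.y):
        if tok.y <= last.y:
            continue  # not below the current last line
        if overlaps_x(last, tok) and tok.y - last.y < vertical_threshold:
            value.append(tok)
            last = tok
    return " ".join(t.text for t in value)


first = Token("123 Main St", 10, 60, 100)
candidates = [
    Token("Springfield", 10, 55, 112),
    Token("Page 1", 10, 40, 300),
    Token("Tel", 200, 220, 112),
]
print(extend_value(first, candidates, vertical_threshold=15))
```

In this sketch only the token directly beneath the first line is merged into the value; the token on the same row far to the right and the token much lower on the page are both left out.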

With regards to claim 9, which depends on claim 2, Duta discloses wherein obtaining the text and one or more document features includes: obtaining the electronic document; and extracting the text and the one or more document features from the electronic document (Duta, Fig. 1: OCR 120; Duta, abstract: “Optical character recognition (OCR) text of a document is received. The y-coordinate of characters are adjusted to a common y-coordinate”).

With regards to claim 10, which depends on claim 2, Duta discloses further comprising filtering the text included in the… groups before identifying the one or more text strings (Duta, paragraph 65: “In Example 6, the subject matter of Example 5 includes, determining the token includes a separator character; and removing the separator character and any characters following the separator character from the token prior to counting the token.”).
Duta does not disclose, but Wnek teaches, the “multiple groups” (Wnek, paragraph 38: “Preferred embodiments of the present invention employ a paradigmatic structure type, i.e., the Flexrecord, to represent structural patterns typically found, for example, in documents such as credit card bills and insurance payment summaries”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Duta and Wnek such that data from records (such as tables) are grouped together in clusters prior to being extracted. This would have enabled the invention to extract structured data from “complex, but mostly regular, data structures” (Wnek, paragraph 4) and to bind the elements together based on their relationships (Wnek, paragraph 37: “Contextual relationship typically binds the data elements together in such structures”).
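For illustration only, the filtering quoted from Duta, paragraph 65 (removing the separator character and any characters following it from the token prior to counting) can be sketched as follows. This is a sketch under stated assumptions: the set of separator characters and the name strip_after_separator are hypothetical, and trimming trailing whitespace is an added convenience not stated in Duta.

```python
def strip_after_separator(token, separators=":"):
    """Drop the first separator character and everything after it.
    Which characters count as separators is an assumption of this sketch."""
    for i, ch in enumerate(token):
        if ch in separators:
            return token[:i].rstrip()
    return token  # no separator present: token is unchanged


print(strip_after_separator("Invoice No: 4471"))
print(strip_after_separator("Total"))
```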


Claims 11-20 recite substantially similar limitations to claims 1-10 respectively and are thus rejected along the same rationales.


Response to Arguments
Applicant's arguments filed 8/9/2022 regarding claims 1 and 11 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Examiner agrees with applicant’s remarks that Duta does not disclose the amended claims, because Duta does not group the text of a single document into multiple groups prior to extracting the key/value pairs. However, upon further search, Wnek was found to teach the grouping of related data into ‘records’ in order to extract the data in a structured format. Thus, a new 35 USC 103 rejection has been made above over Duta in view of Wnek.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRODERICK C ANDERSON whose telephone number is (313)446-6566. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.C.A/Examiner, Art Unit 2178

/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178