DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it recites "Aspects of the disclosure" and "A method of the disclosure" in lines 1-2.  Correction is required.  See MPEP § 608.01(b).
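For illustration only, the abstract guidelines above can be checked mechanically before a corrected abstract is filed. The sketch below assumes a simple word count and substring search; the function name `check_abstract` and the phrase lists are assumptions made for this example and are not part of MPEP § 608.01(b).

```python
import re

# Phrases that can be implied; the first three are quoted in the guidance
# above and the last two are the phrases objected to in this action.
IMPLIED_PHRASES = (
    "the disclosure concerns",
    "the disclosure defined by this invention",
    "the disclosure describes",
    "aspects of the disclosure",
    "a method of the disclosure",
)
# Claim-style terms the guidance says should be avoided.
LEGAL_TERMS = ("means", "said")


def check_abstract(text, min_words=50, max_words=150):
    """Return a list of human-readable issues found in an abstract."""
    issues = []
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    if not min_words <= len(words) <= max_words:
        issues.append(
            f"length is {len(words)} words; expected {min_words}-{max_words}"
        )
    lowered = text.lower()
    for phrase in IMPLIED_PHRASES:
        if phrase in lowered:
            issues.append(f"implied phrase: {phrase!r}")
    for term in LEGAL_TERMS:
        # Whole-word match only; ordinary uses such as "by means of"
        # are also flagged and would need manual review.
        if re.search(rf"\b{term}\b", lowered):
            issues.append(f"claim-style term: {term!r}")
    if "\n\n" in text.strip():
        issues.append("more than one paragraph")
    return issues
```

An empty list means none of these mechanical checks fired; it does not establish that the abstract is otherwise acceptable.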
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
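A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.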
Claims 1-4, 6-9, 11-14, and 16-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Cristescu (U.S. PG-PUB NO. 2021/0012102 A1).
-Regarding claim 1, Cristescu discloses a method (Abstract; FIG. 4; FIG. 13), comprising: obtaining a plurality of symbol sequences of a document ([0033] “a sequence of characters”; FIG. 4 image 20 token 30 indicator 31 [0037]), the document having a plurality of text fields ([0032] “a set of text fields”; FIG. 2 [0034]-[0035]); determining a plurality of vectors, wherein a vector of the plurality of vectors is representative of one of the plurality of symbol sequences (FIG. 4 extractor 44 FIG. 9 encoder 55 [0051] “character embedding vectors”, “corresponding to…character of token 30”); processing, by a processing device (FIG. 4 extractors 44 46 FIG. 9 FIG. 11 [0050] [0064]-[0065]), the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors (FIGS. 9-10 convolver 57 encoder 59 text feature vector 62 [0052] [0055] “neural network”; FIGS. 11-12 feature vector 64), wherein each of the plurality of recalculated vectors is recalculated based on values of the plurality of vectors (FIG. 4 token embedding vector 60 [0070]; FIG. 10 embedding vector, feature vector 62; FIG. 12 feature vector 64); determining, by the processing device, an association between a first recalculated vector of the plurality of recalculated vectors and a first text field of the plurality of text fields ([0058] “extractor 44 is trained to produce … vector 62 indicative of a field type of … token 30”; [0069]), wherein the first recalculated vector is representative of a first symbol sequence of the plurality of symbol sequences (FIG. 4 vector 60 [0070] FIG. 10 vector 62 token 30; FIG. 12); and determining, based on the association between the first recalculated vector and the first text field, an association between the first symbol sequence and the first text field (Abstract; FIG. 4 classifier 50 indicator 54; [0071] “likelihood…respective token belongs to a particular field type” [0058] [0069]).
-Regarding claim 2, Cristescu further discloses wherein obtaining the plurality of symbol sequences of the document comprises: performing optical character recognition (OCR) of an image of the document to obtain an OCR text of the document; and partitioning the OCR text into the plurality of symbol sequences (FIG. 4 OCR engine 42 [0037]).
-Regarding claim 3, Cristescu further discloses wherein determining the plurality of vectors representative of the plurality of symbol sequences comprises determining, using a second neural network, a word embedding for each one of the plurality of symbol sequences (FIG. 4 extractors 44 46 FIG. 9 encoder 55 [0051] “may comprise an artificial multilayer neural network” “token 30” “embedding vectors” FIG. 11).
-Regarding claim 4, Cristescu further discloses wherein the first neural network comprises a plurality of subnetworks, wherein each of the subnetworks comprises at least one hidden layer of neurons (FIG. 9 [0052] [0055] [0058] FIG. 11 [0065] [0069]).
-Regarding claim 6, Cristescu further discloses wherein determining the association between the first recalculated vector of the plurality of recalculated vectors and the first text field of the plurality of text fields comprises (Abstract; FIG. 4; FIG. 13): generating, by a third neural network (FIG. 4 classifier 50; FIG. 13 step 216), a plurality of association hypotheses, wherein each of the association hypotheses comprises an association of one of the plurality of recalculated vectors with the first text field (FIG. 4 indicator 35; FIG. 13 step 205 [0071] [0075]); determining a probability of occurrence for each of the plurality of association hypotheses (FIG. 4 indicator 54; FIG. 13 steps 216-218; [0071] “indicating a likelihood” [0076]-[0077]); and determining that a first association hypothesis of the plurality of association hypotheses has a highest probability of occurrence, wherein the first association hypothesis comprises an association of the first recalculated vector with the first text field (FIG. 4 engine 52; FIG. 13 step 220; [0071] [0073] [0078]).
-Regarding claim 7, Cristescu further discloses determining, based on the association between the first symbol sequence and the first text field, a location of the first text field; and storing the location of the first text field (FIG. 2 [0034]-[0037]; FIG. 4 indicator 31; FIG. 5).
-Regarding claim 8, Cristescu further discloses wherein the location of the first text field comprises coordinates of the first symbol sequence ([0037] FIG. 5).
-Regarding claim 9, Cristescu further discloses wherein the location of the first text field comprises a placement of the first symbol sequence relative to at least one other symbol sequence of the plurality of symbol sequences (FIGS. 6-7).
-Regarding claim 11, Cristescu discloses a non-transitory machine-readable storage medium including instructions that, when accessed by a processing device, cause the processing device to (FIG. 15 device 80 unit 84 processor 82 [0085] Abstract): obtain a plurality of symbol sequences of a document ([0033] “a sequence of characters”; FIG. 4 image 20 token 30 indicator 31 [0037]), the document having a plurality of text fields ([0032] “a set of text fields”; FIG. 2 [0034]-[0035]); determine a plurality of vectors, wherein a vector of the plurality of vectors is representative of one of the plurality of symbol sequences (FIG. 4 extractor 44 FIG. 9 encoder 55 [0051] “character embedding vectors”, “corresponding to…character of token 30”); process, by a processing device (FIG. 4 extractors 44 46 FIG. 9 FIG. 11 [0050] [0064]-[0065]), the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors (FIGS. 9-10 convolver 57 encoder 59 text feature vector 62 [0052] [0055] “neural network”; FIGS. 11-12 feature vector 64), wherein each of the plurality of recalculated vectors is recalculated based on values of the plurality of vectors (FIG. 4 token embedding vector 60 [0070]; FIG. 10 embedding vector, feature vector 62; FIG. 12 feature vector 64); determine, by the processing device, an association between a first recalculated vector of the plurality of recalculated vectors and a first text field of the plurality of text fields ([0058] “extractor 44 is trained to produce … vector 62 indicative of a field type of … token 30”; [0069]), wherein the first recalculated vector is representative of a first symbol sequence of the plurality of symbol sequences (FIG. 4 vector 60 [0070] FIG. 10 vector 62 token 30; FIG. 12); and determine, based on the association between the first recalculated vector and the first text field, an association between the first symbol sequence and the first text field (Abstract; FIG. 4 classifier 50 indicator 54; [0071] “likelihood…respective token belongs to a particular field type” [0058] [0069]).
-Regarding claim 12, Cristescu further discloses wherein to cause the processing device to obtain the plurality of symbol sequences of the document the instructions are to cause the processing device to: perform optical character recognition (OCR) of an image of the document to obtain an OCR text of the document; and partition the OCR text into the plurality of symbol sequences (FIG. 4 OCR engine 42 [0037]; FIG. 15).
-Regarding claim 13, Cristescu further discloses wherein to determine the association between the first recalculated vector of the plurality of recalculated vectors and the first text field of the plurality of text fields the instructions are to cause the processing device to (Abstract; FIG. 4; FIG. 13; FIG. 15): generate, by a third neural network (FIG. 4 classifier 50; FIG. 13 step 216), a plurality of association hypotheses, wherein each of the association hypotheses comprises an association of one of the plurality of recalculated vectors with the first text field (FIG. 4 indicator 35; FIG. 13 step 205 [0071] [0075]); determine a probability of occurrence for each of the plurality of association hypotheses (FIG. 4 indicator 54; FIG. 13 steps 216-218; [0071] “indicating a likelihood” [0076]-[0077]); and determine that a first association hypothesis of the plurality of association hypotheses has a highest probability of occurrence, wherein the first association hypothesis comprises an association of the first recalculated vector with the first text field (FIG. 4 engine 52; FIG. 13 step 220; [0071] [0073] [0078]).
-Regarding claim 14, Cristescu further discloses wherein the instructions are further to cause the processing device to: determine, based on the association between the first symbol sequence and the first text field, a location of the first text field; and store the location of the first text field (FIG. 2 [0034]-[0037]; FIG. 4 indicator 31; FIG. 5; FIG. 15).
-Regarding claim 16, Cristescu discloses a system comprising: a memory; and a processing device operatively coupled to the memory, the processing device to (Abstract FIG. 15 device 80 unit 84 processor 82 [0085]): obtain a plurality of symbol sequences of a document ([0033] “a sequence of characters”; FIG. 4 image 20 token 30 indicator 31 [0037]), the document having a plurality of text fields ([0032] “a set of text fields”; FIG. 2 [0034]-[0035]); determine a plurality of vectors, wherein a vector of the plurality of vectors is representative of one of the plurality of symbol sequences (FIG. 4 extractor 44 FIG. 9 encoder 55 [0051] “character embedding vectors”, “corresponding to…character of token 30”); process, by a processing device (FIG. 4 extractors 44 46 FIG. 9 FIG. 11 [0050] [0064]-[0065]), the plurality of vectors using a first neural network to obtain a plurality of recalculated vectors (FIGS. 9-10 convolver 57 encoder 59 text feature vector 62 [0052] [0055] “neural network”; FIGS. 11-12 feature vector 64), wherein each of the plurality of recalculated vectors is recalculated based on values of the plurality of vectors (FIG. 4 token embedding vector 60 [0070]; FIG. 10 embedding vector, feature vector 62; FIG. 12 feature vector 64); determine, by the processing device, an association between a first recalculated vector of the plurality of recalculated vectors and a first text field of the plurality of text fields ([0058] “extractor 44 is trained to produce … vector 62 indicative of a field type of … token 30”; [0069]), wherein the first recalculated vector is representative of a first symbol sequence of the plurality of symbol sequences (FIG. 4 vector 60 [0070] FIG. 10 vector 62 token 30; FIG. 12); and determine, based on the association between the first recalculated vector and the first text field, an association between the first symbol sequence and the first text field (Abstract; FIG. 4 classifier 50 indicator 54; [0071] “likelihood…respective token belongs to a particular field type” [0058] [0069]).
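-Regarding claim 17, Cristescu further discloses wherein to obtain the plurality of symbol sequences of the document the processing device is to: perform optical character recognition (OCR) of an image of the document to obtain an OCR text of the document; and partition the OCR text into the plurality of symbol sequences (FIG. 4 OCR engine 42 [0037]; FIG. 15).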
-Regarding claim 18, Cristescu further discloses wherein to determine the association between the first recalculated vector of the plurality of recalculated vectors and the first text field of the plurality of text fields the processing device is to (Abstract; FIG. 4; FIG. 15): generate, by a third neural network (FIG. 4 classifier 50; FIG. 13 step 216), a plurality of association hypotheses, wherein each of the association hypotheses comprises an association of one of the plurality of recalculated vectors with the first text field (FIG. 4 indicator 35; FIG. 13 step 205 [0071] [0075]); determine a probability of occurrence for each of the plurality of association hypotheses (FIG. 4 indicator 54; FIG. 13 step 216-218; [0071] “indicating a likelihood” [0076]-[0077]); and determine that a first association hypothesis of the plurality of association hypotheses has a highest probability of occurrence, wherein the first association hypothesis comprises an association of the first recalculated vector with the first text field (FIG. 4 engine 52; FIG. 13 step 220; [0071] [0073] [0078]).
-Regarding claim 19, Cristescu further discloses wherein the processing device is further to: determine, based on the association between the first symbol sequence and the first text field, a location of the first text field; and store the location of the first text field (FIG. 2 [0034]-[0037]; FIG. 4 indicator 31; FIG. 5; FIG. 15).
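For illustration only, and not as part of the grounds of rejection, the processing recited in claims 1 and 6 (and in the parallel medium and system claims) can be sketched as follows. The sketch assumes a generic similarity-weighted averaging step in place of the first neural network and a softmax over hypothetical classifier scores in place of the third neural network. The names `recalculate`, `best_hypothesis`, and `field_weights`, the toy vectors, and the scores are assumptions drawn neither from the application nor from Cristescu.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recalculate(vectors):
    """Return one context-aware vector per input vector.

    Each output is a weighted average of ALL input vectors, weighted by a
    softmax over dot-product similarity, so every recalculated vector
    depends on the values of the whole set.
    """
    recalculated = []
    for query in vectors:
        weights = softmax([dot(query, key) for key in vectors])
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(query))
        ]
        recalculated.append(mixed)
    return recalculated


def best_hypothesis(recalculated, field_weights):
    """Score each "vector i belongs to the field" hypothesis and return
    (index, probability) of the most probable one."""
    probs = softmax([dot(vec, field_weights) for vec in recalculated])
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]


# Toy data: three symbol sequences with hypothetical 2-d embeddings.
tokens = ["Invoice", "Total", "57.10"]
vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
index, prob = best_hypothesis(recalculate(vectors), field_weights=[-1.0, 1.0])
print(tokens[index], round(prob, 3))
```

Because each recalculated vector corresponds to exactly one symbol sequence, the index of the winning hypothesis also identifies the symbol sequence associated with the field.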
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


Claims 10, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cristescu (U.S. PG-PUB NO. 2021/0012102 A1) in view of Sousa (WO 2013/135474 A1).
-Regarding claims 10, 15 and 20, Cristescu discloses the methods of claims 7, 11, and 16.
Cristescu does not teach obtaining a subsequent document; obtaining a symbol sequence of the subsequent document; determining that the symbol sequence of the subsequent document has a location in the subsequent document that coincides, within a pre-determined accuracy, with the location of the first text field; and associating the symbol sequence of the subsequent document with the first text field.
In the same field of endeavor, Sousa teaches obtaining a subsequent document (Sousa: FIG. 1 set (2) (12)); obtaining a symbol sequence of the subsequent document (Sousa: FIG. 2 document (3) keywords (7)); determining that the symbol sequence of the subsequent document has a location in the subsequent document (Sousa: FIG. 2 table 22) that coincides, within a pre-determined accuracy (Sousa: page 12, line 23-27), with the location of the first text field; and associating the symbol sequence of the subsequent document with the first text field (Sousa: page 11 line 12-15, line 24, line 32-35; page 12, line 5-7, line 15-27).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Cristescu with the teaching of Sousa.
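For illustration only, the position-matching limitation addressed for claims 10, 15 and 20 can be sketched as below. The sketch assumes axis-aligned bounding boxes compared by their centers and a hypothetical tolerance of 5 units standing in for the pre-determined accuracy. The names `Box`, `coincides`, and `assign_fields` are assumptions and do not appear in the application, Cristescu, or Sousa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (units are hypothetical)."""
    x: float
    y: float
    width: float
    height: float

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def coincides(a, b, tolerance):
    """True if the box centers differ by at most `tolerance` on each axis."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return abs(ax - bx) <= tolerance and abs(ay - by) <= tolerance


def assign_fields(stored_fields, tokens, tolerance=5.0):
    """Map each stored field name to the first token of the subsequent
    document whose location coincides with the stored field location."""
    result = {}
    for name, field_box in stored_fields.items():
        for text, token_box in tokens:
            if coincides(field_box, token_box, tolerance):
                result[name] = text
                break
    return result


# Location stored from a first document, then tokens of a subsequent one.
stored = {"total": Box(400, 700, 60, 12)}
subsequent = [("Invoice", Box(50, 40, 80, 14)), ("57.10", Box(402, 701, 58, 12))]
print(assign_fields(stored, subsequent))
```

A field with no token inside the tolerance is simply left out of the result, so a caller can fall back to full classification for that field.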
Allowable Subject Matter
Claim 5 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571) 272-4539. The examiner can normally be reached on Monday-Thursday and alternate Fridays, 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at (571) 272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/XIAO LIU/Examiner, Art Unit 2664

/PING Y HSIEH/Primary Examiner, Art Unit 2664