DETAILED ACTION
Status of Claims
This is a final office action on the merits in response to the amendments and arguments filed on 3 January 2022. 
Claims 1, 4, 13, 14, and 20 were amended. Claims 1-20 are currently pending and have been examined. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-12 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims not listed below are rejected for dependency.

Claim 6 recites “a first plurality of word features.” The specification does not define the term “word features”, and it does not appear to be a term of art. Further, in view of the specification, one of ordinary skill in the art would not be able to reasonably determine the boundaries of the term. As such, one of ordinary skill in the art would not be able to determine the scope of the claim. Therefore the claim is rejected as indefinite. Claim 15 is similarly rejected. 


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 13, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Yang et al. (US 2019/0026550 A1).

Regarding Claim 1, 13, and 20: Yang discloses a method, comprising:
extracting a plurality of features from an electronic document, the plurality of features comprising a plurality of symbolic vectors representative of words in an image (Text contained within an electronic document might be obtained using, for example, optical character recognition ("OCR") techniques. The page segmentation application transforms the obtained text into textual feature representations. See at least [0022]. Also: A textual feature representation represents a text portion of a document in a vector space. See at least [0007])
processing the plurality of features using a neural network; receiving an output of the neural network, wherein the output indicates a probability of a text field in the electronic document corresponding to one of a plurality of field types; detecting, by a processing device, a plurality of text fields in the electronic document based on the output of the neural network (The page segmentation generates an output page segmentation of the electronic document by applying a neural network to the textual feature representation and the visual feature representation. See at least [0008]. Also: An example of a page segmentation 600 is shown in FIG. 6. The page segmentation 600 includes text object 601, text object 602, and text object 605, generated by the page segmentation application 106 and corresponding to text object 401, text object 402, and text object 405 depicted in FIG. 4. See at least [0081]. Also: The page segmentation application outputs a classification of the set of pixels as a particular document object type based on a correspondence between a location of the set of pixels in the electronic document and a location of the particular document object type in the output page is a text field. The claimed indication of probability has no limit on the possible range of probabilities, and an indication that an element is something is the same as an indication that there is a 100% probability that an element is something.
assigning, by the processing device, each of the plurality of text fields to one of the plurality of field types based on the output of the neural network (The page segmentation application outputs a classification of the set of pixels as a particular document object type based on a correspondence between a location of the set of pixels in the electronic document and a location of the particular document object type in the output page segmentation. See at least [0008]). 
a memory; and a processing device operatively coupled to the memory (See at least [0110]).
a non-transitory machine-readable storage medium (See at least [0111]). 

Regarding Claim 2: Yang discloses the above limitations. Additionally, Yang discloses recognizing text in the image of the electronic document (Text contained within an electronic document might be obtained using, for example, optical character recognition ("OCR") techniques. See at least [0022]) dividing the recognized text in the image into the words; extracting a plurality of character sequences from the words; and extracting the plurality of symbolic vectors from the plurality of character sequences (transforms the obtained text into textual feature representations. Each textual feature representation represents, in a vector space, a text portion of the page of the document. See at least [0022]. Also: Text portions can be represented at any suitable level of granularity (e.g., by sentence, by phrase, by word, by character, etc.). See at least [0059]). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 2019/0026550 A1) in view of Oliner et al. (US 2018/0089561 A1). 

Regarding Claim 3: Yang discloses the above limitations. Additionally, Yang discloses extracting a first plurality of characters from each of the words (transforms the obtained text into textual feature representations. Each textual feature representation represents, in a vector space, a text portion of the page of the document. See at least [0022]. Also: Text portions can be represented at any suitable level of granularity (e.g., by sentence, by phrase, by word, by character, etc.). See at least [0059]). However, Yang does not appear to disclose extracting a first plurality of characters and a second plurality of characters from each of the words, the first plurality of characters corresponding to the second plurality of characters in a reverse order. Oliner, however, teaches:
extracting a first plurality of characters and a second plurality of characters from each of the words, the first plurality of characters corresponding to the second plurality of characters in a reverse order (As described in more detail below, multiple neural networks may be utilized to identify static and/or variable text within a character string, such as an event or portion thereof. Accordingly, multiple neural networks may be trained. In cases that neural networks are trained for utilization in association with a particular source type, data set, index, etc., multiple neural networks can be trained for each such source type, data set, index, etc. In some embodiments, a forward neural network and a reverse neural network can be trained to facilitate identification of static and/or variable text within a character string. A forward neural network refers to a neural network that is trained via a character-by-character forward approach. In this regard, characters are fed to the neural network beginning with the first character of the character string and continuing through the last character. A reverse neural network refers to a neural network that is trained via a character-by-character reverse or backward approach. To this end, characters are fed to the neural network beginning with the last character of the character string and continuing through the first character. As can be appreciated, multiple neural networks can be trained using the same or different character strings, such as events. For example, in some implementations, a same set of events may be used to train both a forward neural network and a reverse neural network. In other implementations, a first set of events may be used to train a forward neural network, while a second set of events may be used to train a reverse neural network. See at least [0327]). 
Yang provides a system which extracts characters and generates word embeddings, which differs from the claimed invention by the substitution of Yang’s general character extraction with a forward and backward character extraction. However, Oliner demonstrates that the prior art already knew of forward and backwards character extractions. One of ordinary skill in the art could have trivially substituted Yang’s general character extraction with Oliner’s character extraction. Further, one of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would generate word embeddings based on both forward and backwards readings of words. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yang and the teachings of Oliner. 
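The forward and reverse character extraction discussed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code taken from Yang, Oliner, or the application: the function names are hypothetical, and splitting the recognized text on whitespace is an assumed stand-in for whatever word division a real system would use.

```python
from typing import Dict, List, Tuple


def forward_and_reverse_chars(word: str) -> Tuple[List[str], List[str]]:
    """Return the characters of a word in forward order and in reverse order.

    Hypothetical helper: the second list holds the same characters as the
    first, read from the last character back to the first.
    """
    forward = list(word)
    reverse = forward[::-1]
    return forward, reverse


def extract_sequences(text: str) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each whitespace-delimited word to its forward and reverse sequences.

    Assumption: whitespace splitting approximates dividing recognized text
    into words.
    """
    return {word: forward_and_reverse_chars(word) for word in text.split()}


if __name__ == "__main__":
    for word, (fwd, rev) in extract_sequences("invoice total").items():
        print(word, fwd, rev)
```

In such a sketch, each sequence could be supplied to a separate character-level model, one reading from the first character and one reading from the last, in the manner Oliner describes for forward and reverse neural networks.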

Claims 4-6, 8-12, and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 2019/0026550 A1). 

Regarding Claim 4 and 14: Yang discloses the above limitations. Additionally, Yang discloses processing a plurality of character sequences to extract a first plurality of feature vectors representative of the words in the electronic document (Text contained within an electronic document might be obtained using, for example, optical character recognition ("OCR") techniques. The page segmentation application transforms the obtained text into textual feature representations. See at least [0022]. Also: A textual feature representation represents a text portion of a document in a vector space. See at least [0007]). Further, Yang discloses layers of a neural network (Adjusting the neural network includes one or more of adjusting the number of layers in the neural network. See at least [0096]). However, Yang does not appear to explicitly disclose where the processing is performed by a first plurality of layers of the neural network. 
	Yang provides a system which generates word embeddings based on the text of a document, which differs from the claimed invention by the substitution of Yang’s unstated processing technique for a neural network processing technique. However, Yang also demonstrates that the prior art already knew of neural networks. One of ordinary skill in the art could have trivially substituted Yang’s unstated processing technique with a neural network, to generate the word embeddings. Further, one of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would generate word embeddings based on neural network processing. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yang. 

Regarding Claim 5: Yang makes obvious the above limitations. Additionally, Yang discloses wherein the first plurality of feature vectors comprises a plurality of word embeddings (The page segmentation application transforms the obtained text into textual feature representations. See at least [0022]. Also: A 

Regarding Claim 6 and 15: Yang makes obvious the above limitations. Additionally, Yang discloses processing the plurality of features extracted from the electronic document to build at least one first table of a first plurality of word features based on the first plurality of feature vectors and a second plurality of feature vectors representative of the words in the electronic document (the method 300 involves matching the visual feature representation to the textual feature representation from the text mapping based on both the visual feature representation and the textual feature representation corresponding to the set of pixels. More specifically, the page segmentation application 106 matches the visual features for a pixel (or set of pixels), as represented in the visual feature representation 270, with the textual feature representation 240 associated with the pixel (or set of pixels) in the text mapping 290. Thus, this matching operation allows the page segmentation application 106 to use a combination of the visual feature representation 270 and the textual feature representation 240 to classify a particular pixel (or set of pixels) associated with these feature representations. See at least [0076]). Further, Yang discloses layers of a neural network (Adjusting the neural network includes one or more of adjusting the number of layers in the neural network. See at least [0096]). However, Yang does not appear to explicitly disclose where the processing is performed by a second plurality of layers of the neural network. 
	Yang provides a system which makes a mapping of text features and visual features, which differs from the claimed invention by the substitution of Yang’s unstated processing technique for a neural network processing technique. However, Yang also demonstrates that the prior art already knew of neural networks. One of ordinary skill in the art could have trivially substituted Yang’s unstated processing technique with a neural network, to generate the mapping. Further, one of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would generate the mapping of text and visual features based on neural network processing. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yang. 

Regarding Claim 8: Yang makes obvious the above limitations. Additionally, Yang discloses wherein the second plurality of feature vectors comprises spatial information of a plurality of portions of the electronic document containing the words, and wherein each of the plurality of portions of the electronic document corresponds to one of the words (the method 300 involves matching the visual feature representation to the textual feature representation from the text mapping based on both the visual feature representation and the textual feature representation corresponding to the set of pixels. More specifically, the page segmentation application 106 matches the visual features for a pixel (or set of pixels), as represented in the visual feature representation 270, with the textual feature representation 240 associated with the pixel (or set of pixels) in the text mapping 290. Thus, this matching operation allows the page segmentation application 106 to use a combination of the visual feature representation 270 and the textual feature representation 240 to classify a particular pixel (or set of pixels) associated with these feature representations. See at least [0076]). Further, Yang discloses layers of a neural network (Adjusting the neural network includes one or more of adjusting the number of layers in the neural network. See at least [0096]).

Regarding Claim 9 and 16: Yang makes obvious the above limitations. Additionally, Yang discloses constructing a pseudo-image based on the at least one first table of the first plurality of word features, wherein the pseudo-image comprises spatial information indicative of locations of the text fields in the electronic document (the page segmentation application uses a combination of the visual feature representations and the textual feature representations. See at least [0023]. Also: a combined set of visual and textual feature representations. See at least [0026]. Also: this matching operation allows the page segmentation application 106 to use a combination of the visual feature representation 270 and the textual feature representation 240. See at least [0076]); processing the pseudo-image using a fourth plurality of layers of the neural network to extract a second plurality of word features representative of the words in the electronic document (the page segmentation application uses a combination of the visual feature representations and the textual feature representations to segment a certain set of pixels in a page segmentation. … A neural network (or portion of a neural network), which is trained to segment a page based on both visual and textual information, receives the identified visual feature representations layers of a neural network (Adjusting the neural network includes one or more of adjusting the number of layers in the neural network. See at least [0096]). However, Yang does not appear to explicitly disclose where the constructing is performed by a third plurality of layers of the neural network. 
	Yang provides a system which combines visual features and text features, which differs from the claimed invention by the substitution of Yang’s unstated processing technique for a neural network processing technique. However, Yang also demonstrates that the prior art already knew of neural networks. One of ordinary skill in the art could have trivially substituted Yang’s unstated processing technique with a neural network, to combine the visual and text features. Further, one of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would combine the visual features and the text features based on neural network processing. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yang. 

Regarding Claim 10 and 17: Yang makes obvious the above limitations. Additionally, Yang discloses wherein processing the pseudo-image by the fourth plurality of layers of the neural network comprises performing semantic segmentation on the pseudoimage (performing semantic page segmentation functions. See at least [0110]. Also: the page segmentation application uses a combination of the visual feature representations and the textual feature representations to segment a certain set of pixels in a page segmentation. … A neural network (or portion of a neural network), which is trained to segment a page based on both visual and textual information, receives the identified visual feature representations and textual feature representations as an input and uses these representations to generate at least part 

Regarding Claim 11 and 18: Yang makes obvious the above limitations. Additionally, Yang discloses constructing at least one second table including the second plurality of word features (The page segmentation application outputs a classification of the set of pixels as a particular document object type based on a correspondence between a location of the set of pixels in the electronic document and a location of the particular document object type in the output page segmentation. See at least [0008]. Also: the neural network classifies these pixels as a certain object type (e.g., assigning a "table" object type). The page segmentation application generates, in the page segmentation, a visual indicator for this set of pixels that identifies the object type (e.g., a mask over the document region depicting a table having a certain color associated with the "table" type). See at least [0023]. Also: Thus, the page segmentation application creates an output page segmentation that can include a pixel-level classification (i.e., the assigned object types) and a segmentation mask (e.g., the visual indicators corresponding to the assigned object types). … The segmentation mask illustrates the locations and types of page objects in the document.  See at least [0024]. Also: post-processing algorithm can adjust an initial, raw, page segmentation and transform the raw page segmentation into an improved output page segmentation. For example, the post-processing algorithm analyzes each pixel on the page and calculates a respective probability that the pixel is within each of a predefined number of object classes. The post-processing algorithm assigns, for each pixel, based on the respective probability, a most likely object type and incorporates the likely object type into the classification. See at least [0082]. 
Also: The neural network has a second portion (e.g., one or more additional decoders) that are trained to classify, from a combined set of visual and textual feature representations, certain pixels as belonging to certain object types. See at least [0026]). 

Regarding Claim 12 and 19: Yang makes obvious the above limitations. Additionally, Yang discloses classifying, by a fifth layer of the neural network, each of the words into one of a predetermined classes based on the second plurality of word features, wherein each of the predefined classes corresponds to one of the field types (The page segmentation application outputs a classification of the set of pixels as a particular document object type based on a correspondence between a location of the set of pixels in the electronic document and a location of the particular document object type in the output page segmentation. See at least [0008]. Also: the neural network classifies these pixels as a certain object type (e.g., assigning a "table" object type). The page segmentation application generates, in the page segmentation, a visual indicator for this set of pixels that identifies the object type (e.g., a mask over the document region depicting a table having a certain color associated with the "table" type). See at least [0023]. Also: Thus, the page segmentation application creates an output page segmentation that can include a pixel-level classification (i.e., the assigned object types) and a segmentation mask (e.g., the visual indicators corresponding to the assigned object types). … The segmentation mask illustrates the locations and types of page objects in the document.  See at least [0024]. Also: post-processing algorithm can adjust an initial, raw, page segmentation and transform the raw page segmentation into an improved output page segmentation. For example, the post-processing algorithm analyzes each pixel on the page and calculates a respective probability that the pixel is within each of a predefined number of object classes. The post-processing algorithm assigns, for each pixel, based on the respective probability, a most likely object type and incorporates the likely object type into the classification. See at least [0082]. 
Also: The neural network has a second portion (e.g., one or more additional decoders) that are trained to classify, from a combined set of visual and textual feature representations, certain pixels as belonging to certain object types. See at least [0026]).
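The per-pixel assignment described in the passage cited from Yang at [0082], in which each pixel receives a probability for each object class and is assigned the most likely object type, can be sketched as follows. This is an illustrative sketch only: the function name, the list-of-lists input layout, and the example class names are assumptions and are not drawn from Yang.

```python
from typing import List, Sequence


def most_likely_types(
    pixel_probs: Sequence[Sequence[float]],
    object_types: Sequence[str],
) -> List[str]:
    """Assign each pixel the object type with the highest probability.

    pixel_probs holds one row per pixel and one probability per object type
    (assumed layout). Ties resolve to the first type listed.
    """
    labels = []
    for probs in pixel_probs:
        if len(probs) != len(object_types):
            raise ValueError("each pixel needs one probability per object type")
        # Index of the largest probability for this pixel.
        best = max(range(len(probs)), key=lambda i: probs[i])
        labels.append(object_types[best])
    return labels


if __name__ == "__main__":
    types = ["text", "table", "figure"]  # example class names (assumed)
    probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
    print(most_likely_types(probs, types))
```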

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 2019/0026550 A1), and further in view of Cheng et al. (US 2019/0171913 A1). 

Regarding Claim 7: Yang makes obvious the above limitations. Yang does not appear to disclose wherein the second plurality of feature vectors comprises at least one of a plurality of word vectors in an embedding dictionary or a plurality of word vectors in a keyword dictionary.
	However, Cheng teaches a second plurality of feature vectors representative of the words in the electronic document, wherein the second plurality of feature vectors comprises at least one of a plurality of word vectors in an embedding dictionary or a plurality of word vectors in a keyword dictionary (In some examples, the words in the input dictionary 36 and the class labels in hierarchy structure dictionary 38 are encoded with respective indices. During training of the hierarchical classification sequential model, embeddings are learned for the encoded words in the input dictionary 36 and the class labels in the hierarchy structure dictionary 38. The embeddings are dense vectors that project the words in the input dictionary 36 and the class labels in hierarchy structure dictionary 38 into a learned continuous vector space. In an example, an embedding layer is used to learn the word embeddings for all the words in the input dictionary 36 and the class labels in the hierarchy structure dictionary 38 at the same time the hierarchical classification system 30 is trained. The embedding layer can be initialized with random weights or it can be loaded with a pre-trained embedding model. The input dictionary 36 and the hierarchy structure dictionary 38 store respective mappings between the word representations of the input words and class labels and their corresponding word vector representations. See at least [0024]. Also: The hierarchical classification system 30 converts the sequence of words in the input text block 26 into a sequence of inputs 40 by replacing the input words (and optionally the input punctuation marks and/or symbols) with their respective word embeddings based on the mappings stored in the input dictionary 36. In some examples, the hierarchical classification system 30 also brackets the input word embedding sequence between one or both of the start-of-sequence symbol and the end-of-sequence symbol. See at least [0025]). 
	Yang provides a system which maps text representations to visual representations, which differs from the claimed invention by the substitution of Yang’s directly generated text representations for a combination of Yang’s directly generated text representations and dictionary generated text representations. However, Cheng demonstrates that the prior art already knew of dictionary generated text representations. One of ordinary skill in the art could have easily substituted Yang and Cheng’s text . 

Response to Arguments
Applicant’s Argument Regarding 112(b) Rejections of claims 1-12, 14, and 15: 
Applicant submits that claims 1, 4, and 14 have been amended to address the rejections.
Applicant disagrees that “a first plurality of word features” as recited in claims 6 and 15 is not defined. Applicant asserts that the term “word features” is defined in the specification at least in paragraphs [0041], [0043], and [0043]. Thus applicant submits that claims 6 and 15 are definite. 
Examiner’s Response: Applicant's arguments and amendments filed 3 January 2022 have been fully considered.
Regarding the rejections based on a lack of antecedent basis, Applicant’s amendments resolve the identified issues. The 112(b) rejections of claims 1, 2, 4 and 14 are withdrawn. 
Regarding the rejection based on “word feature”, Applicant’s argument is unpersuasive. There is no definition of “word feature” in the identified paragraphs. A statement that an element “may be” or “may include” something is not a clear lexicographic definition. Further, even in view of these disclosures, one of ordinary skill in the art would not be able to determine the boundaries of the term. As such, the 112(b) rejections of claims 6 and 15 are maintained.

Applicant’s Argument Regarding 102 Rejections of claims 1-2, 13, and 20: Yang does not disclose that the output page segmentation indicates a probability of a text field in the page. For at least the reasons stated above, Yang fails to disclose all the features of claims 1, 13, and 20.  
Examiner’s Response: Applicant's arguments filed 3 January 2022 have been fully considered but they are not persuasive. As explained above, the broadest reasonable interpretation of an indication of a 

Additional Considerations
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure can be found in the Office Action dated 2 September 2021.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bion A Shelden whose telephone number is (571)270-0515. The examiner can normally be reached M-F, 12pm-10pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hajime S Rojas, can be reached at (571)270-5491. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/Bion A Shelden/
Examiner, Art Unit 3681
2022-01-13