DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/23/21 has been entered. Claims 1-20 are pending.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. In this application, 35 U.S.C. 112(f) is NOT invoked. Accordingly:
MPEP 2111.01 III. “PLAIN MEANING” REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

Accordingly, “taking” one meaning of a claim term from the prior art does not preclude other meanings from also being “taken”.

The claimed “anchor” (as in “determining anchors that type of form” in claim 1, lines 7, 8) is interpreted via a)-c), as shown below, in the context of how “anchor” is used:
a)	in light of applicant’s disclosure, as one of ordinary skill in the art would, such as in the following paragraph of applicant’s disclosure, emphasis added:
[0048]   Determined anchors for the form generated by anchor generation block 455 may also be received by vocabulary learning block 457. Vocabulary learning block 457 may use the ground truth, the generated anchors, and outputs from Parser block 453 to generate a language model. The language model output from vocabulary learning block 457 may be used by OCR block 453 to provide more accurate OCR by acting as a model-tuned form of OCR. The adjustment may include adding or increasing in the language model for fields which are present in the anchors, for use on the whole form. Also, the OCR may be run with a field-specific language model and run on a specific bounding box where the field is expected to be. For example, a particular language model may be trained for dates, another for addresses, and another for names, and so on. Regular expressions may be run in the language mode. In some examples, this may be specified via a Finite State Transducer Model and incorporated into the OCR language model. Regular expressions rules may be extrapolated from the forms in this manner.

Thus the claim term “anchor” is used “to generate a language model” wherein “language” is defined via Dictionary.com:

language
noun
1	a body of words and the systems for their use common to a people who are of the same community or nation, the same geographical area, or the same cultural tradition:
the two languages of Belgium; a Bantu language; the French language; the Yiddish language.
2	communication by voice in the distinctively human manner, using arbitrary sounds in conventional ways with conventional meanings; speech.
3	the system of linguistic signs or symbols considered in the abstract (opposed to speech).
4	any set or system of such symbols as used in a more or less uniform fashion by a number of people, who are thus enabled to communicate intelligibly with one another.
5	any system of formalized symbols, signs, sounds, gestures, or the like used or conceived as a means of communicating thought, emotion, etc.:
the language of mathematics; sign language.
6	the means of communication used by animals:
the language of birds.
7	communication of meaning in any way; medium that is expressive, significant, etc.:
the language of flowers; the language of art.
8	linguistics; the study of language.
9	the speech or phraseology peculiar to a class, profession, etc.; lexis; jargon.
10	a particular manner of verbal expression:
flowery language.
11	choice of words or style of writing; diction:
the language of poetry.
12	Computers. a set of characters and symbols and syntactic rules for their combination and use, by means of which a computer can be given directions:
The language of many commercial application programs is COBOL.
13	a nation or people considered in terms of their speech.
14	Archaic. faculty or power of speech.

Accordingly, per MPEP 2111.01 III, 3rd paragraph, cited above, any meaning taken from the “prior art must be consistent with the use of the claim term in the specification and drawings”, as shown next.



b)	the use of “anchor” as evidenced by the prior art:
1)	Chang et al. (US Patent App. No.: US 2020/0004873) is pertinent as teaching “an anchor that indicates a place within the document at which content of interest pertaining to a subtopic is located” and IS consistent with the use of applicant’s “anchor” via
“[0056] The link data 139 represents links to the content that may be accessed by a browser or a search engine according to the improved techniques described herein.  In some implementations, the links include a uniform resource locator (URL) address at which a document is stored (e.g., a web server, not necessarily the computer 120).  In some implementations, the links also include an anchor that indicates a place within the document at which content of interest pertaining to a subtopic is located.”;

2)	cited Kumar et al. (US Patent 10,489,682) is pertinent as teaching “A sequence of text words for training data is first formed by choosing consecutive words from the ordered list, and anchored at a random list-index.” via c.8,ll.38-52:
“Text corpus 432, in one embodiment, comprises several tens of thousands of fully-formed English sentences, composed of 2.4 million English words.  This is advantageously large enough to capture the statistics of commonly used words and/or word combinations.  In one embodiment, the 2.4 million English words is arranged in an ordered list so that word combinations and orderings are preserved.  A sequence of text words for training data is first formed by choosing consecutive words from the ordered list, and anchored at a random list-index.  The segment length chosen randomly as a number between 1 through 8.  Once the sequence of text words is determined it is then rendered into an image using a randomly chosen font and font size.  This allows creation of (image, text) pairs randomly for training data.”
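
For illustration only, the sampling step described in the Kumar passage quoted above (consecutive words chosen from an ordered list, anchored at a random list-index, with a segment length between 1 and 8) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Kumar's implementation: the function name `sample_word_sequence`, its parameters, and the toy word list are hypothetical, and the rendering of the text into an image is omitted.

```python
import random

def sample_word_sequence(words, rng, max_len=8):
    """Return (start, segment): consecutive words anchored at a random list index.

    `words` stands in for the ordered word list of the text corpus (hypothetical
    input); `rng` is a random.Random instance so that runs are reproducible.
    """
    start = rng.randrange(len(words))   # random list-index at which the segment is anchored
    length = rng.randint(1, max_len)    # segment length chosen randomly from 1 through 8
    # The slice may be shorter than `length` if the anchor is near the end of the list.
    return start, words[start:start + length]

if __name__ == "__main__":
    corpus = "the total amount due is listed on the last page of the invoice".split()
    print(sample_word_sequence(corpus, random.Random(0)))
```

In this sketch the "anchor" is simply the starting index of the segment within the ordered word list.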

Thus, “A sequence of text words for training data is first formed by choosing consecutive words from the ordered list, and anchored at a random list-index.” is used to “build” fig. 1:100:OCR via c.1,ll. 33-44:
“Optical character recognition systems capable of providing the level of accuracy required for business processes, such as with human accuracy or better are disclosed herein.  The disclosed systems employ a novel methodology that exploits a collection of weak & low-accuracy OCR systems.  This is employed to build a strong & high-accuracy deep-learning system for the OCR problem. The disclosed systems reduce the need for expensive ground-truth creation via human labeling, which is almost always needed for deep-learning models.  Certain embodiments employ a number of accuracy improvement techniques that can be adapted to deep learning solutions for other problems.”

Thus, Kumar’s use of “A sequence of text words for training data is first formed by choosing consecutive words from the ordered list, and anchored at a random list-index.”, which is used to “build” OCR of the English language in a lexicon domain, such as invoices, IS consistent with applicant’s use of “anchor” to generate the language model.



	3)	Vajda et al. (US Patent App. Pub. No.: US 2019/0172223 A1) is pertinent as teaching “Each image sample may have a corresponding ground truth or label, which may include bounding boxes (e.g., represented by anchors) or any other suitable indicators for RoIs that contain foreground/background objects in the image sample” and is considered NOT consistent with applicant’s use of “anchor” via:
“[0100] At stage 420, a temporary trunk (referenced as Trunk.sub.temp in FIG. 4) and a temporary RPN (referenced as RPN.sub.temp in FIG. 4) may be trained together to generate a temporary functional model for generating RoI candidates, in accordance with particular embodiments.  Once trained, Trunk.sub.temp and RPN.sub.temp in particular embodiments are used to assist with the subsequent training process and are not themselves included in the machine-learning model 200.  In particular embodiments, the temporary Trunk.sub.temp may be initialized to have the same parameters as those of Trunk.sub.1 from stage 410.  Rather than initializing Trunk.sub.1 in stage 410 and using the result to initialize Trunk.sub.temp, one skilled in the art would recognize that the order may be switched (i.e., Trunk.sub.temp may be initialized in stage 410 and the initialized Trunk.sub.temp may be used to initialize Trunk.sub.1).  The training dataset at stage 420 may include image samples.  Each image sample may have a corresponding ground truth or label, which may include bounding boxes (e.g., represented by anchors) or any other suitable indicators for RoIs that contain foreground/background objects in the image sample.  In particular embodiments, the RPN may be trained in the same manner as in Faster R-CNN.  For example, the RPN may be trained to generate k anchors (e.g., associated with boxes of predetermined aspect ratios and sizes) for each sampling region and predict a likelihood of each anchor being background or foreground.  Once trained, Trunk.sub.temp and RPN.sub.temp would be configured to process a given image and generate candidate RoIs.” 

4)	Skans et al. (US Patent App. Pub. No.: US 2018/0165546 A1) is pertinent as teaching “a first input image (also known as an anchor)” in the context of “training” and is considered NOT consistent with applicant’s use of “anchor” via:
[0017] By the term "triplet-based cost function", should, in the context of present specification, be understood a function for minimizing, or reducing, a distance between a first input image (also known as an anchor) comprising an object being of a first classification or identification and a second input image (also known as a positive) comprising an object being of the same classification or identification.  The triplet-based cost function should further accomplish that a distance between the first input image and a third image (also known as a negative) comprising an object being of another classification or identification is at least alpha larger than the distance between the anchor-positive pair of input images.  This means that the alpha value is used to create a difference in separation between anchor-positive and anchor-negative pairs such that, for a specific triplet of images, the distance between the anchor-negative pair is at least alpha larger than the distance between the anchor-positive pair.  It should be noted that alpha is always a positive number.  In case, the difference between the distance between the anchor-positive pair and the distance between the anchor-negative pair of a triplet is smaller than alpha, the cost function will change the weights of the neural network to increase the difference towards alpha.  It should also be noted that reaching the alpha distance margin may be an iterative process.  The triplet based cost function will change the weights such that the difference is increased towards alpha, but the alpha distance margin may not be reached in one iteration.  It is an iterative process to meet all alpha conditions for all images in the training database and alpha distance margin is not achieved for a particular triplet, the gradients which is calculated based on the cost function to make the weights to change such that the particular triplet will come a little closer to meeting alpha margin.  
However, if the difference already is larger than alpha, the cost function will not affect the weights of the neural network for that specific triplet.  Accordingly, separation of image data being of different classifications or identifications in the neural network hyperspace are achieved.  Details of this alpha value are disclosed in published articles, for example in the article "FaceNet: A Unified Embedding for Face Recognition and Clustering" by Schroff et al. (Google Inc.).

5)	Wang et al. (US Patent App. Pub. No.: US 2017/0098138 A1) is pertinent as teaching an “anchor image…is a training image that is to be used as a basis for…learning” and IS considered consistent with applicant’s use of “anchor” via:

[0076] The three columns are constrained to be identical in both structure and parameter, i.e., there is only a single set of parameters to learn.  The loss function learned to form the model may be expressed as follows:

$$\sum_{k=0}^{n} \left[ \lVert f(x_A) - f(x_P) \rVert - \lVert f(x_A) - f(x_N) \rVert + \alpha \right]_+$$

where "x_A," "x_P" and "x_N" are the anchor image, positive image, and negative image respectively, and "α" is a parameter used to control a permissible margin of differences between positive image and negative image.  The anchor image 812, as the name implies, is a training image that is to be used as a basis for comparison as part of the machine learning with the positive and negative images 821, 814.  The positive image "x_P" 812 has a font type as x_A that matches a font type of the anchor image 812, but is rendered with different text or random perturbation as described in the previous section.  The negative image "x_N" has a different font type than that of the anchor image 810.
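
For illustration only, the margin-based comparison of an anchor image against a positive and a negative image, as described in the Wang passage above, can be sketched for a single triplet as follows. This is a minimal sketch, not Wang's implementation: the use of Euclidean distance, the hinge at zero, and the names `euclidean` and `triplet_loss` are assumptions, and the inputs are taken to be already-computed feature vectors f(x).

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(f_anchor, f_positive, f_negative, alpha):
    """Hinge-style triplet term for one (anchor, positive, negative) triplet.

    The term is zero once the anchor-negative distance exceeds the
    anchor-positive distance by at least the margin `alpha`.
    """
    d_pos = euclidean(f_anchor, f_positive)
    d_neg = euclidean(f_anchor, f_negative)
    return max(0.0, d_pos - d_neg + alpha)

if __name__ == "__main__":
    # Negative is far from the anchor: the margin is already satisfied.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 5.0], alpha=0.5))
    # Negative is closer than the positive: the term is positive.
    print(triplet_loss([0.0, 0.0], [0.0, 3.0], [0.0, 1.0], alpha=0.5))
```

In this usage the "anchor" is the reference sample against which the other two samples are compared during training, rather than a location in a document.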

6)	previously cited Sampson et al. (US Patent 8,880,540) is pertinent as teaching “An anchor in a scanned document is an object that enables identification of the position of a data field in the document with respect to something else in the document”, such as shown by the arrow 1902 in fig. 19, and IS considered consistent with applicant’s use of “anchor” via c. 2, ll. 31-46:
“An anchor in a scanned document is an object that enables identification of the position of a data field in the document with respect to something else in the document.  For example, an anchor for a "total amount due" data field in an invoice document may be the text "Total" that appears one inch to the right of the "total amount due" data field.  Anchor specification also identifies a search zone in which an anchor may be located in a document.  Reliably finding anchors is a key to accurate field recognition.  However, a newly scanned document is often not in the same position as the reference document, which may be because the newly scanned document was in a different position when scanned.  The newly scanned document can also be rotated relative to the reference document.  Photos taken of documents by mobile phones introduce the likelihood that images of a document may also have significantly different scaling than a reference document.”
; and

7)	Datta et al. (US 2008/0285860 A1) is pertinent as teaching “a pre-determined anchor database of images” and is considered NOT consistent with applicant’s use of “anchor” via:	
“[0050] Humans learn to rate the aesthetics of pictures from the experience gathered by seeing other pictures.  Our opinions are often governed by what we have seen in the past.  Because of our curiosity, when we see something unusual or rare we perceive it in a way different from what we get to see on a regular basis.  In order to capture this factor in human judgment of photography, we define a new measure of familiarity based on the integrated region matching (IRM) image distance [21].  The IRM distance computes image similarity by using color, texture and shape information from automatically segmented regions, and performing a robust region-based matching with other images.  Primarily meant for image retrieval applications, we use it here to quantify familiarity.  Given a pre-determined anchor database of images with a well-spread distribution of aesthetics scores, we retrieve the top K closest matches in it with the candidate image as query.  Denoting IRM distances of the top matches for each image in decreasing order of rank as [q(i) | 1 ≤ i ≤ K].  We compute f8 and f9 as $f_8 = \frac{1}{20} \sum_{i=1}^{20} q(i)$, $f_9 = \frac{1}{100} \sum_{i=1}^{100} q(i)$.”; and
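
For illustration only, the two familiarity measures f8 and f9 described in the Datta passage above (averages of the IRM distances of the top 20 and top 100 matches) can be sketched as follows. This is a minimal sketch and not Datta's implementation: the function name `familiarity_features` is hypothetical, and the IRM distances are assumed to be already computed and ordered by rank, since the IRM distance itself is not reproduced here.

```python
def familiarity_features(distances, k_small=20, k_large=100):
    """Return (f8, f9): mean distance over the top-20 and top-100 matches.

    `distances` is a hypothetical list of precomputed distances q(1), q(2), ...
    for the closest matches in the anchor database, ordered by rank.
    """
    if len(distances) < k_large:
        raise ValueError("need at least k_large ranked distances")
    f8 = sum(distances[:k_small]) / k_small
    f9 = sum(distances[:k_large]) / k_large
    return f8, f9

if __name__ == "__main__":
    ranked = [float(i) for i in range(1, 101)]  # toy distances 1.0 .. 100.0
    print(familiarity_features(ranked))
```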

	8)	Tillberg et al. (US 2007/0168382 A1) is pertinent as teaching “ ‘OCR anchors’ ” and thus IS consistent with applicant’s disclosure via Tillberg:

	“[0070] "OCR anchors" means regions or fields of a scan that are examined with OCR technology and then compared with the same regions or fields of a template to validate fingerprinting results.”

9)	Brown et al. (US Patent 6,665,666) is pertinent as teaching “Word-List 740 anchored” for “developing…question-templates” and IS considered consistent with applicant’s use of “anchor” via:
c.5,ll. 11-25:
“The query analysis is enhanced by developing a set of question-templates that are matched against the user's query, with substitution of certain query terms with special query-tokens that correspond to the phrase labels mentioned above.  So for example, the pattern "where .  . . " causes the word "where" to be replaced with PLACE$.  The pattern "how much does .  . . cost" causes those terms to be replaced with MONEY$.  The pattern "how old .  . . " causes a replacement with AGE$.  The base set of such labels is: PLACE$, PERSON$, ROLE$, NAME$, ORGANIZATION$, DURATION$, AGE$, DATE$, TIME$, VOLUME$, AREA$, LENGTH$, WEIGHT$, NUMBER$, METHOD$, MOST$, RATE$ and MONEY$.  More specific versions of these, such as STATE$, COUNTRY$, CITY$, YEAR$ can be used as long as the phrase analyser (discussed below) can recognize such quantities.”

c.15,ll. 3-28:
“The operation of the Augmentation process 730 is depicted in FIG. 7c.  In step 780, WLE-pointer 781 is set to point to the first WLE of Word-List 740.  Step 782 iterates through all records 762 in QA-Token file 760.  Suppose a particular record 762a is selected, consisting of QA-Token 765a and pointer 767a to QA-file 770a.  Step 784 iterates through every pattern 775 in QA-file 770a in turn.  Suppose a particular pattern 775b is selected.  In step 786 the pattern-matcher 730 attempts to match pattern 775b with the Word-List 740 anchored at the point marked by the WLE-pointer 781.  If a match occurs, step 788 is executed and an augmentation 755b is added to the Word-List at point 781, labelled with the current QA-Token 7765a.  Step 790 is then executed in which it is tested to see if the Word-List Pointer 781 is at the end of the Word-List 740.  If it is, then exit point 798 is reached, otherwise Word-List Pointer 781 is advanced (step 792) and the execution returns to step 782.  If in step 786 no match occurs, step 784 continues to iterate through all patterns 775.  If step 784 completes with no match, step 782 continues the iteration though all QA-token files 760.  When the iteration in step 782 is finished, step 794 is executed to see if the Word-List pointer 781 is at the end of the Word-List 740.  If it is, then exit point 798 is reached, otherwise Word-List Pointer 781 is advanced (step 796) and the execution restarts step 782.  The output of this process is an Augmented Word-List 757.”
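
For illustration only, the augmentation loop described in the Brown passage above (each pattern is matched against the word list "anchored" at the current pointer, and the first match labels that position with the current QA-Token) can be sketched as follows. This is a minimal sketch and not Brown's implementation: the names `matches_at` and `augment`, the dictionary layout of the patterns, and the treatment of a pattern as a fixed list of words (without the ". . ." gaps shown in the quoted templates) are all assumptions.

```python
def matches_at(word_list, pattern, pos):
    """True if `pattern` (a list of words) matches word_list starting exactly at `pos`."""
    return word_list[pos:pos + len(pattern)] == pattern

def augment(word_list, qa_patterns):
    """Return {position: QA-token} for the first pattern matching at each position.

    `qa_patterns` is a hypothetical mapping from a QA-token label (e.g. "AGE$")
    to a list of patterns, each pattern being a list of words.
    """
    augmentations = {}
    for pos in range(len(word_list)):            # advance the word-list pointer
        for token, patterns in qa_patterns.items():
            if any(matches_at(word_list, p, pos) for p in patterns):
                augmentations[pos] = token       # label this position, then move on
                break
    return augmentations

if __name__ == "__main__":
    qa = {"AGE$": [["how", "old"]], "PLACE$": [["where"]]}
    print(augment("how old is the bridge".split(), qa))
```

In this sketch "anchored" means only that the match must begin at the pointer position; a pattern occurring elsewhere in the word list does not match at that position.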
	
Thus, if the prior art uses the word “anchor” or equivalent thereof with “language” as defined above, then that art’s use of “anchor” or equivalent thereof is consistent with applicant’s use of “anchor”; and
c)	definition thereof via Dictionary.com, wherein definitions 1-11 (U.S.) and 1-10 (U.K.) are equally applicable:

anchor, noun
1	any of various devices dropped by a chain, cable, or rope to the bottom of a body of water for preventing or restricting the motion of a vessel or other floating object, typically having broad, hooklike arms that bury themselves in the bottom to provide a firm hold.
2	any similar device for holding fast or checking motion: an anchor of stones.
3	any device for securing a suspension or cantilever bridge at either end.
4	any of various devices, as a metal tie, for binding one part of a structure to another.
5	a person or thing that can be relied on for support, stability, or security; mainstay: Hope was his only anchor.
6	Also anchorman. Radio and Television. a person who is the main broadcaster on a program of news, sports, etc., and who usually also serves as coordinator of all participating broadcasters during the program; anchorman or anchorwoman; anchorperson.
7	Television. a program that attracts many viewers who are likely to stay tuned to the network for the programs that follow.
8	Also called anchor store . a well-known store, especially a department store, that attracts customers to the shopping center in which it is located.
9	Slang. automotive brakes.
10	Military. a key position in defense lines.
11	Also anchorman. Sports.
a	the person on a team, especially a relay team, who competes last.
b	the person farthest to the rear on a tug-of-war team.

BRITISH DICTIONARY DEFINITIONS FOR ANCHOR
anchor, noun
1	any of several devices, usually of steel, attached to a vessel by a cable and dropped overboard so as to grip the bottom and restrict the vessel's movement
2	an object used to hold something else firmly in place: the rock provided an anchor for the rope
3	a source of stability or security: religion was his anchor
4	a	a metal cramp, bolt, or similar fitting, esp one used to make a connection to masonry
b	(as modifier): anchor bolt; anchor plate
5	a	the rear person in a tug-of-war team
b	short for anchorman, anchorwoman
6	at anchor (of a vessel) anchored
7	cast anchor, come to anchor or drop anchor to anchor a vessel
8	drag anchor See drag (def. 13)
9	ride at anchor to be anchored
10	weigh anchor to raise a vessel's anchor or (of a vessel) to have its anchor raised in preparation for departure; and
The claimed “each” (as in “determining anchors that type of form” in claim 1, lines 7, 8 and in other locations of claim 1) is interpreted in light of applicant’s disclosure as one of ordinary skill in the art would and definition thereof via Dictionary.com:
each
adjective
1	every one of two or more considered individually or one by one:
each stone in a building; a hallway with a door at each end.

The claimed “includes” (as in “wherein the ground truth …includes, for each form, at least one …key-value….pair” in claim 1, lines 17-19) is interpreted in light of applicant’s disclosure and the definition thereof, wherein definitions 1 and 3 (“to contain”) are “taken” (as discussed above in Claim Interpretation) as being consistent with applicant’s specification and drawings:
include
verb (used with object), in·clud·ed, in·clud·ing.
1	to contain, as a whole does parts or any part or element:
The package includes the computer, program, disks, and a manual.
2	to place in an aggregate, class, category, or the like.
3	to contain as a subordinate element; involve as a factor.


wherein “to” of “to contain” is defined as follows, with definitions 8 and 9 being “taken”:
to
preposition
1	(used for expressing motion or direction toward a point, person, place, or thing approached and reached, as opposed to from):They came to the house.
2	(used for expressing direction or motion or direction toward something) in the direction of; toward: from north to south.
3	(used for expressing limit of movement or extension):He grew to six feet.
4	(used for expressing contact or contiguity) on; against; beside; upon: a right uppercut to the jaw; Apply varnish to the surface.
5	(used for expressing a point of limit in time) before; until: to this day; It is ten minutes to six. We work from nine to five.
6	(used for expressing aim, purpose, or intention):going to the rescue.
7	(used for expressing destination or appointed end):sentenced to jail.
8	(used for expressing agency, result, or consequence):to my dismay; The flowers opened to the sun.
9	(used for expressing a resulting state or condition):He tore it to pieces.
10	(used for expressing the object of inclination or desire):They drank to her health.
11	(used for expressing the object of a right or claim):claimants to an estate.
12	(used for expressing limit in degree, condition, or amount):wet to the skin; goods amounting to $1000; Tomorrow's high will be 75 to 80°.
13	(used for expressing addition or accompaniment) with: He added insult to injury. They danced to the music. Where is the top to this box?
14	(used for expressing attachment or adherence):She held to her opinion.
15	(used for expressing comparison or opposition):inferior to last year's crop; The score is eight to seven.
16	(used for expressing agreement or accordance) according to; by:a position to one's liking; to the best of my knowledge.
17	(used for expressing reference, reaction, or relation):What will he say to this?
18	(used for expressing a relative position):parallel to the roof.
19	(used for expressing a proportion of number or quantity) in; making up:12 to the dozen; 20 miles to the gallon.
20	(used for indicating the indirect object of a verb, for connecting a verb with its complement, or for indicating or limiting the application of an adjective, noun, or pronoun):Give it to me. I refer to your work.
21	(used as the ordinary sign or accompaniment of the infinitive, as in expressing motion, direction, or purpose, in ordinary uses with a substantive object.)
22	Mathematics. raised to the power indicated: Three to the fourth is 81 (3^4 = 81).

wherein “contain” is defined as follows, with definition 3 being “taken”:
contain
verb (used with object)
1	to hold or include within its volume or area:
This glass contains water. This paddock contains our best horses.
2	to be capable of holding; have capacity for:
The room will contain 75 persons safely.
3	to have as contents or constituent parts; comprise; include.
4	to keep under proper control; restrain: He could not contain his amusement.
5	to prevent or limit the expansion, influence, success, or advance of (a hostile nation, competitor, opposing force, natural disaster, etc.):to contain an epidemic.
6	to succeed in preventing the spread of: efforts to contain water pollution.
7	Mathematics. (of a number) to be a multiple of; be divisible by, without a remainder: Ten contains five.
8	to be equal to: A quart contains two pints.

wherein “comprise” is defined as follows, with any of definitions 1-3 able to be “taken”:
comprise
verb (used with object), com·prised, com·pris·ing.
1	to include or contain:
The Soviet Union comprised several socialist republics.
2	to consist of; be composed of:
The advisory board comprises six members.
3	to form or constitute:
Seminars and lectures comprised the day's activities.


The claimed “key-value…pair” (as in “wherein the ground truth …includes, for each form, at least one …key-value….pair” in claim 1, lines 17-19) is interpreted in light of applicant’s disclosure and of the prior art, as one of ordinary skill in the art of data-structures or identifier-value pairs would, such as:
said Tillberg et al. (US 2007/0168382 A1) via:
“[0024] U.S.  Pat.  No. 5,293,429 (Pizano et al., "System and method for automatically classifying heterogeneous business forms", Mar.  8, 1994) teaches a system that classifies images of forms based on a predefined set of templates.  The system utilizes pattern recognition techniques for identifying vertical and horizontal line patterns on scanned forms.  The identified line segments may be clustered to identify full length lines.  The length of the lines in a specific template form may be employed to provide a key value pair for the form in the dictionary.  Form identification for the scan using the template dictionary is performed using either a window matching means or a means for comparing the line length and the distance between lines through a condensation of the projection information.  In addition, intersections between lines may be identified.  A methodology is also taught for the creation of forms with horizontal and vertical lines for testing the system.  However, the patent does not teach utilizing other sources of information residing within the forms, such as textual information.  In addition, the patent teaches no means for handling scans that do not have an appropriate template within the dictionary.  Furthermore, the teaching is limited to a form dictionary that has widely differing form templates; templates that have similar structures, such as form variants, will not be discriminated.”; and

Hall, JR. et al. (US Patent App. Pub. No.: US 2003/0152277 A1):

“[0012] In the present invention, an interactive framework is presented for efficiently ground-truthing document images via image objects paired with fields for ground-truthed metadata (called herein "image object pairs").  Here, ground-truthing an image object pair is accomplished by ground-truthing its metadata.  More specifically, in one embodiment of the invention, in order to "ground-truth" an image object pair, the following two computer assisted steps are available:” 

as shown in fig. 7A:702: “Image Object”: “Clinics” and fig. 7A:710: “Metadata”: “Clinics”.




The claimed “pair” (as in “wherein the ground truth …includes, for each form, at least one …key-value….pair” in claim 1, lines 17-19) is interpreted in light of applicant’s disclosure (US 2020/0151443 A1):
“[0015] Briefly stated, the disclosed technology is generally directed to optical character recognition for forms.  In one example of the technology, optical character recognition is performed on a plurality of forms.  In some examples, the forms of the plurality of forms include at least one type of form.  In some examples, anchors are determined for the forms, including corresponding anchors for each type of form of the plurality of forms.  In some examples, feature rules are determined, including corresponding feature rules for each type of form of the plurality of forms.  In some examples, features and labels are determined for each form of the plurality of forms.  In some examples, a training model is generated based on a ground truth that includes a plurality of key-value pairs corresponding to the plurality of forms, and further based on the determined features and labels for the plurality of forms.”

“[0048] Anchor generation block 455 may determine the anchors as follows in some examples.  First, all values present in Ground Truth 441 are removed from the forms.  Next, lines that occur more than once per page are removed.  Next, a histogram of the remaining lines is completed.  The lines are then scored based on frequency, with extra points given if a line is included in a set of "known good anchors," such as "date," "address," "DOB," "order number," "Customer," and/or the like.  Next, based on this score, the top N anchors from all of the forms, are determined, where N is a number that is determined based on the histogram.”
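
For illustration only, the steps quoted above from [0048] can be sketched as follows. This is a minimal sketch under stated assumptions and is not applicant's implementation: the function names, the representation of a form as a list of pages of text lines, the exact-match removal of ground-truth values, the bonus of 2 points, and the caller-supplied top_n (the disclosure derives N from the histogram without saying how) are all hypothetical.

```python
from collections import Counter

# Example "known good anchors" named in [0048]; lower-cased for matching.
KNOWN_GOOD_ANCHORS = {"date", "address", "dob", "order number", "customer"}


def candidate_lines(pages, ground_truth_values):
    """Return lines of one form that survive the two removal steps.

    pages: list of pages, each page a list of text lines (assumed layout).
    ground_truth_values: values to strip out (assumed to match whole lines).
    """
    removed = {v.strip().lower() for v in ground_truth_values}
    kept = []
    for page in pages:
        lines = [line.strip().lower() for line in page]
        per_page = Counter(lines)
        for line in lines:
            if not line or line in removed:
                continue  # step 1: remove ground-truth values
            if per_page[line] > 1:
                continue  # step 2: remove lines occurring more than once per page
            kept.append(line)
    return kept


def top_anchors(forms, ground_truth_values, top_n, bonus=2):
    """Histogram the remaining lines, score them, and return the top N."""
    histogram = Counter()
    for pages in forms:
        histogram.update(candidate_lines(pages, ground_truth_values))
    # Score = frequency, plus extra points for a known good anchor.
    scores = {
        line: count + (bonus if line in KNOWN_GOOD_ANCHORS else 0)
        for line, count in histogram.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda line: (-scores[line], line))
    return ranked[:top_n]
```

In this sketch a line such as "Date" that recurs across forms of the same type scores highly, while filled-in values and lines repeated within a page are discarded before scoring.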

“[0067] After block 567 generates the preliminary key-value pairs, one or more bounding boxes may be re-OCRed by OCR block 563, and then run through blocks 564-567 again for increased accuracy.  Next, in some examples, post-processing block 568 performs post processing to generate the key-value pairs.  For instance, a particular key may have a possible value of "yes" or "no" which is indicated on the form by a checkbox which is left either checked or unchecked.  In this case, the words "yes" or "no" as values for the key are not present as text in the form.  However, during post processing, for example, the x in a particular location may be used to determine during post-processing by post-processing block 568 that the value of a corresponding key is "yes." The key-value pairs output by post-processing block 568, along with the OCRed form, may serve as the results of service pipeline 561.  In testing pipeline examples as discussed above, the key-values may be received by evaluation block 569 for an accuracy determination.”

and definition thereof via Dictionary.com wherein definitions 1 and 2 are taken, as one of ordinary skill would, for being consistent with applicant’s disclosure:

pair
noun, plural pairs, pair.
1	two identical, similar, or corresponding things that are matched for use together:
a pair of gloves; a pair of earrings.
2	something consisting of or regarded as having two parts or pieces joined together:
a pair of scissors; a pair of slacks.
3	two individuals who are similar or in some way associated:
a pair of liars; a pair of seal pups.
4	a married, engaged, or dating couple.
5	two mated animals.
6	a span or team:
a pair of horses.
7	Government.
a	two members on opposite sides in a deliberative body who for convenience, as to permit absence, arrange together to forgo voting on a given occasion.
b	the arrangement thus made.
8	Cards.
a	two playing cards of the same denomination without regard to suit or color.
b	pairs, two card players who are matched together against different contestants.
9	pairs, pair skating.
10	Also called kinematic pair .Mechanics. two parts or pieces so connected that they mutually constrain relative motion.
11	Philately. two postage stamps joined together either vertically or horizontally.
12	a set or combination of more than two objects forming a collective whole:
a pair of beads.

Note that the broader definition 12 of “pair” can be taken from the prior art dictionary, as one of ordinary skill in the art would, because definition 12 is consistent with applicant’s disclosure and drawings. Accordingly, the disclosure at fig. 4:455: “ANCHOR GENERATION” corresponds to definition 12 of “pair” via the “set of ‘known good anchors,’ ” (cited above [0048]); however, “the set of ‘known good anchors,’ ” is not claimed. Thus, definition 12 is not taken in this case from the dictionary, as one of ordinary skill in the art would, under the broadest reasonable interpretation of claim 1 because “the set of ‘known good anchors,’ ” is not claimed.
	Similarly in another case, IDS cited Becker (US 2018/0033147) teaches applicant’s disclosed “pair” via said definition 12 (“a set…of more than two”) mapped to a “subset of textual characters” as shown in fig. 4:310: “63,780.45”, a pair of characters, via:
“[0025] In some embodiments, an image segment may be classified as a field that contains a specific type of information.  This classification can be used to identify a subset of textual characters that may be depicted in the image segment.  For example, if an image segment that has been classified as a field for a social security number (e.g., "box a" of W-2 form), the subset of textual characters may include digits and dashes and exclude letters.  In some embodiments, once an image segment has been classified, it may be desirable to perform an OCR process to extract text depicted in the image segment.  The OCR process can be modified or constrained to presume that text in the image segment contains only characters in the subset of textual characters.  This may enable the OCR process to disambiguate extracted text more easily.  For example, if a region in an image segment can be interpreted as either "IB" or "18," and if the image segment has been classified as a field for a social security number, the OCR process can elect "18" as the extracted text for the region because 1 and 8 are included in the subset of textual characters for social-security-number fields (while "I" and "B" are not).”
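
For illustration only, the disambiguation described in Becker [0025] above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Becker's implementation: the names SSN_CHARSET and pick_candidate, and the representation of competing OCR readings as a list of strings, are hypothetical.

```python
# Subset of textual characters for a social-security-number field:
# digits and dashes, no letters (per the example in [0025]).
SSN_CHARSET = set("0123456789-")


def pick_candidate(candidates, allowed_chars):
    """Return the first candidate reading made up only of allowed characters.

    candidates: competing OCR readings of one region (assumed representation).
    Returns None when no candidate fits the field's character subset.
    """
    for text in candidates:
        if all(ch in allowed_chars for ch in text):
            return text
    return None
```

With the readings "IB" and "18" from the quoted example, the sketch elects "18" because "I" and "B" fall outside the subset for the field.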

	However, Becker’s pair of characters used for “OCR” is not consistent with applicant’s use (i.e., usefulness or serving some purpose or intended result corresponding to applicant’s “a training model is generated based on…a plurality of key-value pairs”, cited above: [0015]) of the claimed “key value…pair” in applicant’s disclosure and drawings. Thus, Becker’s “pair” of characters, as shown in fig. 4:310: “63,780.45”, is not the claimed “pair” due to not being consistent with applicant’s disclosure’s serving of an intended “result” (i.e., use or usefulness or utility) of generating a training model as shown in applicant’s fig. 4:444:458: “TRAINING” “MODEL”. Rather, Becker’s identified “pair” of characters of “63,780.45” of fig. 6: “OCR” occurs after training of fig. 5:108:502:504:506: “Training” “Model”.
	
In contrast to the above cases as shown in this final case, Becker also teaches another “pair” via said definition 12 (“a set…of more than two”) in fig. 5:108: “subsets…of training data” that IS consistent with the disclosure’s use of “pair” intentionally resulting in “trained” “machine-learning models”:
“[0045] Furthermore, individual machine learning models can be combined to form an ensemble machine-learning model.  An ensemble machine-learning model may be homogenous (i.e., using multiple member models of the same type) or non-homogenous (i.e., using multiple member models of different types).  Individual machine-learning models within an ensemble may all be trained using the same training data or may be trained using overlapping or non-overlapping subsets randomly selected from a larger set of training data.”

	Thus, definition 12 of “pair” can be taken from the prior art dictionary, as one of ordinary skill in the art would, as the meaning consistent with the use of the claimed “pair” in applicant’s disclosure and drawings, because the use of Becker’s “subsets…of training data”, which is a pair of data via definition 12 (“a set…of more than two”), is consistent with the use or the serving of some intended “result” of “pair” in applicant’s disclosure and drawings (corresponding to “The key-value pairs…serve…results”, cited above) intentionally resulting in applicant’s disclosed and claimed “training model” by using the service or aid provided by the “pair”.

Claim Review - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are NOT rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
The term "substantially the same physical location" in claim 1, line 10 is a relative term, but it does NOT render the claim indefinite. The term "substantially the same physical location" is defined by the claim, the specification does provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would be reasonably apprised of the scope of the invention. Thus, when the claimed "substantially the same physical location" is read in its surrounding context, the disclosed or claimed “form” comprises a “guide” for the claimed "substantially the same physical location", such that the claimed "substantially the same physical location" does not stray from the “guide” to one of ordinary skill in the art, wherein “form” is defined:
form
noun
17	a typical document to be used as a guide in framing others for like cases:
a form for a deed.


	Further the disclosure (US 2020/0151443 A1) provides a “standard… for…comparison” to one of ordinary skill in the art of forms for the claimed “substantially the same physical location” via:
“[0015] Briefly stated, the disclosed technology is generally directed to optical character recognition for forms.  In one example of the technology, optical character recognition is performed on a plurality of forms.  In some examples, the forms of the plurality of forms include at least one type of form.  In some examples, anchors are determined for the forms, including corresponding anchors for each type of form of the plurality of forms.  In some examples, feature rules are determined, including corresponding feature rules for each type of form of the plurality of forms.  In some examples, features and labels are determined for each form of the plurality of forms.  In some examples, a training model is generated based on a ground truth that includes a plurality of key-value pairs corresponding to the plurality of forms, and further based on the determined features and labels for the plurality of forms.”

wherein “type” is defined via Dictionary.com wherein meaning 4: “model” is taken:
type, noun
1	a number of things or persons sharing a particular characteristic, or set of characteristics, that causes them to be regarded as a group, more or less precisely defined or designated; class; category:
a criminal of the most vicious type.
2	a thing or person regarded as a member of a class or category; kind; sort (usually followed by of):
This is some type of mushroom.
3	Informal. a person, regarded as reflecting or typifying a certain line of work, environment, etc.:
a couple of civil service types.
4	a thing or person that represents perfectly or in the best way a class or category; model:
the very type of a headmaster.

wherein “model” is defined:
model, noun
1	a standard or example for imitation or comparison.

Thus, claims 1-20 are NOT rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Response to Arguments

Applicant’s arguments, see remarks, pages 9,10 emphasis added:
“First, it is respectfully submitted that the rejection to claim 1 under 35 U.S.C. § 102 should be withdrawn at least because Kumar fails to disclose, "for each type of form, determining anchors for that type of form, such that, for each type of form, the anchors for that type of form are visible form elements present at substantially the same physical location of each form of that type of form," as recited in claim 1 as amended. 
The Office argued that Kumar discloses, "determining anchors for the forms, including corresponding anchors for each type of form of the plurality of forms," based on a text corpus in Kumar that comprises several tens of fully-formed English sentences, composed of 2.4 million English words, arranged in an ordered list so that word combinations and orderings are preserved. However, the text corpus of Kumar does not include anchors that are visible form elements present at substantially the same physical location in substantially each form of the corresponding type of form on which optical character recognition is performed. Rather, the text corpus of Kumar is a list of words with no relation to the physical location of words on forms on which optical character recognition is performed. 
Second, it is respectfully submitted that the rejection to claim 1 under 35 U.S.C. § 102 should be withdrawn at least because Kumar fails to disclose, "generating a training model based on ... the ground truth, wherein the ground truth includes, for each form, at least one key-value pair associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text from the associated form, and wherein the key includes a data item that is linked to the value in that key-value pair as an identifier of the value in that key-value pair," as recited in claim 1 as amended. 
 	The Office argues that Kumar discloses generating a training module that includes a plurality of key-value pairs, citing fig. 4A, fig. 4B, and col. 3, lines 35-48 of Kumar. More specifically, the Office argues that the image/text pairs discussed at col. 3, lines 35-48 of Kumar are key-value pairs. However, the image/text pairs discussed at col. 3, lines 35-48 of Kumar do not include a value that is a data item that includes text from the associated form, or a key that includes a data item that is linked to a value in a key-value pair as an identifier of a value in a key-value pair.”

, filed 3/23/21, with respect to the rejection(s) of claim(s) 1-8,11-13 and 16-18 under 35 USC 102(a)(2) in the Office action of 12/28/20, starting page 47, have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is NOT made in view of 35 USC 103 in view of Consul (Learning how to Extract Information from Scanned Documents) that teaches an anchor being a visible cat element on visible skate-board element in page 15, fig. 2-2.

In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “optical character recognition is performed”, per applicant’s remarks at pages 9-10, quoted above) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

In contrast, claim 1, line 5 claims “performing optical character recognition” and in addition applicant’s disclosure (US 2020/0151443 A1) states “not limited”:
“[0068] For clarity, the processes described herein are described in terms of operations performed in particular sequences by particular devices or components of a system.  However, it is noted that other processes are not limited to the stated sequences, devices, or components.  For example, certain acts may be performed in different sequences, in parallel, omitted, or may be supplemented by additional acts or features, whether or not such sequences, parallelisms, acts, or features are described herein.  Likewise, any of the technology described in this disclosure may be incorporated into the described processes or other processes, whether or not that technology is specifically described in conjunction with a process.  The disclosed processes may also be performed on or by other devices, components, or systems, whether or not such devices, components, or systems are described herein.  These processes may also be embodied in a variety of ways.  For example, they may be embodied on an article of manufacture, e.g., as processor-readable instructions stored in a processor-readable storage medium or be performed as a computer-implemented process.  As an alternate example, these processes may be encoded as processor-executable instructions and transmitted via a communications medium.”

Thus claim 1, line 5’s “performing optical character recognition” is “not limited” to “optical character recognition is performed”.

Applicant’s arguments at pages 9-10 of the remarks filed 3/23/21 (quoted above), with respect to the rejection(s) of claim(s) 1-8, 11-13 and 16-18 under 35 USC 102(a)(2), have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is NOT made in view of 35 USC 103 in view of Consul (Learning how to Extract Information from Scanned Documents), which teaches key-value pairs in data form at page 48, fig. 4-9, such as in the 3rd line: (height value= “24.6094”, key=“ ‘medical’ ”) as one pair and (width=“103.6934”, “ ‘medical’ ”) as another pair.


Applicants state on page 11, emphasis added:
“For example, the Claim Interpretation states, "the claim term 'anchor' is used 'to generate a language model."' The Present Office Action is not clear here, but the Present Office Action appears to be arguing that "to generate a language model" is the definition of the term anchor. It is not entirely clear whether that is what the Present Office Action is attempting to state or not. However, if the Present Office Action is stating that the term "anchor" as used in the claims is defined as something that is used to generate a language model, the undersigned disagrees.
The specification states that the anchor may be used to generate a language model, but this is not the definition of an anchor, and is not a statement of necessary and/or sufficient conditions for something to be an anchor. Accordingly, it is respectfully submitted that it is improper to use "to generate a language model" as a definition of the claim term "anchor.”
The Claim Interpretation section is relatively lengthy, and at least some aspects of the Claim Interpretation section may be moot in light of the amendments to the claims. Therefore, although the undersigned disagrees with multiple aspects of the Claim Interpretation section, each of the points with which the undersigned disagrees is not being specifically discussed herein.”

In response, the examiner is not attempting to define “anchor” in this context as mentioned in Claim Interpretation: 
MPEP 2111.01 III. “PLAIN MEANING” REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover , when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

Rather, the examiner is attempting to determine “the use of the claim term in the specification and drawings”, such as how “anchor” is used in the disclosure consistent with MPEP 2111, last paragraph, emphasis added:

“The broadest reasonable interpretation does not mean the broadest possible interpretation. Rather, the meaning given to a claim term must be consistent with the ordinary and customary meaning of the term (unless the term has been given a special definition in the specification), and must be consistent with the use of the claim term in the specification and drawings. Further, the broadest reasonable interpretation of the claims must be consistent with the interpretation that those skilled in the art would reach. In re Cortright, 165 F.3d 1353, 1359, 49 USPQ2d 1464, 1468 (Fed. Cir. 1999) (The Board’s construction of the claim limitation "restore hair growth" as requiring the hair to be returned to its original state was held to be an incorrect interpretation of the limitation. The court held that, consistent with applicant’s disclosure and the disclosure of three patents from analogous arts using the same phrase to require only some increase in hair growth, one of ordinary skill would construe "restore hair growth" to mean that the claimed method increases the amount of hair grown on the scalp, but does not necessarily produce a full head of hair.). Thus the focus of the inquiry regarding the meaning of a claim should be what would be reasonable from the perspective of one of ordinary skill in the art. In re Suitco Surface, Inc., 603 F.3d 1255, 1260, 94 USPQ2d 1640, 1644 (Fed. Cir. 2010); In re Buszard, 504 F.3d 1364, 84 USPQ2d 1749 (Fed. Cir. 2007). In Buszard, the claim was directed to a flame retardant composition comprising a flexible polyurethane foam reaction mixture. 504 F.3d at 1365, 84 USPQ2d at 1750. The Federal Circuit found that the Board’s interpretation that equated a "flexible" foam with a crushed "rigid" foam was not reasonable. Id. at 1367, 84 USPQ2d at 1751. Persuasive argument was presented that persons experienced in the field of polyurethane foams know that a flexible mixture is different than a rigid foam mixture. Id. at 1366, 84 USPQ2d at 1751.”

For example applicant’s disclosure states:

[0044] Anchor generation block 455 may receive the document with lines breaks added from value extraction block 454, and may determine anchors for the particular type of form according to the current form.  Anchors, in these examples, are fields that would appear in the empty form for the current form.  For example, a form may have been filled out from an empty form, where the empty form is the version of the form that exists before the form is filled out.  Even if the empty form itself it not accessible, it may be possible to determine or approximately determine the empty form based on, among other things, the intersection of several forms of the same type.  The fields present in the determined empty form are defined as anchors. 
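
For illustration only, the "intersection of several forms of the same type" mentioned in [0044] above can be sketched as follows. This is a minimal sketch under stated assumptions and is not applicant's implementation: the function name, the representation of each filled-out form as a list of text lines, and the use of exact line equality are hypothetical, and [0044] notes that the intersection is only one of the things the determination may be based on.

```python
def approximate_empty_form(filled_forms):
    """Approximate the empty form as the lines shared by every filled form.

    filled_forms: list of forms of one type, each a list of text lines
    (assumed representation). Lines are returned in first-form order.
    """
    if not filled_forms:
        return []
    common = set(filled_forms[0])
    for form in filled_forms[1:]:
        common &= set(form)  # keep only lines present in every form
    # dict.fromkeys drops duplicate lines while preserving order.
    return [line for line in dict.fromkeys(filled_forms[0]) if line in common]
```

In this sketch the surviving lines, such as field labels printed on every copy, play the role of the fields that [0044] defines as anchors, while filled-in values that differ between forms drop out.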

Thus in this context, “anchor” is used as an adjective, as in fig. 5:565: “ANCHOR FINDING”, and is also used as a noun. Thus, the meaning “given” by the examiner under the broadest reasonable interpretation of “anchor” will comprise an adjective-form and a noun-form of “anchor”. The verb-form of “anchor” under the broadest reasonable interpretation has not been established by the examiner, but this does not mean that the claimed “anchor” is excluded from being a verb.

Another “use” of anchor is the ultimate every-day practical application, such as under 35 USC 101, such as shown in applicant’s fig. 5. Thus, the meaning “given” by the examiner under the broadest reasonable interpretation of the claim term “anchor” must be consistent with the “use”, such as in the ultimate practical application under 35 USC 101, of the claim term in the specification and the drawings such as in applicant’s fig. 5:565: “ANCHOR FINDING”.
Thus, if “anchor” were “given” the meaning of anchoring a boat as the “use” (a meaning which is not “given” by the examiner), such a given meaning would not be consistent with the disclosure’s “use” of “anchor” as shown in applicant’s drawings, such as in applicant’s fig. 5, and thus would not fall under the broadest reasonable interpretation in light of applicant’s disclosure as one of skill in the art of applicant’s disclosure would reach upon reading applicant’s disclosure.
Likewise, if applicant’s fig. 5 did show a boat (“boat” is not found in applicant’s disclosure), then a meaning “taken” by the examiner from the prior art or “given” by the examiner ought to be consistent with the “use of the claim term in the specification and the drawings”.
Thus, a meaning “taken” by the examiner from the prior art or “given” by the examiner for a “claim term” (such as “anchor”) ought to be consistent with the “use of the claim term in the specification and the drawings”.



The examiner has provided examples 1)-9) in Claim Interpretation about how the term “anchor” as appearing in the prior art IS and is NOT consistent with applicant’s “use” of the claim term in applicant’s specification and drawings. Thus any one of the examples 1)-9) regarding “anchor” and meaning thereof with “NOT” is not “consistent with the use of the claim term in the specification and the drawings” even though “anchor” appears in the examples of the prior art. 
Thus the examiner will give meaning under the broadest reasonable interpretation to a claim term that “must be consistent with the ordinary and customary meaning of the term (unless the term has been given a special definition in the specification) and must be consistent with the use of the claim term in the specification and the drawings”.
	In the case of the claimed “anchor”, the claimed “anchor” is already “given” meaning via the above definition of “anchor” via Dictionary.com and thus those meanings “taken” (such as “anchor” ’s definition  “5	a person or thing that can be relied on for support, stability, or security; mainstay: Hope was his only anchor.”) by the examiner of “anchor” “must be consistent with the use of the claim term in the specification and the drawings”. Thus, the examiner is not defining “anchor”, rather the examiner is taking any one “meaning of a claim term from the prior art” resulting in a “taken” meaning that is already “given” by one of ordinary skill in the art via Dictionary.com. Thus, the examiner need not give a meaning of “anchor” such that the meaning of “anchor” is “given”, because the meaning of “anchor” is already “given” by one of ordinary skill in the art of anchors via Dictionary.com. 
	Further, that one meaning is “taken” does not mean that other meanings cannot be “taken”.

Regarding applicants’ statement on page 11 of the remarks (quoted above):

In response, the examiner is not defining “anchor” as discussed in the previous examiner response. Rather, the examiner is taking meanings of a term from the prior art that already has given meanings that are consistent with the use of the claim term, “anchor”, in applicant’s disclosure and drawings.









Applicants state on page 11, emphasis added:
“For example, the Claim Interpretation states, "the claim term 'anchor' is used 'to generate a language model."' The Present Office Action is not clear here, but the Present Office Action appears to be arguing that "to generate a language model" is the definition of the term anchor. It is not entirely clear whether that is what the Present Office Action is attempting to state or not. However, if the Present Office Action is stating that the term "anchor" as used in the claims is defined as something that is used to generate a language model, the undersigned disagrees.
The specification states that the anchor may be used to generate a language model, but this is not the definition of an anchor, and is not a statement of necessary and/or sufficient conditions for something to be an anchor. Accordingly, it is respectfully submitted that it is improper to use "to generate a language model" as a definition of the claim term "anchor.”
The Claim Interpretation section is relatively lengthy, and at least some aspects of the Claim Interpretation section may be moot in light of the amendments to the claims. Therefore, although the undersigned disagrees with multiple aspects of the Claim Interpretation section, each of the points with which the undersigned disagrees is not being specifically discussed herein.”

In response, the examiner agrees with applicants regarding “the anchor may be used to generate a language model, but this is not the definition of an anchor”. Rather, the definition of anchor has already been “given” as discussed above and thus is ready for the taking so long as the meaning is consistent with the use of the claim term, “anchor”, in applicant’s specification and drawings.








Applicants state on page 11, emphasis added:
“For example, the Claim Interpretation states, "the claim term 'anchor' is used 'to generate a language model."' The Present Office Action is not clear here, but the Present Office Action appears to be arguing that "to generate a language model" is the definition of the term anchor. It is not entirely clear whether that is what the Present Office Action is attempting to state or not. However, if the Present Office Action is stating that the term "anchor" as used in the claims is defined as something that is used to generate a language model, the undersigned disagrees.
The specification states that the anchor may be used to generate a language model, but this is not the definition of an anchor, and is not a statement of necessary and/or sufficient conditions for something to be an anchor. Accordingly, it is respectfully submitted that it is improper to use "to generate a language model" as a definition of the claim term "anchor.”
The Claim Interpretation section is relatively lengthy, and at least some aspects of the Claim Interpretation section may be moot in light of the amendments to the claims. Therefore, although the undersigned disagrees with multiple aspects of the Claim Interpretation section, each of the points with which the undersigned disagrees is not being specifically discussed herein.”

In response, the examiner agrees “that it is improper to use ‘to generate a language model’ as a definition of the claim term ‘anchor’.” Rather, "to generate a language model" is just one example of “the use of the claim term in the specification and drawings” in order to establish or determine or ascertain or recognize or identify or take or give a consistent meaning of the claim term “anchor” as used in applicant’s disclosure and drawings.







Applicants state on page 11, emphasis added:
“For example, the Claim Interpretation states, "the claim term 'anchor' is used 'to generate a language model."' The Present Office Action is not clear here, but the Present Office Action appears to be arguing that "to generate a language model" is the definition of the term anchor. It is not entirely clear whether that is what the Present Office Action is attempting to state or not. However, if the Present Office Action is stating that the term "anchor" as used in the claims is defined as something that is used to generate a language model, the undersigned disagrees.
The specification states that the anchor may be used to generate a language model, but this is not the definition of an anchor, and is not a statement of necessary and/or sufficient conditions for something to be an anchor. Accordingly, it is respectfully submitted that it is improper to use "to generate a language model" as a definition of the claim term "anchor.”
The Claim Interpretation section is relatively lengthy, and at least some aspects of the Claim Interpretation section may be moot in light of the amendments to the claims. Therefore, although the undersigned disagrees with multiple aspects of the Claim Interpretation section, each of the points with which the undersigned disagrees is not being specifically discussed herein.”

In response, the claimed “corresponding” is removed from Claim Interpretation because “corresponding” has a 
	Thus, in view of the above there is no remaining prior art rejection under 35 USC 102 or 35 USC 103 since all prior art rejections are withdrawn.














Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance:
The claims are allowed for the reasons as discussed above, reproduced below:
Applicant’s arguments, see remarks 3/23/21, pages 9-10, emphasis added:
“First, it is respectfully submitted that the rejection to claim 1 under 35 U.S.C. § 102 should be withdrawn at least because Kumar fails to disclose, "for each type of form, determining anchors for that type of form, such that, for each type of form, the anchors for that type of form are visible form elements present at substantially the same physical location of each form of that type of form," as recited in claim 1 as amended. 
The Office argued that Kumar discloses, "determining anchors for the forms, including corresponding anchors for each type of form of the plurality of forms," based on a text corpus in Kumar that comprises several tens of fully-formed English sentences, composed of 2.4 million English words, arranged in an ordered list so that word combinations and orderings are preserved. However, the text corpus of Kumar does not include anchors that are visible form elements present at substantially the same physical location in substantially each form of the corresponding type of form on which optical character recognition is performed. Rather, the text corpus of Kumar is a list of words with no relation to the physical location of words on forms on which optical character recognition is performed. 
Second, it is respectfully submitted that the rejection to claim 1 under 35 U.S.C. § 102 should be withdrawn at least because Kumar fails to disclose, "generating a training model based on ... the ground truth, wherein the ground truth includes, for each form, at least one key-value pair associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text from the associated form, and wherein the key includes a data item that is linked to the value in that key-value pair as an identifier of the value in that key-value pair," as recited in claim 1 as amended. 
 	The Office argues that Kumar discloses generating a training module that includes a plurality of key-value pairs, citing fig. 4A, fig. 4B, and col. 3, lines 35-48 of Kumar. More specifically, the Office argues that the image/text pairs discussed at col. 3, lines 35-48 of Kumar are key-value pairs. However, the image/text pairs discussed at col. 3, lines 35-48 of Kumar do not include a value that is a data item that includes text from the associated form, or a key that includes a data item that is linked to a value in a key-value pair as an identifier of a value in a key-value pair.”

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Claim 1 is reviewed under 35 U.S.C. 103 via Kumar et al. (US Patent 10,489,682) in view of Consul (Learning how to Extract Information from Scanned Documents) further in view of Ferrara et al. (Similarity Recognition in the Web of Data) further in view of HARRIS et al. (US 2018/0226066 A1).
Regarding claim 1, Kumar teaches an apparatus, comprising: 
a device (fig. 7:700) including at least one memory (fig. 7:706) adapted to store run-time data for the device (700), and at least one processor (fig. 7:702) that is adapted to execute processor-executable code that, in response to execution, enables the device (700) to perform actions, including:
performing optical character recognition (via fig. 1:100:“OCR”) on a plurality of forms (corresponding to fig. 3:302-309 one of which is shown in fig. 2), wherein the forms (corresponding to fig. 3:302-309) s of at least one type (or “other types or business documents” via c.5,ll. 44-50:
It should be noted that while the present disclosure shows invoices as an example of a particular domain in which the principles of the invention may be employed, the principles described herein may be employed equally well in other domains, and in particular in other types of business documents, e.g., passports, property deeds, utility bills, eviction notices, and bank account statements.); 




for each type of form, determining anchors (resulting in “anchored” “sequence of text words”) for that type of form, such that, for each type of form, the anchors for that type of form are visible form elements present at substantially the same physical location of each form of that type of form (via c.8,ll. 38-52:
Text corpus 432, in one embodiment, comprises several tens of thousands of fully-formed English sentences, composed of 2.4 million English words.  This is advantageously large enough to capture the statistics of commonly used words and/or word combinations.  In one embodiment, the 2.4 million English words is arranged in an ordered list so that word combinations and orderings are preserved.  A sequence of text words for training data is first formed by choosing consecutive words from the ordered list, and anchored at a random list-index.  The segment length chosen randomly as a number between 1 through 8.  Once the sequence of text words is determined it is then rendered into an image using a randomly chosen font and font size.  This allows creation of (image, text) pairs randomly for training data.); 

a ground truth (or “expensive ground-truth creation via human labeling” in fig. 4B:428: “Human Labeling”), determining (via “extract” comprising a deduction) feature rules (via “recurrent layers” that deduce or “extract rules” that say “how to stitch…features”)
“Optical character recognition systems capable of providing the level of accuracy required for business processes, such as with human accuracy or better are disclosed herein.  The disclosed systems employ a novel methodology that exploits a collection of weak & low-accuracy OCR systems.  This is employed to build a strong & high-accuracy deep-learning system for the OCR problem. The disclosed systems reduce the need for expensive ground-truth creation via human labeling, which is almost always needed for deep-learning models.  Certain embodiments employ a number of accuracy improvement techniques that can be adapted to deep learning solutions for other problems.”; 
c.5,ll. 44-50:
“It should be noted that while the present disclosure shows invoices as an example of a particular domain in which the principles of the invention may be employed, the principles described herein may be employed equally well in other domains, and in particular in other types of business documents, e.g., passports, property deeds, utility bills, eviction notices, and bank account statements.”

wherein “principles” is defined via Dictionary.com: “an accepted…rule of action”:
“principle, noun
1	an accepted or professed rule of action or conduct:
a person of good moral principles.”; and
c.6,ll.51-62:
“Deep learning system 103 preferably comprises several convolution layers with recurrent layers stacked on top.  The convolution layers extract low-level features from the input image, whereas recurrent layers extract rules of how to stitch a sequence of image features to derive a sequence of text characters.  Recurrent layers are most effectively trained in phases (called curriculums) with progressively increasing difficulty-level of training data used in each phase.  This is called curriculum learning, and the curricula are preferably based on ramping length of text and noise levels.  In certain embodiments, the Keras framework, available at https://github.com/mbhenry (Author: Mike Henry) may be used.”);

determining features (via said recurrent rules) and labels (via said human labeling of fig. 4B:428) for each form of the plurality of forms; and










generating a training model (fig. 4A:414: “OCR Model Training”) based on the determined features and labels and on the ground truth, wherein the ground truth (said human labeling of fig. 4B:428 for fig. 1:103: “Deep Learning System”) includes, for each form, at least one key-value pair (corresponding to fig. 4B: “(image, text)” or “image/text pairs”) associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text from the associated form, and wherein the key includes a data item that is linked to the value in that key-value pair as an identifier of the value in that key-value pair (via c.3,ll. 35-48:
“Deep neural nets require large data sets for adequate training.  In the case of OCR, the data takes the form of tuples, image/text pairs.  Conventionally, such data can be generated by humans who perform manual labeling.  To increase accuracy, the same data may be labeled by multiple individuals and then compared to identify mistakes.  While this technique can yield a high-quality data set for training, it can be quite time consuming and expensive when considering that a data set to adequately train a deep neural net can comprise millions or tens of millions of tuples.  Advantageously, the disclosed systems employ a novel combination of conventional, relatively low-accuracy OCR systems to generate training data rapidly and without requiring time-consuming and expensive human labeling.”).
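For illustration only, the sampling procedure described in the Kumar passage quoted above (c.8, ll. 38-52) can be sketched as follows. This is a minimal sketch and not Kumar's implementation: the function names, the toy corpus, and the font lists are hypothetical, and the image rendering step is replaced by a plain dictionary that only records which text, font, and font size would be rendered.

```python
import random

def sample_word_sequence(ordered_words, rng, min_len=1, max_len=8):
    """Choose consecutive words from the ordered list, starting at a random list index.

    The segment length is drawn uniformly from min_len..max_len (1 through 8 in the
    quoted passage). Near the end of the list the slice may come back shorter.
    """
    length = rng.randint(min_len, max_len)
    start = rng.randrange(len(ordered_words))
    return ordered_words[start:start + length]

def make_training_pair(ordered_words, fonts, font_sizes, rng):
    """Build one (image, text) style pair.

    Assumption: a real system would rasterize the text into an image; here the
    'image' is a placeholder dictionary describing the rendering request.
    """
    text = " ".join(sample_word_sequence(ordered_words, rng))
    image_placeholder = {
        "text": text,
        "font": rng.choice(fonts),
        "font_size": rng.choice(font_sizes),
    }
    return image_placeholder, text

# Hypothetical toy corpus and font choices, for demonstration only.
corpus = "please remit payment within thirty days of the invoice date shown above".split()
rng = random.Random(0)
image, text = make_training_pair(corpus, ["Arial", "Courier"], [10, 12, 14], rng)
```

Note that the random list index in this sketch fixes only where a word sequence starts within the corpus list; nothing in it records a physical location on a form.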










Thus, Kumar does not teach, as indicated in bold above, the claimed:

A.	“, such that, for each type of form, the anchors for that type of form are visible form elements present at substantially the same physical location of each form of that type of form”; and
B.	“wherein the ground truth …includes, for each form, at least one …key-value…pair associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text from the associated form, and wherein the key includes a data item that is linked to the value in that key-value pair as an identifier of the value in that key-value pair”.













	
Accordingly, Consul teaches:
A.	, such that, for each type of form (via “different types of…forms”), the anchors (or “pre-determined anchors”) for that type of form are visible (as shown in page 15, fig. 2-2:cat-box, fig. 2-4:dog-bike-box, figs. 2-5, pages 33,34: figs. 4-1,4-2:text-box) form elements present at substantially the same physical location (as shown in fig. 4-2:check-boxes) of each form of that type of form; and
B.	wherein the ground truth (or “ground-truth positions (x, y, width, height)” indicated by the text-boxes corresponding to fig. 4-2: “medical need”) …includes (corresponding to “The next step to improving this system will be to build a stronger ruleset”), for each form, at least one …key-value…pair (via “into two key-value pairs” as shown in page 48, fig. 4-9 as any one “<txt height…width.../>” line, each comprising two formed key-value pairs compared to the “ground truth”: said “height”) associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text (fig. 4-9: “ ‘medical’ ”) from the associated form, and wherein the key includes a data item that is linked (by being on the same said “<txt height…width.../>” line) to the value in that key-value pair as an identifier of the value in that key-value pair (as understood by one of skill in the art of key-value pairs via
“ABSTRACT
In recent years, there has been a lot of interest in methodologies for extracting information from text-based documents. Specifically in the medical field, a recent challenge has been to extract information from different types of scanned medical documents, such as patient registration forms, prescription order forms, and medical history forms. The lack of structure and large variety of information across these documents makes it difficult to automate the process of retrieving data. Today, humans read the documents and manually record the key pieces of information.”;




page 14:
“The R-CNN architecture provided a baseline for other object detection systems that came later: DenseCap (2015) [14], Faster R-CNN (2015) [15], and YOLO (2016) [16], which all aimed to detect, localize, and classify objects in datasets similar to the R-CNN datasets. The DenseCap system aimed to provide dense captions for images by first localizing objects, and then annotating the image with dense captions. The system used a Fully Convolutional Localization Network (FCLN) architecture, which was composed of a Convolutional Neural Network, followed by a Recurrent Neural Network language model for generating the dense captions. The use of a CNN for object detection was inspired by the R-CNN architecture [8], but in the FCLN design, there were no external region proposals necessary. Instead, the system used a Localization layer which proposed regions of interest by regressing bounding box coordinates on pre-determined anchors, where each anchor would be the center of the bounding boxes of various aspect ratios [14]. This system aimed to classify and localize objects, while also combining the classifications into complex captions through the RNN language model. An example of this is shown in Figure 2-2.”;

pages 47,48:
“As described above, after detecting and recognizing all of the text, the program would decipher each text element into two key-value pairs. The unconstrainedRecog key mapped to the string of detected alphanumeric characters that made up the element, and the lexRecog key mapped to the string of alphabet characters (excluding numbers and other symbols) that the text element contained. Because a large amount of text in medical documents contained alphanumeric characters and punctuation marks, I used the string corresponding to the unconstrainedRecog key first before looking at the text from the lexRecog key. Figure 4-9 depicts that the unconstrainedRecog text tends to be more accurate than the text in lexRecog for my use case. For example, the third text element is found to contain the text “address:” in the undeciphered unconstrainedRecog, but only “address” in the deciphered lexRecog. With this element, the extra punctuation (or lack of) does not make a difference when the third and final component of the system makes meaning out of the text elements. However, looking at the bottom-most element, we see that the unconstrainedRecog contains the string “4-2385”, while the lexRecog deciphers the string as “bs”. This element pertains to a portion of the patient’s phone number. Because the lexRecog piece of data only uses alphabet characters to encode the data, any numerical information would be missed. Thus, I initially looked at the… unconstrainedRecog pieces of data.”; and









page 52:
“In order to evaluate the accuracy of the field extraction process, I measured the correctness of the positional data. I manually recorded the ground-truth positions (x, y, width, height) of the field content for each field of interest within the documents, and I compared those values with the positional data given by the algorithm. This is described in more detail in the next section.”; and

page 57:
“In this thesis I describe two different approaches to extracting fields from medical forms: an object-detection approach and a text-spotting approach. Both of these methods have been applied to different applications, and I explored how to apply them to extracting information from scanned medical documents. I found that the object-detection approach resulted in a 65-70% accuracy for classifying document content as the correct field or not a field at all, but a majority of the accuracy came from classifying whitespace as whitespace, so it did not perform very well overall. The text-spotting approach seemed to have more promise, but the challenge was in building a ruleset that would be robust to any input. In the end, my ruleset performed well at low thresholds. With a threshold of 0.1, the system achieved 83% accuracy. The next step to improving this system will be to build a stronger ruleset.”).
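For illustration only, the two key-value pairs per text element described in the Consul passage from pages 47-48 quoted above can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout and the function name preferred_text are hypothetical, while the key names (unconstrainedRecog, lexRecog) and the sample strings are taken from the quoted passage.

```python
def preferred_text(element):
    """Return the unconstrainedRecog string when it is present and non-empty,
    otherwise fall back to the lexRecog string.

    This mirrors the quoted preference for looking at unconstrainedRecog first;
    the dictionary representation of a text element is an assumption.
    """
    text = element.get("unconstrainedRecog")
    if text:
        return text
    return element.get("lexRecog", "")

# Each text element carries two key-value pairs, as in the quoted passage.
elements = [
    {"unconstrainedRecog": "address:", "lexRecog": "address"},
    {"unconstrainedRecog": "4-2385", "lexRecog": "bs"},
]

recognized = [preferred_text(e) for e in elements]
```

In this sketch both keys identify a recognizer output for the same text element, which is how the quoted passage describes the unconstrainedRecog and lexRecog keys.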


























Thus, one of ordinary skill in the art of optical character recognition or OCR can modify Kumar’s teaching of OCR as shown in Kumar’s fig. 1:100:OCR with Consul’s teaching of the “pre-determined anchors” and said any one “<txt height…./>” line, each comprising two key-value pairs, by:
A.	first performing Consul’s associated method “such as text-spotting” (cited below) regarding the “pre-determined anchors” and said any one “<txt height…./>” line, each comprising two key-value pairs;
B.	inserting the spotting results into Kumar’s fig. 1:100; and then 
C.	performing Kumar’s fig. 1:100: “OCR” based on the spotting results;

and recognize that the modification is predictable or looked forward to because the modification will make “OCR…more efficient” via Consul:
pages 11,12:
“When hospitals digitize their medical documents, they need to be converted from image scans into machine-readable formats. There are different techniques that exist to convert the image scans into machine-readable formats, such as Optical Character Recognition (OCR). OCR uses pattern recognition to identify both typed and handwritten characters and words within a document. There has been work done to extract fields from biomedical documents using OCR [6][7]. However, there is a lot of post-processing manual labor needed to correct recognition errors and deduce the context or relations amongst the words. OCR tends to work better when digitizing content of structured line-based text documents [18]. These structured documents contain text of consistent fonts and sizes, and are usually split into horizontal lines throughout the document [18]. Medical documents tend to lack structure and have a variety of fonts and sizes of text within a single document. The complications brought by OCR can be avoided by using a method that does not use OCR at all (such as object detection and localization), or by using a method that reformats the input prior to applying OCR to make the process more efficient (such as text-spotting). There has not been much research done in these areas with regards to extracting information from medical documents. However, there has been research in these areas with regards to other applications, described in the following subsections.”

	However, the combination does not teach, as indicated in bold above, the claimed:
“wherein the ground truth  …includes … at least one …key-value…pair”.

Accordingly, Consul teaches said “The next step to improving this system will be to build a stronger ruleset”.
Thus, one of ordinary skill in the art of machine learning can:
A.	use the results from the ground truth as shown in Consul’s fig. 4-10: “Accuracy Measurements”;
B.	build or generate a stronger “underlying” rule set based on the results of the ground truth; and
C.	run Consul’s deep learning again based on the generated stronger “underlying” rule set based on the results of the ground truth to output data as shown in Consul’s fig. 4-9 which includes key-value pairs with improved results.  

wherein “deep learning” is defined via Dictionary.com:
deep learning
noun Computers.
an advanced type of machine learning that uses multilayered neural networks to establish nested hierarchical models for data processing and analysis, as in image recognition or natural language processing, with the goal of self-directed information processing.

wherein “machine learning” is defined:
BRITISH DICTIONARY DEFINITIONS FOR MACHINE LEARNING
machine learning
noun
1	a branch of artificial intelligence in which a computer generates rules underlying or based on raw data that has been fed into it

Again, the combination of the combination does not teach, as indicated in bold above, the claimed:
“wherein the ground truth  …includes … at least one …key-value…pair”. 
Instead the combination results in “fig. 4-9 which includes key-value pairs”.





Accordingly, Ferrara teaches “the ground truth…is built on…feature-value pairs” via page 266, right column, section: 4.1 Experiment setup:
“Quality assessment. The ground truth has been produced by exploiting a novel crowdsourcing approach called Liquid Crowd. A crowdsourcing approach consists in reducing a problem in a set of elementary units of work that are distributed to a (possibly) large number of human workers. Each worker participates giving the solution for one or more work units and receives a reward (e.g., money, personal satisfaction or other benefits) proportional to the completed amount of work. The main idea behind Liquid Crowd is to change the definition of worker from a single user to a group of users. A work unit is considered accepted only if the assigned group reaches a consensus on the produced answer (i.e., the qualified majority of users converge on the same answer). In our experimentation, the ground truth for quality assessment is built on 58 individuals from Freebase repository with a total number of 275 feature-value pairs.
Thus, the work units have been structured as a blind evaluation of a pair of web resources. For instance, the users have to evaluate the similarity of the given resources only knowing their features and features-values without knowing their identifiers (i.e., the names of the resources): this is done to avoid that users exploit their personal knowledge in evaluating the similarity.”

Thus, one of ordinary skill in the art of ground-truth with key-value pairs or feature-value pairs can modify Consul’s teaching of the ground truth positions by including Ferrara’s ground truth built on feature-value pairs and using the positional built ground truth built on feature-value pairs to search in the context of position via Consul’s teaching of the Internet. Thus, the modification does not result in the claimed invention. 
Additionally, if said one of ordinary skill in the art of key-value pairs asks said one of skill in the art of machine learning to modify Ferrara’s ground truth built on feature-value pairs such that the ground truth built on feature-value pairs are used for the claimed “generating a training model” in claim 1, line 16, then such a modification appears as hindsight reconstruction of applicant’s invention because there is no guidance on how to use Ferrara’s ground truth built on feature-value pairs with OCR machine learning since Ferrara’s ground truth built on feature-value pairs is for the Internet.
Accordingly, Harris of the same assignee teaches:
wherein the ground truth (or “ground truth”)  …includes … at least one …key-value…pair (or “pair (key-value, frame)” via:
“[0064] Two metrics may be defined: frame identification and frame creation.  For frame identification, for each dialogue act, the ground truth pair (key-value, frame) may be compared to the one predicted by the frame tracker.  Performance may be computed as the number of correct predictions over the number of pairs.  A prediction may be deemed correct if the frame, key, and/or value are the same (e.g., an exactly or approximate match, within a certain threshold, etc.) in the ground truth and in the prediction.  The frame may be the id of the referred frame.  The key and value may be respectively the type and the value of the slot used to refer to the frame (as said previously, these can be null).  It will be appreciated that other metrics or conditions may be used to determine whether a prediction is correct.  Frame creation may be computed as the number of times the frame tracker predicts that a frame is created over the number of dialogue turns.”). 
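For illustration only, the frame identification metric described in Harris’ paragraph [0064] quoted above (number of correct predictions over number of pairs) can be sketched as follows. This is a minimal sketch and not Harris’ implementation: the function name and the tuple layout ((key, value), frame_id) are hypothetical, and a prediction is assumed here to be correct only on an exact match of frame, key, and value, which is just one of the conditions the quoted paragraph permits.

```python
def frame_identification_accuracy(ground_truth, predictions):
    """Fraction of (key-value, frame) pairs predicted correctly.

    Each item is ((key, value), frame_id); key and value may be None.
    Assumption: exact equality of the whole pair counts as a correct prediction.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("ground truth and predictions must have the same length")
    if not ground_truth:
        return 0.0
    correct = sum(1 for gt, pred in zip(ground_truth, predictions) if gt == pred)
    return correct / len(ground_truth)

# Hypothetical dialogue acts, for demonstration only.
ground_truth = [(("city", "Paris"), 1), ((None, None), 2), (("price", "low"), 1)]
predictions = [(("city", "Paris"), 1), ((None, None), 1), (("price", "low"), 1)]

accuracy = frame_identification_accuracy(ground_truth, predictions)
```

In this sketch the ground truth pair serves only as the reference against which the frame tracker’s predictions are scored.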

	Thus, one of skill in the art of natural language processing can modify Consul’s teaching of natural language processing of machine-readable text with Harris’ teaching of natural language processing of utterance frames by including Harris’ teaching of natural language processing, as shown in Harris’ fig. 3, of utterance frames with Consul’s teaching of natural language processing of text resulting in a person reading the text while recording the utterance to become machine-readable. However, the combination does not result in claim 1 because Consul’s work is not directed to natural language processing, such as recording utterances, and instead is directed to extracting information or important or useful facts, such as a person’s name, from a scanned document. 




Guruprasad et al. (US Patent App. Pub. No. US 2020/0005089 A1) is pertinent as teaching “key-value pairs” and “ground truth data” comprising the “value” “SGE-28984” in “Table 3” via:
“[0035] In one embodiment, in order to arrive an optimal threshold to determine the extraction correctness, the threshold indicates Green, which indicates developed system has trust on the extracted values so that the user (data entry person) need to check for its accuracy; and below the threshold means Red, that mean user has to look at the document and verify whether it is extracted correctly or not.  In this case, an example of extraction is key-value pairs, for instance, Invoice Number -1234, here key is invoice number and value is 1234.  Key is what needs to be extracted, and value is the corresponding value in the document that represents the key.  In the scenario where it is required to extract invoice number from 100 invoices (training data), the OCR confidence for each invoice for the field invoice number can be used.  Also, upon looking at the actual document, it is known whether invoice number from the document is extracted correctly or not.  That means, now there are two values associated with invoice number: (i) OCR confidence obtained from OCR engine, and (ii) match/mismatch information from ground truth.  Using these values, a decision matrix is framed as below: 
 
TABLE 2
                      Match: X    Mismatch: Y
Maximum Confidence    100         71
Minimum Confidence    12          22”; and

“[0062] In one embodiment, the data may be acquired by the learning engine (300C) using a data adapter (339) and the account-specific configurator (331).  The data adapter is configured to accept a set of predefined extraction criteria and a set of parameters as provided in the configuration file for acquiring the data.  The data adapter is configured to capture a set of historical datasets comprising of ground truth data as provided in Table 3 and OCR extracted data for each field.  The extraction criteria may comprise a set of preformatted and predefined extraction templates like, but not limited to, one of a regular expression, geometric markers, anchor text markers etc. The data adapter captures data when the extraction criteria is satisfied based on the configuration parameters.  This acquired data may be stored in a database or in a file system.”

However, claim 1 requires “the ground truth…includes…at least one…pair”. In contrast, said Table 3’s “Ground truth” lists a single entry “SGE-28984” for row “C1” and similar for row “C2”.
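
For illustration only, the Green/Red thresholding described in Guruprasad’s paragraph [0035], quoted above, can be sketched as follows. This is a minimal sketch and not Guruprasad’s implementation: the names ExtractedPair and review_flag, the threshold value of 80, and the choice to treat a confidence equal to the threshold as Green are all assumptions made here.

```python
from dataclasses import dataclass

# Hypothetical threshold; the quoted paragraph does not state a value.
CONFIDENCE_THRESHOLD = 80.0


@dataclass(frozen=True)
class ExtractedPair:
    """One extracted key-value pair plus the OCR engine's confidence (0-100)."""
    key: str               # what needs to be extracted, e.g. "Invoice Number"
    value: str             # the text in the document that represents the key
    ocr_confidence: float  # confidence reported by the OCR engine


def review_flag(pair: ExtractedPair, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "Green" at or above the threshold, otherwise "Red".

    Treating equality as Green is an assumption of this sketch.
    """
    return "Green" if pair.ocr_confidence >= threshold else "Red"
```

Under these assumptions, a pair such as ExtractedPair("Invoice Number", "1234", 95.0) is flagged Green, while the same pair with a confidence of 40.0 is flagged Red and would be verified against the document.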


Brunets et al. (US Patent App. Pub. No.: US 2019/0362452 A1) is pertinent as teaching “field-value pairs…can serve or function as the ground truth” via:
“[0698] To ascertain the level of reliability of the data source provider for the system of record, a node health scorer of the node graph generation system can generate a trust score for the data source provider based on a comparison of the node field-value pairs and the object field-value pairs.  The node health scorer can identify a subset of node field-value pairs with confidence scores greater than a threshold score.  The threshold score can represent a score at which the value of the node field-value pair can be deemed ground truth for the purposes of comparison with the object field-value pair of the record objects.  With the identification of the ground truth subset, the node health scorer can identify a corresponding object field-value pair for each node field-value pair.  The corresponding object field-value pair can be the same field type as the node field-value pair for the same entity associated with both the record object for the object field-value pair and the node field-value pair.”; and

“[0705] From the set of node profiles 600a-n maintained by the node profile manager 220, the node health scorer 215 can identify a subset of node field-value pairs each with a confidence score 2614 above a threshold score.  This subset of node field-value pairs from the node profiles 600a-n can serve or function as the ground truth for the field values attributed to the entities corresponding to the node profiles that include the respective field-value pairs of the subset.  This subset can also be referred to herein as `true subset` In some embodiments, the node health scorer 215 can iterate through the node field-value pairs of the node profiles 600a-n maintained by the data processing system 9300.  For each node field-value pair, the node health scorer 215 can identify the confidence score 2614.  The node health scorer 215 can then compare the identified confidence score 2614 of the node field-value pair with the threshold score.  If the confidence score 2614 of the node field-value pair is greater than or equal to the threshold score (the threshold is `satisfied`), the node health scorer 215 can include the node field-value pair into the true subset.  Otherwise, if the confidence score 2614 of the node field-value pair is less than the threshold score, the node health scorer 215 can exclude the node field-value pair from the true subset.  For example, the node health scorer 215 can identify the true subset of node-field-value pairs from the node profiles 600a-n of the node graph 9035 with confidence scores greater than the confidence score of 7 (as depicted shaded and in bold in FIG. 28).  The true subset of node field-value pairs from the node profiles 600a-n can serve or function as the ground truth for the field values attributed to the entity.”
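
For illustration only, the selection of the “true subset” described in Brunets’ paragraph [0705], quoted above, can be sketched as follows. This is a minimal sketch under stated assumptions and not Brunets’ implementation: the names NodeFieldValuePair and true_subset are hypothetical, and the comparison uses “greater than or equal to” as in the quoted sentence on a `satisfied` threshold.

```python
from typing import List, NamedTuple


class NodeFieldValuePair(NamedTuple):
    """Hypothetical record for one node field-value pair and its confidence score."""
    field: str
    value: str
    confidence: float


def true_subset(pairs: List[NodeFieldValuePair], threshold: float) -> List[NodeFieldValuePair]:
    """Keep the pairs whose confidence score satisfies the threshold (>=).

    Pairs below the threshold are excluded, mirroring the include/exclude
    steps in the quoted paragraph.
    """
    return [pair for pair in pairs if pair.confidence >= threshold]
```

With a threshold of 7, a pair scored 9 would be kept as ground truth for comparison and a pair scored 3 would be excluded.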

However Brunets does not teach the claimed “generating a training model based on…the 

IDS cited Becker (US 2018/0033147) is pertinent as teaching the claimed “anchor” via “located in the same place in all W-2 forms” via Becker:
“[0019] In cases where the information in a form conforms to a known template, it may be possible to configure software applications to locate fields in an image of a form based on the fields' locations in the template.  However, this approach is not effective if the template of the form is unknown.  Furthermore, if multiple templates are possible for a certain type of form, different program instructions may have to be hard-coded for each possible template.  Since templates for some forms (e.g., a 1040 tax form) periodically change and multiple templates are possible for other types of forms (e.g., birth certificates issued in different states), the limitations inherent in a purely templated approach are problematic.”; and 

“[0047] In the present example, the training image segments 502 shown in FIG. 5 are examples of image segments of box 1 of a W-2 tax form.  However, training image segments 502 can include other types of image segments.  For example, the training image segments 502 also generally include other image segments of other fields in W-2 forms (or, if a more generalized model is sought, other types of forms).  Furthermore, some of the training image segments 502 may be box-1 image segments from W-2 forms that have different templates.  The machine-learning model may identify a correct classification for a box-1 image segment even if the box-1 image segment is not located in the same place in all W-2 forms.”

However, claim 1, like similar claims 11 and 16, requires “determining anchors … for…that type of form”. It is thus possible that both “box-1” and box-2, as two anchors, are located in the same place in all W-2 forms. However, treating box-2 as an anchor located in the same place in all W-2 forms relies more on hindsight from applicant’s disclosure than on Becker’s disclosure, which teaches that “some forms (e.g., a 1040 tax form) periodically change” such that “limitations inherent in a purely templated approach are problematic” (Becker, cited above). Thus, the use of multiple anchors is problematic in a purely templated approach.
	

The examiner assumes, for the sake of argument, that both anchor boxes 1 and 2 from “a known template” of the W-2 form are correctly classified via “The machine-learning model” and are in the same location in all W-2 forms in order to read on the claimed “determining anchors…for…that type of form”, such as any one anchor box in fig. 3:308.
	Accordingly, Becker teaches the claimed:
generating a training model (via fig. 5:506: “Machine-Learning Model”) based on the determined features (via fig. 5:504: “Training Instances”) and labels (via fig. 5:502: “Training Image Segments” one of which is detailed in fig. 4:310) and on the , 












wherein the ground truth (via “verify…a correct classification” and “image segments… assigned verified classifications” such “as box 1 fields” and “training instances can include verified classifications”) , for each form (via fig. 3:308), at least one…key (via fig. 5:502: “Training Image Segments” that serve as a means of identification or “identifying”)-value (via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value)…pair (via “subsets…of training data”, comprising a pair via improper hindsight, such as one of said fig. 5:502: “Training Image Segments” and said one of fig. 5:504: “Training Instances” or via said definition 12:a set of more than two) associated with that form (said via fig. 3:308),
wherein each key (said via fig. 5:502: “Training Image Segments” that serve as a means of identification or “identifying”)-value (said via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value) pair (said via “subsets…of training data”, comprising a pair via improper hindsight, such as one of fig. 5:502: “Training Image Segments” that serve as a means of identification and one of fig. 5:504: “Training Instances” comprising “quantified” “Features” or via said definition 12:a set of more than two) includes a key (said via fig. 5:502: “Training Image Segments” that serve as a means of identification or “identifying”) and a value (said via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value), 
wherein the value (said via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value) includes a (quantified) data item that includes text (as shown in fig. 3:308) from the associated form (said via fig. 3:308),  


wherein the key (said via fig. 5:502: “Training Image Segments” that serve as a means of “identifying”) includes a (segmented) data item that is linked (via fig. 5:108: “Training Data” and said “subsets” thereof) to the value (said via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value) in that key-value pair (said via “subsets…of training data”, comprising a pair, via improper hindsight, such as one of fig. 5:502: “Training Image Segments” that serve as a means of identification and one of fig. 5:504: “Training Instances” comprising “quantified” “Features” or via said definition 12:a set of more than two) as an (segmented) identifier of the value (said via fig. 5:504: “Training Instances” comprising a “quantified” “Features” value) in that key-value pair (said via “subsets…of training data”, comprising a pair, via improper hindsight, such as one of fig. 5:502: “Training Image Segments” that serve as a means of identification and one of fig. 5:504: “Training Instances” comprising “quantified” “Features” or via said definition 12:a set of more than two).
	









However, Becker does not teach, as indicated in bold above, the claimed “an identifier of the value in that key-value pair”, as understood by one of skill in the art of attribute-value pairs or name-value pairs or key-value pairs or field-value pairs, and instead teaches:
1)	“labels identifying the type” as shown in said fig. 4: “Wages, tips, other…”;
2)	“identify points…at…brightness” values represented in fig. 6: 602 and 610: “Identify an image”; 
3)	“The machine-learning model may identify a correct classification” via fig. 5:508: “Output Classification” or fig. 6:614:“Assign a classification based on the features using one or more machine-learning models”;
4)	“identify a subset of textual characters…For example… digits and dashes and exclude letters” after being “classified” said via fig. 6:614: “Assign a classification based on the features using one or more machine-learning models” i.e., “identify a subset of textual characters based on the classification”; and
5)	OCR that identifies characters in the identified subset of textual characters.

via Becker:
“[0014] Forms are often used to collect, register, or record certain types of information about an entity (e.g., a person or a business), a transaction (e.g., a sale), an event (e.g., a birth), a contract (e.g., a rental agreement), or some other matter of interest.  A form typically contains fields or sections for specific types of information associated with the subject matter of the form.  A field is typically associated with one or more labels identifying the type of information that should be found in the field.  For example, a W2 form contains a field with the label "employee's social security number" in which an employee's social security number is entered.  In another example, a death certificate typically contains at least one field that is associated with the label name (e.g., "first name" or "last name") in order to identify the deceased person to whom the certificate applies.  In another example, a paper receipt typically has a labeled field indicating a total amount due for a transaction for which the receipt was issued.”

“[0020] Embodiments presented herein provide techniques to identify and classify fields and labels in digital images without using OCR and without a template.  In one embodiment, computer-vision image-segmentation techniques divide an image of a form in to image segments.  Features of a given image segment can be detected and quantified using computer-vision feature-detection methods.  The resulting features can be used to create an input instance provided to a machine-learning model.  The machine-learning model can classify the instance (and thus the image segment represented by the instance).”

“[0025] In some embodiments, an image segment may be classified as a field that contains a specific type of information.  This classification can be used to identify a subset of textual characters that may be depicted in the image segment.  For example, if an image segment that has been classified as a field for a social security number (e.g., "box a" of W-2 form), the subset of textual characters may include digits and dashes and exclude letters.  In some embodiments, once an image segment has been classified, it may be desirable to perform an OCR process to extract text depicted in the image segment.  The OCR process can be modified or constrained to presume that text in the image segment contains only characters in the subset of textual characters.  This may enable the OCR process to disambiguate extracted text more easily.  For example, if a region in an image segment can be interpreted as either "IB" or "18," and if the image segment has been classified as a field for a social security number, the OCR process can elect "18" as the extracted text for the region because 1 and 8 are included in the subset of textual characters for social-security-number fields (while "I" and "B" are not).” 

“[0027] The machine-learning model may be trained using training input instances comprising features extracted from image segments that have been assigned classifications that have verified as correct.  To verify that a classification for an image snippet is correct, the image snippet may be presented to a user on a display and the user may manually provide or verify a correct classification for the image snippet.”; 

“[0034] In computer vision, image segmentation generally refers to the process of partitioning a digital image into multiple segments, wherein a segment is a set of pixels.  Image segmentation is often used to locate objects and boundaries (e.g., lines and gaps.) in images.  Image segmentation methods often incorporate, for example, edge detection, corner or interest-point detection, or blob detection.  Edge detection generally refers to mathematical approaches to identify points in a digital image at which brightness changes sharply (e.g., has discontinuities).  Such points can be organized into curved line segments that are called edges.  Corner or interest-point detection generally refers to computer-vision approaches that are used to detect corners and interest points.  A corner can refer to an intersection of two edges or a point for which there are two dominant and different edge directions in a local neighborhood of the point.  An interest point can refer to a robustly detectable point with a well-defined position in an image (e.g., a corner, an isolated point of local intensity maximum or minimum, a line ending, or a point on a curve with locally maximal curvature).  Blob detection generally refers to detecting regions of an image that differ with respect to some property of interest (e.g., brightness or color) compared to surrounding regions.  If a property of interest is expressed as a function of position relative to an image, blob detection approaches can apply differential methods or focus local extrema to identify blobs.”





“[0042] FIG. 5 illustrates an example of training the segment classifier 206 to classify image segments without OCR.  As shown, the segment classifier 206 includes a machine-learning model 506 (e.g., a computer-implemented predictive model that can classify input data and can improve its prediction accuracy using training data without being explicitly reprogrammed).  Training Data 108 can include training image segments 502.  The training image segments 502 can include image segments that have been assigned verified classifications.  For example, the training image segments 502 can comprise image segments that have been classified as box 1 fields from images of W-2 tax forms.  Each of the training instances 504 can be a representation of a corresponding image segment that includes extracted features extracted from the corresponding image segment.  In addition, some, most, or all of the training instances can include verified classifications for the respective image segments they represent.  One common format that is used to input training data into machine learning models is the attribute-relation file format (ARFF).”; and

“[0045] Furthermore, individual machine learning models can be combined to form an ensemble machine-learning model.  An ensemble machine-learning model may be homogenous (i.e., using multiple member models of the same type) or non-homogenous (i.e., using multiple member models of different types).  Individual machine-learning models within an ensemble may all be trained using the same training data or may be trained using overlapping or non-overlapping subsets randomly selected from a larger set of training data.”

“[0047] In the present example, the training image segments 502 shown in FIG. 5 are examples of image segments of box 1 of a W-2 tax form.  However, training image segments 502 can include other types of image segments.  For example, the training image segments 502 also generally include other image segments of other fields in W-2 forms (or, if a more generalized model is sought, other types of forms).  Furthermore, some of the training image segments 502 may be box-1 image segments from W-2 forms that have different templates.  The machine-learning model may identify a correct classification for a box-1 image segment even if the box-1 image segment is not located in the same place in all W-2 forms.”

“[0068] At block 710, in some examples, the processors identify a subset of textual characters based on the classification and performing an Optical Character Recognition (OCR) process on the image segment subject to a constraint that text extracted by the OCR process can only include textual characters found in the subset of textual characters.  The image segment can be preprocessed before performing the OCR process.  The preprocessing can include at least one of: spatial image filtering, point processing, contrast stretching, or thresholding.”





“[0073] As shown, storage 810 includes training data 108.  The training data 108 may include training image segments 502 and training instances 504.  A training instance is be a representation of a training image segment and includes features extracted therefrom.  A training instance can also include an accepted, known, or verified classification for the training image segment that the training instance represents.  The segment classifier uses some or all of the training data 108 to train or refine a machine-learning model to classify image segments.”

Thus, Becker does not use the segments as shown in fig. 3:310-318 to identify a quantified brightness value (such as black 0 or white 255 value as indicated in fig. 3:308) in each of the subset pairs of fig. 5:108: “Training Data”. 
Additionally, Becker’s fig. 5:502: “Training Image Segments” comprise the claimed “value” such as “53,657.00”; however, such a “value” is not recognized or identified at this point via the segmented subset-pair since Becker’s fig. 5:502: “Training Image Segments” is directed to “segments without using” identification of character values via Becker:
“[0008] FIG. 5 illustrates an example of training a segment classifier to classify image segments without using OCR, according to one embodiment.”
	
	Instead the value “53,657.00” is identified/recognized at a later point via fig. 6:622: “Extract text using OCR”.
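
For illustration only, the ordering discussed above, in which a segment is first classified and text is only later read by OCR constrained to the character subset for that classification (Becker, paragraphs [0025] and [0068]), can be sketched as follows. This is a minimal sketch and not Becker’s implementation: the table ALLOWED_CHARS, the classification label "social_security_number", and the function choose_reading are hypothetical names, and the candidate readings are assumed to be supplied by some OCR step that is not shown.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from a segment classification to its permitted characters.
ALLOWED_CHARS: Dict[str, Set[str]] = {
    "social_security_number": set("0123456789-"),
}


def choose_reading(candidates: List[str], classification: str) -> Optional[str]:
    """Return the first candidate made up only of characters allowed for the classification.

    The classification must already be known; the text value is resolved only
    at this later step. Returns None when no candidate fits.
    """
    allowed = ALLOWED_CHARS[classification]
    for text in candidates:
        if all(ch in allowed for ch in text):
            return text
    return None
```

Using Becker’s own example, the ambiguous candidates "IB" and "18" for a segment classified as a social security number field resolve to "18", because "I" and "B" are outside the permitted subset.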















Thus the claimed, in claim 1 (including claims 11 and 16), lines 16-19:

“generating a training model based on the determined features and labels and on the wherein the ground truth , for each form, at least one …key-value…pair associated with that form, wherein each key-value pair includes a key and a value, wherein the value includes a data item that includes text from the associated form, and wherein the key includes a data item that is linked to the value in that key-value pair as an identifier of the value in that key-value pair”

is not anticipated or rendered obvious to said one of ordinary skill in the art, such as in data structures, before the claimed invention, unless hindsight drawn from applicant’s disclosure is used to reconstruct applicant’s invention.














Al-Hashim (Arabic Database for Automatic Printed Arabic Text Recognition Research and Benchmarking) is pertinent as teaching “ground-truth” or “groundtruth” via:
pages 5,6:
“First, it will remove from the AATR researchers and developers the burden of acquiring a suitable data for each task in the recognition process [Bipp95][Phil93a]. This is a genuine hindrance for many researchers in the AATR field [Khar99][Märg01]. Building a database with its ground-truth value for Arabic is more difficult, time consuming and error prone than building it for English [Märg01].”; and

pages 75,76:
“JSON will be used as a data representation for all the page-related and zone-related records’ files except the zone truth-value record. The zone groundtruth value is stored directly, without any data representation, to the respective zone truth-value record’s file.

JSON will not be used as is when representing the records’ files; a simple rule must be followed. This rule states that no more that a single JSON object can exists in a record’s file. The rule also states that each name/value pair must exist on a separate line. The object curly brackets must also be on separate lines. Figure 9 shows the content of the page bounding box record’s file of a sample page image using JSON representation.”

	However, Al-Hashim teaches that the “ground-truth” is “without any” “name/value pair” or the claimed “key-value pair” of claim 1, line 20 as understood by one of ordinary skill in the art of data-structures. 
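
For illustration only, the record-file rule quoted from Al-Hashim above (a single JSON object per file, each name/value pair on its own line, and the curly brackets on their own lines) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name write_record and the field names "left" and "top" are hypothetical and are not taken from Al-Hashim’s Figure 9, and only flat objects with scalar values are handled.

```python
import json
from typing import Dict, Union

Scalar = Union[str, int, float, bool, None]


def write_record(record: Dict[str, Scalar]) -> str:
    """Serialize one flat JSON object with each name/value pair on its own line.

    The opening and closing curly brackets are also placed on separate lines.
    """
    items = list(record.items())
    lines = ["{"]
    for index, (name, value) in enumerate(items):
        # A comma follows every pair except the last, as JSON requires.
        comma = "," if index < len(items) - 1 else ""
        lines.append(f"{json.dumps(name)}: {json.dumps(value)}{comma}")
    lines.append("}")
    return "\n".join(lines)
```

By contrast, per the quoted passage, the zone ground-truth value would be written to its own file directly as text, without any such name/value wrapper.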
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397.  The examiner can normally be reached on Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/DENNIS ROSARIO/
Examiner, Art Unit 2667
/MATTHEW C BELLA/
Supervisory Patent Examiner, Art Unit 2667