DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because the claim(s) as a whole is/are drawn to a computer readable medium (CRM) which, under broadest reasonable interpretation (BRI), covers a signal per se unless otherwise defined in the application to exclude ineligible signal embodiments. When looking to the Specification, paragraphs [00112] and [00114] broadly define CRM. While paragraph [00115] states that computer storage media excludes signals per se, paragraph [00114] indicates that CRM, which is the term used in the claims, may comprise computer storage media and communication media. Communication media is then defined in paragraph [00116] to explicitly be a carrier wave. Therefore, the claim(s) as a whole, given BRI and in light of the Specification, has/have embodiments that are drawn to a signal per se and is/are ineligible under 35 U.S.C. 101.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-8, 16, and 17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 11,195,008. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following.

Regarding claim 1, claim 1 of U.S. Patent No. 11,195,008 teaches the same limitations as underlined below:

Claim 1 of the present application:

One or more computer-readable media [1] having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving an electronic document comprising text;
identifying predefined text in the text of the electronic document;
modifying a first portion of the predefined text from the electronic document and leaving a remaining portion of the text unmodified, the modifying includes deleting the predefined text [2];
generating a representation for the electronic document based on the modified first portion of the predefined text and the remaining portion of text; and
storing the representation of the electronic document in a data store [3].

Claim 1 of U.S. Patent No. 11,195,008:

One or more computer storage media [1] having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving an electronic document comprising text;
identifying predefined text in the text of the electronic document;
in response to the identifying of the pre-defined text, modifying a first portion of the predefined text from the electronic document while leaving a remaining portion of the text unmodified by replacing the predefined text with standardized text;
generating a representation for the electronic document based on the modified first portion of the predefined text and the remaining portion of text; and
storing the representation of the electronic document in a database.


Regarding difference [1], computer storage media are a type of computer-readable media. Regarding difference [2], replacing the predefined text with standardized text, as required in claim 1 of U.S. Patent No. 11,195,008, necessarily involves deleting the predefined text, as recited in claim 1 of the present application. Regarding difference [3], a database is a type of data store. Therefore, claim 1 of the present application is anticipated by claim 1 of U.S. Patent No. 11,195,008.
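The reasoning regarding difference [2] can be illustrated with a minimal sketch. The date pattern, the placeholder string, and the function names below are hypothetical and are not taken from either set of claims or from the Specification; the sketch only shows that a replacement leaves the same remaining text as a deletion once the added standardized text is set aside.

```python
import re

# Hypothetical "predefined text": dates written as MM/DD/YYYY (an assumption).
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
# Hypothetical "standardized text" used by the replacement variant.
STANDARDIZED_TEXT = "<DATE>"


def delete_predefined(text: str) -> str:
    """Delete each match of the predefined text; the rest is left unmodified."""
    return DATE_PATTERN.sub("", text)


def replace_predefined(text: str) -> str:
    """Replace each match of the predefined text with the standardized text."""
    return DATE_PATTERN.sub(STANDARDIZED_TEXT, text)


document = "Invoice issued 02/15/2022 to ACME."
deleted = delete_predefined(document)
replaced = replace_predefined(document)

# In both results the original predefined text is gone; removing the
# placeholder from the replaced text yields exactly the deleted text.
print(replaced.replace(STANDARDIZED_TEXT, "") == deleted)
```

In this sketch the replacement is a deletion followed by an insertion at the same position, which is the relationship relied upon for difference [2].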

Aside from minor wording differences, claims 2-7 are identical to claims 2-7 of U.S. Patent No. 11,195,008.

In addition, claim 16 is a corresponding method claim to claim 1 and is similarly anticipated by claim 1 of U.S. Patent No. 11,195,008. 

Regarding claim 17, the claim specifies that the modifying comprises removing the first portion and adding a standardized text, which is also present in claim 1 of U.S. Patent No. 11,195,008.


Regarding claim 8, claim 6 (claims 2, 4 and 6) of U.S. Patent No. 11,195,008 teaches the same limitations as underlined below:

Claim 8 of the present application:

The one or more computer-readable media of claim 1, wherein the operations further comprising:
generating a target vector from a second portion of text associated with a target electronic document; and
determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector.

Claim 6 of U.S. Patent No. 11,195,008:

The one or more computer storage media of claim 1, wherein the electronic document representation is associated with one or more vectors and is grouped with a set of reference vectors.

The one or more computer storage media of claim 2, wherein the set of reference vectors are associated with one or more data extraction models that are used to extract data from a target electronic document.

The one or more computer storage media of claim 3, further comprising:
generating a target vector from a portion of text associated with a target electronic document; and
determining a measure of similarity between the target vector and a reference vector.

The one or more computer storage media of claim 4, wherein a cosine similarity function is used to determine a measure of similarity, and a denominator of a cosine similarity function is determined based on comparing a function of the norm of the target vector with a function of the norm of the reference vector.


As shown above, claim 6 of U.S. Patent No. 11,195,008 anticipates claim 8 of the present application.

Claims 9-15 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 11,195,008 in view of Yellapragada et al, U.S. Publication No. 2018/0032804 and van Rotterdam et al, U.S. Publication No. 2018/0068183. 

Regarding claim 9, claim 8 of U.S. Patent No. 11,195,008 teaches the same limitations as underlined below:

Claim 9 of the present application:

A system comprising: one or more processors; and one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving an electronic document comprising text;
modifying a first portion of the text from the electronic document and leaving a remaining portion of the text unmodified;
generating a representation for the electronic document based on the modified first portion of the text and the remaining portion of text;

Claim 8 of U.S. Patent No. 11,195,008:

A system comprising: one or more processors; and one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving a target electronic document comprising text;
generating a target document representation based on at least a portion of the text of a target electronic document;
determining a measure of similarity between the target document representation and a reference document representation having been generated from a reference electronic document by modifying a first portion of the text from the reference electronic document while leaving a remaining portion of the text unmodified, the modifying includes modifying a predefined type of text and adding a standardized text;
based on the measure of similarity, selecting the reference document representation, the reference document representation having been generated from the reference electronic document and associated with an extraction model; and
extracting data from the target electronic document based on the extraction model associated with the reference document representation.


Claim 8 of U.S. Patent No. 11,195,008 does not expressly teach

storing the representation of the electronic document in a data store;

generating a target vector from a second portion of text associated with a target electronic document; and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector.

However, Yellapragada in a similar invention in the same field of endeavor teaches one or more processors; and one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by the one or more processors, cause the one or more processors to perform operations (see Yellapragada claim 7) comprising: receiving an electronic document (see Figure 3, document extractor 306 and paragraph [0040]) comprising text (see Figure 2 and paragraph [0038]); modifying a first portion of the text from the electronic document and leaving a remaining portion of the text unmodified (see paragraph [0041] which indicates that text 224 of section 226 of Figure 2 is removed while text label 222 is left); generating a representation for the electronic document based on the modified first portion of the text and the remaining portion of text (see Figure 3, hashing function 310 and paragraph [0041]); as taught in claim 8 of U.S. Patent No. 11,195,008 and further teaches 

storing the representation of the electronic document in a data store (see Figure 3, template database 314 and paragraph [0042]). 

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of storing the representation as taught in Yellapragada with the system taught in claim 8 of U.S. Patent No. 11,195,008, the motivation being to allow a user to access the representation at a later time.

Claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada does not expressly teach

generating a target vector from a second portion of text associated with a target electronic document; and

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector.

However, van Rotterdam in a similar invention in the same field of endeavor teaches a method involving comparing a representation of an electronic document comprising text (see van Rotterdam Abstract) as taught in claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada comprising

generating a target vector from a second portion of text associated with a target electronic document (see Figure 2, step 214 and paragraph [0034]); and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document (see Figure 2, steps 206 and 216 and paragraph [0028]) based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector (see paragraph [0049]. The final sentence says the similarity can be based off orientation and magnitude of the vectors, which would therefore be a function of the norms of the vectors).
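The observation that a similarity based on the orientation and magnitude of vectors is a function of the norms of those vectors can be illustrated with a minimal sketch of the standard cosine similarity formula. The vectors and function names are hypothetical and are not drawn from van Rotterdam or from the claims; the sketch only shows that the denominator of the formula is computed from the norm of each vector.

```python
import math


def norm(vector):
    """Euclidean norm (magnitude) of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in vector))


def cosine_similarity(target, reference):
    """Cosine of the angle between two vectors of equal length.

    The denominator is the product of the two norms, so the measure depends
    on a function of the norm of each vector.
    """
    denominator = norm(target) * norm(reference)
    if denominator == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot_product = sum(t * r for t, r in zip(target, reference))
    return dot_product / denominator


# Hypothetical target and reference vectors, for illustration only.
target_vector = [1.0, 0.0]
reference_vector = [0.0, 1.0]
print(cosine_similarity(target_vector, reference_vector))  # orthogonal vectors
```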

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the comparing of hashes as taught in claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada with using norms of vectors of electronic documents for comparison as taught in van Rotterdam to yield the predictable results of successfully comparing and finding similar documents. 

Regarding claim 10, claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches all the limitations of claim 9, and further teaches wherein the modifying includes deleting the first portion of the text (see Yellapragada paragraph [0041] which indicates that text 224 of section 226 of Figure 2 is removed while text label 222 is left).

Regarding claim 11, claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches all the limitations of claim 9, and further teaches wherein the modified text comprises a predefined type of text and the modifying includes adding a standardized text (this is in the body of claim 8).

Regarding claim 12, claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches all the limitations of claim 10, and further teaches wherein generating the target vector comprises generating one or more vectors based on the modified text (see van Rotterdam paragraph [0034] as combined with Yellapragada Figure 5, step 505 and paragraph [0041]).

Regarding claim 13, claim 11 of U.S. Patent No. 11,195,008 when taken in combination with claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches the same limitations.

Regarding claim 14, claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches all the limitations of claim 9, but does not expressly teach wherein a cosine similarity function is used to determine the measure of similarity.

However, van Rotterdam goes on to teach that cosine similarity can be used alone and that various methods of similarity determination can be combined (see van Rotterdam paragraph [0049]). Therefore, one of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of using a cosine similarity as taught in a separate embodiment of van Rotterdam with using and combining similarity methods involving the magnitude and orientation of the vectors as further taught in van Rotterdam, the motivation being to make the similarity measure more robust.

Regarding claim 15, claim 13 of U.S. Patent No. 11,195,008 when taken in combination with claim 8 of U.S. Patent No. 11,195,008 in view of Yellapragada and van Rotterdam teaches the same limitations.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6, 7, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 6 recites the limitation "the stored vector".  There is insufficient antecedent basis for this limitation in the claim.

Claim 7 recites the limitation "the same entity".  There is insufficient antecedent basis for this limitation in the claim.

Claim 18 recites the limitation "the same source entity".  There is insufficient antecedent basis for this limitation in the claim.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7, and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yellapragada et al, U.S. Publication No. 2018/0032804.

Regarding claim 7, Yellapragada teaches all the limitations of claim 1, and further teaches determining that a target document representation (see Yellapragada paragraph [0043]) is associated with the same entity as the stored representation of the electronic document (see Yellapragada Figure 3, template matching component 316 and paragraph [0044]).

Regarding claim 16, Yellapragada teaches a computer-implemented method comprising: 

receiving an electronic document (see Figure 3, document extractor 306 and paragraph [0040]) comprising text (see Figure 2 and paragraph [0038]);

identifying predefined text in the text of the electronic document (see paragraph [0042] referring to labels 222 of Figure 2 identifying section 226); 

modifying a first portion of the predefined text from the electronic document and leaving a remaining portion of the text unmodified (see paragraph [0041] which indicates that text 224 of section 226 of Figure 2 is removed while text label 222 is left);

generating a representation for the electronic document based on the modified first portion of the predefined text and the remaining portion of text (see Figure 3, hashing function 310 and paragraph [0041]); and
storing the representation of the electronic document in a data store (see Figure 3, template database 314 and paragraph [0042]).
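The mapping above (removing the field text while leaving the labels, hashing the result, and storing the hash) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Yellapragada's implementation: the (label, value) data layout, the choice of SHA-256, the in-memory dictionary standing in for a template database, and all names are hypothetical.

```python
import hashlib

# Hypothetical stand-in for a template database (an assumption).
template_store = {}


def template_representation(fields):
    """Hash only the labels of a document given as (label, value) pairs.

    The values are dropped (the removed portion of the text) and the labels
    are kept (the remaining, unmodified portion).
    """
    labels_only = "\n".join(label for label, _value in fields)
    return hashlib.sha256(labels_only.encode("utf-8")).hexdigest()


def store_representation(template_name, fields):
    """Compute the representation and store it under the hypothetical name."""
    digest = template_representation(fields)
    template_store[digest] = template_name
    return digest


# Two hypothetical documents with the same labels and different values.
first = [("Employer", "ACME Corp."), ("Wages", "50,000")]
second = [("Employer", "Globex"), ("Wages", "72,500")]

stored = store_representation("hypothetical-form", first)
print(template_representation(second) == stored)
```

Because only the labels contribute to the hash in this sketch, two documents built on the same form yield the same representation regardless of the values filled in.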

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yellapragada et al, U.S. Publication No. 2018/0032804 in view of Mukhopadhyay et al, U.S. Publication No. 2020/0074169.

Regarding claim 2, Yellapragada teaches all the limitations of claim 1, and further teaches wherein the electronic document representation is associated with one or more hashes (see Yellapragada Figure 3, hashing function 310 and paragraph [0041]) and is grouped with a set of reference hashes (see Yellapragada Figure 3, template database 314).

Yellapragada does not expressly teach wherein the electronic document representation is associated with one or more reference vectors and is grouped with a set of reference vectors.

However, Mukhopadhyay in a similar invention in the same field of endeavor teaches one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by one or more processors, cause the one or more processors to perform operations (see Mukhopadhyay paragraph [0015]) and a database containing a representation for an electronic document (see Figure 2, known forms repository 208 and paragraph [0047]) as taught in Yellapragada wherein 

the electronic document representation is associated with one or more reference vectors (see paragraph [0047] referring to fingerprints and paragraph [0032] which indicates a fingerprint is a multi-dimensional feature vector) and is grouped with a set of reference vectors (see Figure 2, known forms repository 208).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the hashes representing electronic documents of Yellapragada with the reference vectors of Mukhopadhyay to yield the predictable results of successfully storing numerical representations of the electronic documents.

Regarding claim 3, Yellapragada in view of Mukhopadhyay teaches all the limitations of claim 2, and further teaches wherein the set of reference vectors are associated with one or more data extraction models that are used to extract data from a target electronic document (see Yellapragada Figure 3, document extractor 318 and paragraph [0045] as combined with Mukhopadhyay paragraph [0032]).

Regarding claim 4, Yellapragada in view of Mukhopadhyay teaches all the limitations of claim 3, and further teaches 

generating a target vector (see Yellapragada Figure 3, hashing function 310 and paragraph [0043] as combined with Mukhopadhyay paragraph [0032]) from a portion of text associated with a target electronic document (see Yellapragada paragraph [0041] as applied to document 210 and paragraph [0043]); and 

determining a measure of similarity between the target vector and a reference vector (see Yellapragada Figure 3, template matching component 316 and paragraph [0045] as combined with Mukhopadhyay paragraph [0048]).

Regarding claim 19, Yellapragada teaches all the limitations of claim 16, but does not expressly teach an extraction model utilizes heuristics to determine a location of text in the electronic document.

However, Mukhopadhyay in a similar invention in the same field of endeavor teaches a method involving a representation for an electronic document (see Mukhopadhyay Figure 2, known forms repository 208 and paragraph [0047]) wherein 

an extraction model utilizes heuristics to determine a location of text in the electronic document (see paragraph [0061]).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of using an extraction model to determine the location of text as taught in Mukhopadhyay with the method taught in Yellapragada, the motivation being to add in extra similarity measures to the system thereby making it easier to extract relevant text from the documents. 

Regarding claim 20, Yellapragada in view of Mukhopadhyay teaches all the limitations of claim 19, and further teaches wherein the known location of text is based on at least one of absolute coordinates, relative coordinates (see Mukhopadhyay paragraph [0061]), or surrounding text.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Yellapragada et al, U.S. Publication No. 2018/0032804 in view of Mukhopadhyay et al, U.S. Publication No. 2020/0074169 and Tsukasa, U.S. Patent No. 6,456,738.

Regarding claim 5, Yellapragada in view of Mukhopadhyay teaches all the limitations of claim 4, but does not expressly teach updating the representation based on receiving a confirmation that the target vector corresponds to the reference vector.

However, Tsukasa in a similar invention in the same field of endeavor teaches a method (see Tsukasa Figure 1) comprising receiving an electronic document comprising text (see Figure 2, sample images 110 and Figure 6. The electronic nature is implied by the system of Figure 1), storing a representation of the electronic document (see Figure 2, model generation unit 111 and model storage unit 112), receiving a target electronic document (see Figure 2, input document 101), creating a target representation of the target electronic document (see Figure 2, layout characteristic extraction unit 103), and determining a measure of similarity between the target representation and a reference representation (see Figure 2, model selection unit 104 and column 4, lines 17-26) as taught in Yellapragada in view of Mukhopadhyay further comprising 

updating the representation based on receiving a confirmation that the target representation corresponds to the reference representation (see Figure 2, units 106-108 and model updating unit 114 along with column 4, lines 17-26 and lines 53-58).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of updating a representation based on matching a reference to a target as taught in Tsukasa with the system of Yellapragada in view of Mukhopadhyay, the motivation being to eliminate noise in the extraction process by making the system adaptive (see Tsukasa column 1, lines 52-57).

Claims 8-10 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Yellapragada et al, U.S. Publication No. 2018/0032804 in view of van Rotterdam et al, U.S. Publication No. 2018/0068183.

Regarding claim 8, Yellapragada teaches all the limitations of claim 1, and further teaches 

comparing a feature of the representation of the electronic document to a representation of a target electronic document (see Yellapragada Figure 5, steps 505-510). 

Yellapragada does not expressly teach

generating a target vector from a second portion of text associated with a target electronic document; and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector.

However, van Rotterdam in a similar invention in the same field of endeavor teaches a method involving comparing a representation of an electronic document comprising text (see van Rotterdam Abstract) as taught in Yellapragada comprising

generating a target vector from a second portion of text associated with a target electronic document (see Figure 2, step 214 and paragraph [0034]); and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document (see Figure 2, steps 206 and 216 and paragraph [0028]) based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector (see paragraph [0049]. The final sentence says the similarity can be based off orientation and magnitude of the vectors, which would therefore be a function of the norms of the vectors).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the comparing of hashes as taught in Yellapragada with using norms of vectors of electronic documents for comparison as taught in van Rotterdam to yield the predictable results of successfully comparing and finding similar documents. 

Regarding claim 9, Yellapragada teaches a system comprising: 

one or more processors; and one or more computer-readable media having computer-executable instructions embodied thereon that, when executed by the one or more processors, cause the one or more processors to perform operations (see Yellapragada claim 7) comprising: 

receiving an electronic document (see Figure 3, document extractor 306 and paragraph [0040]) comprising text (see Figure 2 and paragraph [0038]);

modifying a first portion of the text from the electronic document and leaving a remaining portion of the text unmodified (see paragraph [0041] which indicates that text 224 of section 226 of Figure 2 is removed while text label 222 is left); 

generating a representation for the electronic document based on the modified first portion of the text and the remaining portion of text (see Figure 3, hashing function 310 and paragraph [0041]); 

storing the representation of the electronic document in a data store (see Figure 3, template database 314 and paragraph [0042]); and

comparing a feature of the representation of the electronic document to a representation of a target electronic document (see Figure 5, steps 505-510). 

Yellapragada does not expressly teach

generating a target vector from a second portion of text associated with a target electronic document; and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector.

However, van Rotterdam in a similar invention in the same field of endeavor teaches a method involving comparing a representation of an electronic document comprising text (see van Rotterdam Abstract) as taught in Yellapragada comprising

generating a target vector from a second portion of text associated with a target electronic document (see Figure 2, step 214 and paragraph [0034]); and 

determining a measure of similarity between the target vector and a reference vector associated with the representation of the electronic document (see Figure 2, steps 206 and 216 and paragraph [0028]) based at least in part on comparing a first function of a first norm of the target vector with a second function of second norm of the reference vector (see paragraph [0049], the final sentence of which states that the similarity can be based on the orientation and magnitude of the vectors, and would therefore be a function of the norms of the vectors).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the comparing of hashes as taught in Yellapragada with using norms of vectors of electronic documents for comparison as taught in van Rotterdam to yield the predictable results of successfully comparing and finding similar documents. 
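
For illustration only, the following sketch shows one way a measure of similarity could compare a function of the norm of a target vector with a function of the norm of a reference vector. The term-count vectorization, the function names, the Euclidean norm, and the min/max ratio are all assumptions made for this example; none of them is taken from Yellapragada, van Rotterdam, or the claims.

```python
import math
from collections import Counter


def text_to_vector(text, vocabulary):
    """Hypothetical vectorizer: count of each vocabulary term in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def l2_norm(vector):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def norm_ratio_similarity(target, reference):
    """Compare the two norms: 1.0 when the magnitudes match,
    approaching 0.0 as they diverge (assumed scoring rule)."""
    n_target, n_reference = l2_norm(target), l2_norm(reference)
    if n_target == 0.0 and n_reference == 0.0:
        return 1.0  # two empty vectors are treated as identical
    return min(n_target, n_reference) / max(n_target, n_reference)


# Example usage with a made-up vocabulary and made-up document text.
vocabulary = ["invoice", "total", "date"]
reference_vector = text_to_vector("Invoice date total total", vocabulary)
target_vector = text_to_vector("invoice total", vocabulary)
score = norm_ratio_similarity(target_vector, reference_vector)
```

A score based on norms alone ignores the direction of the vectors, so two unrelated documents of similar length would score highly under this sketch.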

Regarding claim 10, Yellapragada in view of van Rotterdam teaches all the limitations of claim 9, and further teaches wherein the modifying includes deleting the first portion of the text (see Yellapragada paragraph [0041] which indicates that text 224 of section 226 of Figure 2 is removed while text label 222 is left).

Regarding claim 12, Yellapragada in view of van Rotterdam teaches all the limitations of claim 10, and further teaches wherein generating the target vector comprises generating one or more vectors based on the modified text (see van Rotterdam paragraph [0034] as combined with Yellapragada Figure 5, step 505 and paragraph [0041]).

Regarding claim 14, Yellapragada in view of van Rotterdam teaches all the limitations of claim 9, but does not expressly teach wherein a cosine similarity function is used to determine the measure of similarity.

However, van Rotterdam goes on to teach that cosine similarity can be used alone and that various methods of similarity determination can be combined (see van Rotterdam paragraph [0049]). Therefore, one of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of using a cosine similarity as taught in a separate embodiment of van Rotterdam with using and combining similarity methods involving the magnitude and orientation of the vectors as further taught in van Rotterdam, the motivation being to make the similarity measure more robust.  
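
As a hedged illustration of the combination discussed above, the sketch below computes a cosine similarity (orientation) and blends it with a magnitude comparison. The weighted average, the `weight` parameter, and the function names are assumptions chosen for the example and do not reproduce any formula disclosed in van Rotterdam.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # orientation is undefined for a zero vector
    return dot / (norm_a * norm_b)


def magnitude_similarity(a, b):
    """Ratio of the smaller norm to the larger norm (assumed scoring rule)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 and norm_b == 0.0:
        return 1.0
    return min(norm_a, norm_b) / max(norm_a, norm_b)


def combined_similarity(a, b, weight=0.5):
    """Weighted blend of orientation and magnitude; the weight is hypothetical."""
    return (weight * cosine_similarity(a, b)
            + (1.0 - weight) * magnitude_similarity(a, b))
```

With this blend, two vectors that point in the same direction but differ in length score below 1.0, which cosine similarity alone would not show.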

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Yellapragada et al, U.S. Publication No. 2018/0032804 in view of van Rotterdam et al, U.S. Publication No. 2018/0068183 and Mukhopadhyay et al, U.S. Publication No. 2020/0074169.

Regarding claim 13, Yellapragada in view of van Rotterdam teaches all the limitations of claim 9, but does not expressly teach wherein the determining of the measure of similarity is based on using an extraction model that utilizes heuristics to determine a location of text in the electronic document.

However, Mukhopadhyay, in a similar invention in the same field of endeavor, teaches a method involving a representation for an electronic document (see Mukhopadhyay Figure 2, known forms repository 208 and paragraph [0047]) and determining a measure of similarity (see paragraph [0061]), 
wherein the determining of the measure of similarity is based on an extraction model that utilizes heuristics to determine a location of text in the electronic document (see paragraph [0061]).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of using an extraction model to determine the location of text as taught in Mukhopadhyay with the method taught in Yellapragada in view of van Rotterdam, the motivation being to add in extra similarity measures to the system thereby making it easier to extract relevant text from the documents. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CASEY L KRETZER whose telephone number is (571)272-5639. The examiner can normally be reached M-F 10:00 AM-7:00 PM PDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID C PAYNE, can be reached at (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CASEY L KRETZER/
Examiner, Art Unit 2637