Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Compact Prosecution
The Examiner suggests amending independent claims 1, 15 and 20 to include the limitations “determining a probability of occurrence for each of a plurality of association hypotheses using one or more neural layers of the neural network by utilizing a testing function,
wherein the first association hypothesis may include an association of the first recalculated vector with the second recalculated vector and a highest probability is calculated and may refer to an association of a given symbol sequence with a particular field or with a particular table partition.” These amendments will overcome the current rejection.
Allowable Subject Matter
Claims 7-8, 10-14 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
At present, the Examiner has not found prior art, alone or in combination, that teaches or provides motivation to teach the following limitations:
wherein the method further comprises obtaining a plurality of symbol sequences of a subsequent document, the subsequent document having a plurality of fields;
determining, based on at least some of the plurality of symbol sequences of the subsequent document, that the subsequent document does not belong to any type of documents of the plurality of identified types of documents;
adding a new type of documents to the plurality of identified types of documents; and
using the subsequent document to train a new neural network to be associated with the new type of documents.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-6, 9, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Corcoran et al. (US10963692) in view of Li (US20200151591).
Claim 1, Corcoran discloses a method, comprising: obtaining a layout of a document (Col. 3, lines 15-16; one skilled in the art would recognize that grouping documents based on their layout structure means that the layout of each document is identified), the document having a plurality of fields (Col. 3, lines 23-25: the extraction engine extracts all relevant fields on all documents within each group, which means that each document has a plurality of fields);
identifying the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents (Col. 3, lines 13-19 and 20-23: “group documents with similar layout together”);
identifying a plurality of symbol sequences of the document; and processing, by a processing device (Col. 3, lines 17-19: the extraction engine converts the document text (symbol sequences) into a structured format), the plurality of symbol sequences of the document using a neural network (Convolutional Neural Network (CNN) 118, Col. 3, lines 35-38) associated with the first type of documents (Invoice 220, Col. 4, lines 33-35) to determine an association of a first field of the plurality of fields (Col. 4, lines 7-10: the Tax 210, Total 211 and Balance Due 208 fields read on the plurality of fields) with a first symbol sequence of the plurality of symbol sequences of the document (Col. 4, lines 1-10: there is an association between the label “invoice number” and the text of the invoice number, “INV0002”, as shown in Fig. 2B; the neural network is able to identify the text “INV0002” as the invoice number and to group/cluster the document based on the text).
Corcoran does not disclose a plurality of neural networks and therefore does not disclose a first neural network.
Li discloses a similar clustering system that clusters documents based on the text of the document in association with the fields of the document (paragraph [0092], lines 1-5: each token (text) is classified into one of a predefined set of fields of the document class) using first and second machine learning models (neural networks) (paragraph [0092], lines 14-15: the machine learning models include one or more other machine learning models; see also paragraph [0130], lines 1-5, where model 1, model 2 and model 3 show that Li teaches a first neural network).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Corcoran to include Li's teaching of employing first, second and third neural networks in the process of extracting/clustering text in a document in association with the fields of the document. The motivation would have been to make the system work better and faster, because specialized machine learning models are dedicated to specific areas of machine learning.

Claim 2, Corcoran in view of Li discloses wherein the layout comprises locations of at least some of the plurality of symbol sequences of the document (Corcoran: Col. 6, lines 38-39 describe grouping images/documents based on structural layout, from which one skilled in the art would understand that the locations of the text are identified in the layout).
Claim 3, Corcoran in view of Li discloses wherein identifying the document as belonging to the first type of documents comprises executing a clustering algorithm (Corcoran: Col. 8, lines 21-25: a clustering algorithm based on vector distance).
Claim 4, Corcoran in view of Li discloses wherein executing the clustering algorithm comprises: determining a document value characterizing the layout of the document (Corcoran: Col. 6, lines 65-67: layout clustering is executed using a similarity metric, i.e., how different layout scores fall into different groups);
comparing the document value to a plurality of cluster values (Corcoran: Col. 6, lines 59-62: the distance between documents in different layout groups; the distance is thus determined by comparing the values of two documents within a group), wherein each one of the plurality of cluster values corresponds to one of the plurality of identified types of documents (Corcoran: Col. 6, lines 53-57: values calculated by the cluster engine for the structural layout of a document within a group of documents with similar layouts);
and determining that the document value is a closest to a first cluster value of the plurality of cluster values, wherein the first cluster value corresponds to the first type of documents (Corcoran: Col. 6, lines 57-62: the distance between documents in the same layout group is smaller, which means that a document's score is close to the scores of the other documents in that group; see also Col. 6, line 63, hardness score).
Claim 5, Corcoran in view of Li discloses wherein the first neural network is trained by obtaining a training document having a marked-up association of a first field (Corcoran: cleaned-up document, Col. 5, lines 54-57; training of CNN 118, Col. 7, lines 10-15), wherein the marked-up association (Li: paragraph [0072], line 4, a corrected prediction) is with a first symbol sequence of the training document (Corcoran: Col. 3, lines 37-40: the anchor image, which is used as an image with the learned embeddings, reads on the training document); obtaining, using the first neural network, a predicted association of the first field in the training document (Corcoran: Col. 3, lines 45-50: the inference phase used for classification based on the anchor image reads on the predicted association);
and determining, based on a comparison of the predicted association of the first field in the training document to the first symbol sequence of the training document (Li: paragraph [0071], lines 1-6, prediction of a second document), whether parameters of the first neural network are to be modified (Corcoran: Col. 8, lines 5-10: the loss function modifies the distance (the difference resulting from the comparison) between similar images, e.g., the anchor image and a similar image).
Claim 6, Corcoran in view of Li discloses wherein the method further comprises determining that the association between the first symbol sequence and the first field of the document is incorrect (Corcoran: Col. 8, lines 5-8: the L2 distance between dissimilar images is maximized; these are images for which there is no association between the embeddings of the document and the embeddings of the database, and one skilled in the art would understand that such documents also have dissimilar relations between keys (text) and values (fields) within them; see Col. 6, lines 49-57); receiving a marked-up version of the document (Corcoran: Col. 5, lines 53-57: the cleaned-up document reads on the marked-up version of the document), wherein the marked-up version of the document comprises a corrected association of the first field; and using the marked-up version of the document as a training input into the first neural network (Corcoran: Col. 7, lines 11-15: a version of the cleaned-up (marked-up) document is used in training the deep convolutional neural network).
Claim 7, No Prior Art Rejection
Claim 8, No Prior Art Rejection because of its dependence from Claim 7.
Claim 9, Corcoran in view of Li discloses wherein identifying the plurality of symbol sequences of the document comprises performing optical character recognition (OCR) of an image of the document to obtain an OCR text of the document (Corcoran: Col. 5, lines 48-53: the OCR converts text represented visually in a document image into a textually encoded representation) and partitioning the OCR text into the plurality of symbol sequences (Li: paragraph [0082], lines 8-9: bounding rectangles of characters).
Claim 10, No Prior Art Rejection
Claim 11, No Prior Art Rejection because of its dependence from Claim 10. 
Claim 12, No Prior Art Rejection because of its dependence from Claim 10. 
Claim 13, No Prior Art Rejection because of its dependence from Claim 12. 
Claim 14, No Prior Art Rejection because of its dependence from Claim 10. 

Claim 15, Corcoran discloses a non-transitory machine-readable storage medium including instructions that, when accessed by a processing device, cause the processing device to: obtain a layout of a document (Col. 3, lines 15-16; one skilled in the art would recognize that grouping documents based on their layout structure means that the layout of each document is identified), the document having a plurality of fields (Col. 3, lines 23-25: the extraction engine extracts all relevant fields on all documents within each group, which means that each document has a plurality of fields);
identify the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents (Col. 3, lines 13-19 and 20-23: “group documents with similar layout together”);
identify a plurality of symbol sequences of the document and process the plurality of symbol sequences of the document (Col. 3, lines 17-19: the extraction engine converts the document text (symbol sequences) into a structured format) using a neural network (Convolutional Neural Network (CNN) 118, Col. 3, lines 35-38) associated with the first type of documents (Invoice 220, Col. 4, lines 33-35) to determine an association of a first field of the plurality of fields (Col. 4, lines 7-10: the Tax 210, Total 211 and Balance Due 208 fields read on the plurality of fields) with a first symbol sequence of the plurality of symbol sequences of the document (Col. 4, lines 1-10: there is an association between the label “invoice number” and the text of the invoice number, “INV0002”, as shown in Fig. 2B; the neural network is able to identify the text “INV0002” as the invoice number and to group/cluster the document based on the text).
Corcoran does not disclose a plurality of neural networks and therefore does not disclose a first neural network.
Li discloses a similar clustering system that clusters documents based on the text of the document in association with the fields of the document (paragraph [0092], lines 1-5: each token (text) is classified into one of a predefined set of fields of the document class) using first and second machine learning models (neural networks) (paragraph [0092], lines 14-15: the machine learning models include one or more other machine learning models; see also paragraph [0130], lines 1-5, where model 1, model 2 and model 3 show that Li teaches a first neural network).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Corcoran to include Li's teaching of employing first, second and third neural networks in the process of extracting/clustering text in a document in association with the fields of the document. The motivation would have been to make the system work better and faster, because specialized machine learning models are dedicated to specific areas of machine learning.

Claim 16, Corcoran in view of Li discloses wherein to identify the document as belonging to the first type of documents, the instructions are to cause the processing device to execute a clustering algorithm (Corcoran: Col. 8, lines 21-25: a clustering algorithm based on vector distance).
Claim 17, Corcoran in view of Li discloses wherein to cause the processing device to execute a clustering algorithm, the instructions are to cause the processing device to: determine a document value characterizing the layout of the document (Corcoran: Col. 6, lines 65-67: layout clustering is executed using a similarity metric, i.e., how different layout scores fall into different groups);
compare the document value to a plurality of cluster values (Corcoran: Col. 6, lines 59-62: the distance between documents in different layout groups; the distance is thus determined by comparing the values of two documents within a group), wherein each one of the plurality of cluster values corresponds to one of the plurality of identified types of documents (Corcoran: Col. 6, lines 53-57: values calculated by the cluster engine for the structural layout of a document within a group of documents with similar layouts);
determine that the document value is a closest to a first cluster value of the plurality of cluster values, wherein the first cluster value corresponds to the type of documents (Corcoran: Col. 6, lines 57-62: the distance between documents in the same layout group is smaller, which means that a document's score is close to the scores of the other documents in that group; see also Col. 6, line 63, hardness score).

Claim 18, No Prior Art Rejection. Please see item 3 for details.
Claim 19, No Prior Art Rejection because of its dependence from Claim 18. 
Claim 20, Corcoran discloses a system comprising a memory and a processing device operatively coupled to the memory (Fig. 9 shows central processing unit 902 coupled to memory 906; see Col. 8, lines 60-65), the processing device to obtain a layout of a document (Col. 3, lines 15-16; one skilled in the art would recognize that grouping documents based on their layout structure means that the layout of each document is identified), the document having a plurality of fields (Col. 3, lines 23-25: the extraction engine extracts all relevant fields on all documents within each group, which means that each document has a plurality of fields);
identify the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents (Col. 3, lines 13-19 and 20-23: “group documents with similar layout together”);
identify a plurality of symbol sequences of the document; and process, by a processing device (Col. 3, lines 17-19: the extraction engine converts the document text (symbol sequences) into a structured format), the plurality of symbol sequences of the document using a neural network (Convolutional Neural Network (CNN) 118, Col. 3, lines 35-38) associated with the first type of documents (Invoice 220, Col. 4, lines 33-35) to determine an association of a first field of the plurality of fields (Col. 4, lines 7-10: the Tax 210, Total 211 and Balance Due 208 fields read on the plurality of fields) with a first symbol sequence of the plurality of symbol sequences of the document (Col. 4, lines 1-10: there is an association between the label “invoice number” and the text of the invoice number, “INV0002”, as shown in Fig. 2B; the neural network is able to identify the text “INV0002” as the invoice number and to group/cluster the document based on the text).
Corcoran does not disclose a plurality of neural networks and therefore does not disclose a first neural network.
Li discloses a similar clustering system that clusters documents based on the text of the document in association with the fields of the document (paragraph [0092], lines 1-5: each token (text) is classified into one of a predefined set of fields of the document class) using first and second machine learning models (neural networks) (paragraph [0092], lines 14-15: the machine learning models include one or more other machine learning models; see also paragraph [0130], lines 1-5, where model 1, model 2 and model 3 show that Li teaches a first neural network).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Corcoran to include Li's teaching of employing first, second and third neural networks in the process of extracting/clustering text in a document in association with the fields of the document. The motivation would have been to make the system work better and faster, because specialized machine learning models are dedicated to specific areas of machine learning.

	Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Tsatsin et al. (US20170357896) discloses a method of training a neural network to create an embedding space including a catalog of documents, the method including providing a plurality of training sets of K+2 training documents to a computer system, each training document being represented by a corresponding training vector x, each set of training documents including a target document represented by a vector, a favored document represented by a vector X5, and K > 1 unfavored documents represented respectively by vectors.
Guggilla et al. (US20200073882) discloses a document categorizer, executed by at least one hardware processor, that may categorize each annotated document of the annotated document set into a respective category of a plurality of categories, and a word2vec process that may generate a numerical weighted representation for each entity-relation by learning contexts based on the assumption that words that occur together tend to be semantically similar.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
05/19/2022