DETAILED ACTION
This action is responsive to the Amendment filed on 07/05/2022. Claims 1-7, 9-12, and 14-20 are pending in the case. Claims 8 and 13 are canceled. Claims 1, 11 and 18 are the independent claims.
This office action is FINAL.
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Applicant’s Response
In Applicant’s response dated 07/05/2022 (hereinafter Response), Applicant amended Claims 1, 11, and 18; canceled Claims 8 and 13; and argued against all objections and rejections previously set forth in the Office Action dated 04/19/2022.
After careful review of Applicant’s Response, a number of deficiencies were identified. Examiner contacted Applicant’s representative to discuss the deficiencies and to solicit a supplemental amendment addressing them (see attached PTO-413). Initially, Applicant’s representative agreed that a supplemental response would be appropriate; however, Applicant subsequently decided not to file a supplemental amendment unless Examiner would state on the record that the claims would be allowable.
While the Response fails to conform to all the requirements of 37 CFR 1.121 (notably, not all changes to claim 18 are shown with correct markup), the amendment is being treated as a bona fide attempt to advance the application to final action and to avoid abandonment.
Applicant’s amendments to claims 1, 11, and 18 to further clarify the metes and bounds of the invention are acknowledged. It is noted that Applicant provided no citations of support for the amendments and argued only that the claims are allowable over the art of record because the amendments were based on statements made by Examiner during a telephonic interview (see Examiner’s interview summary mailed 06/22/2022, which is considerably more detailed than Applicant’s interview summary included in the Response). Note that during the interview, only claims 1, 3, and 8 were discussed; Examiner therefore could not have made any statements regarding the amendments made to independent claims 11 and 18, which are directed to different aspects of the invention.
As Applicant made no explicit argument against the references of record, the previous rejections are maintained and are restated where necessary in response to Applicant’s amendment. Further, new grounds of rejection are applicable to the dependent claims in response to Applicant’s amendment to the independent claims.
Claim Objections
The claims are objected to for using different acronyms for the same concept (see claims 1, 11, and 18: term frequency-inverse document frequency (TF-IDF); versus dependent claims 3 and 17: term frequency-inverse document frequency (TFIDF); and dependent claim 4: TFIDF).
Claim Rejections – 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.



The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 11-12, 14-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Regarding claim 11, the claim recites … calculating a term frequency-inverse document frequency (TF-IDF) value for a plurality of words of the plurality of texts; ranking the plurality of words based on the calculated TF-IDF values; selecting, from the plurality of words, a pre-defined number of highest ranking words; for each of the plurality of texts, embedding only the pre-defined number of words found in the text to a vector space to obtain an invoice vector of a plurality of invoice vectors…  The relevant portions of the disclosure are:
[0033] Embedding module 110 may be configured to embed text to vector form within a continuous vector space. {silent regarding using TFIDF values}
[0037] …Cluster analysis module 116 may be configured to analyze a cluster (e.g. a cluster generated by clustering module 112) and generate a description of text representing the cluster. In some embodiments, this description may be in natural language or normal text, and not in vector format. In some embodiments, generating a description that represents each cluster may include combining up to three original descriptions (prior to any processing) that contain a certain, pre-defined amount of the words with the highest TFIDF values, such as those calculated by text preparation module 108. For example, cluster analysis module 116 may, based on the vectors within a generated cluster, obtain the textual business descriptions associated with each vector. Cluster analysis modules 116 may determine up to three descriptions that contain 95% of the words with the five highest TFIDF values.
[0053] At block 506, embedding module 110 may embed the cleaned up business description using the adjusted embedding algorithm in a manner similar to or the same as described in relation to embedding module 110 in the context of FIG. 1 {silent regarding using TFIDF values}
[0057] At block 604, embedding module 110 may embed the cleanup business descriptions to a vector space. In some embodiments, embedding module 110 may use the adjusted embedding algorithm of block 208 in FIG. 200 and may operate in a manner similar to or the same as described in relation to embedding module 110 in the context of FIG. 1. {silent regarding using TFIDF values}
[0059] At block 608, cluster analysis module 116 may generate a description that represents each cluster. In some embodiments, this description may be in natural language or normal text, and not in vector format. In some embodiments, generating a description that represents each cluster may include combining up to three original descriptions (prior to any processing) that contain a certain, pre-defined amount of the words with the highest TFIDF values, as calculated at block 602. For example, cluster analysis module 116 may, based on the vectors within a generated cluster, obtain the textual business descriptions associated with each vector. Cluster analysis modules 116 may determine up to three descriptions that contain 95% of the words with the five highest TFIDF values. 
[0062] At block 804, embedding module 110 may embed the extracted text from each invoice to a vector within a vector space. In some embodiments, embedding module 110 may use a word2vec algorithm, such as the adjusted word2vec algorithm of FIGS. 5 and 6 and may operate in a manner similar to or the same as described in relation to embedding module 110 in the context of FIG. 1.
[0063] At block 806, embedding module 110 may generate a representative vector for a business based on its associated invoices that were processed in blocks 802 and 804, from database 122. …
[0068] At block 1006, embedding module 110 may embed the invoice text to a vector space. Embedding module 110 may operate in a manner similar to or the same as described in relation to embedding module 110 in the context of FIG. 1.

None of these portions clearly teaches, for each of the plurality of texts, embedding only the pre-defined number of words found in the text to a vector space to obtain an invoice vector of a plurality of invoice vectors, where the pre-defined number of words are those with the highest TF-IDF values, as required by the claim.
Note that the only recitations of “ranking” or “rank” appear in the originally filed claims (with respect to generating the description, e.g. claim 17 and now-canceled claims 8 and 13; and with respect to obtaining the invoice vectors based on lemmatized texts, as in claim 3, which does not include the restriction that “only” the identified words are embedded).
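For illustration only, the claim 11 limitation at issue (rank words by TF-IDF across the invoice texts, keep a pre-defined number k of highest-ranking words, and embed only those words for each text) may be sketched as follows. This is a minimal sketch under stated assumptions, not Applicant's or GAO's implementation: the per-text tf normalization, aggregating a word's score by its maximum over texts, the count-vector "embedding", and all function names are hypothetical choices not found in the disclosure. The point of contrast is that the selected vocabulary controls what enters every invoice vector, whereas [0037] and [0059] use the highest TF-IDF words only to compose a cluster description.

```python
import math
from collections import Counter


def tfidf_scores(texts: list[list[str]]) -> dict[str, float]:
    """Corpus-level TF-IDF score per word (hypothetical aggregation: max over texts).

    Assumption: tf = count in a text / text length; idf = log(N / df).
    """
    n = len(texts)
    df = Counter(word for text in texts for word in set(text))
    scores: dict[str, float] = {}
    for text in texts:
        for word, count in Counter(text).items():
            score = (count / len(text)) * math.log(n / df[word])
            scores[word] = max(scores.get(word, 0.0), score)
    return scores


def top_k_words(scores: dict[str, float], k: int) -> list[str]:
    """Rank words by descending TF-IDF (ties broken alphabetically) and keep k."""
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def embed_selected(text: list[str], vocab: list[str]) -> list[float]:
    """Toy embedding: counts over the selected vocabulary ONLY.

    Any word in the text that is not among the top-k words is ignored.
    """
    counts = Counter(text)
    return [float(counts[w]) for w in vocab]


# Hypothetical tokenized invoice texts.
texts = [["coffee", "latte", "tax"], ["coffee", "espresso", "tax"], ["bagel", "tax"]]
vocab = top_k_words(tfidf_scores(texts), k=3)
vectors = [embed_selected(t, vocab) for t in texts]
```

In this toy input, "tax" appears in every text, so its idf (and score) is zero and it never reaches the embedded vocabulary.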
Regarding claim 18, the claim recites at least the same limitations as identified in the rejection of claim 11 above and is rejected for similar reasons.
Regarding dependent claims 12, 14-17 and 19-20, dependent claims necessarily inherit the deficiencies of the respective parent claim.
Claims 3-4, 12, 14-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding dependent claim 3, the claim is dependent on claims 2/1 and recites… calculating a term frequency-inverse document frequency (TFIDF) value for each word of the lemmatized texts; ranking the calculated TFIDF values; identifying a top pre-defined number of words with a highest rank; and embedding the identified words to obtain the plurality of invoice vectors. 
The written description corresponding to these operations in the disclosure as originally filed indicates that these operations are used in cluster analysis to generate the description (see [0037], [0048]-[0049]).
Parent claim 1 recites (emphasis added)… generating a plurality of clusters in the vector space, each cluster of the plurality of clusters comprising at least one invoice vector of the plurality of invoice vectors; generating a description for a cluster, the description for the cluster representing all invoice vectors assigned to the cluster, wherein generating the description comprises: in response to generating the plurality of clusters, calculating a term frequency-inverse document frequency (TF-IDF) value for a plurality of words in the cluster; identifying a selection of words based on the calculated TF-IDF values;…
Thus, it is unclear whether the operations in dependent claim 3 are intended to be an additional, different TFIDF analysis, or are intended to further limit the TF-IDF analysis recited in parent claim 1.
Regarding dependent claim 4, the claim is dependent on claims 2/1 and recites calculating a TFIDF value for each word of the lemmatized texts; identifying a pre-defined number of words with TFIDF values above a certain threshold; and embedding the identified words to obtain the plurality of invoice vectors.
Claim 4 is rejected under rationale similar to that of claim 3.
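Regarding dependent claim 12, the claim recites … embedding the lemmatized texts to obtain the plurality of invoice vectors. However, parent claim 11 recites … selecting, from the plurality of words, a pre-defined number of highest ranking words… embedding only the pre-defined number of words found in the text to a vector space to obtain an invoice vector of a plurality of invoice vectors… Thus, it is unclear whether claim 12 requires embedding the lemmatized texts in their entirety, which would conflict with the requirement of parent claim 11 that only the pre-defined number of highest ranking words be embedded, or only the lemmatized forms of those selected words.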
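Regarding dependent claim 14, the claim recites … wherein embedding each of the plurality of texts comprises… however, parent claim 11 does not recite embedding each of the plurality of texts (it recites, for each of the plurality of texts, embedding only the pre-defined number of words found in the text), such that it is unclear what operation the limitation refers to.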
Regarding dependent claim 15, the claim inherits the deficiency of parent claim 14.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 7, 9-12, 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over GAO et al. (Pub. No.: US 2018/0060437 A1) in view of BROSAMER et al. (Patent No.: US 10,949,825 B1, filed 06/30/2017).
Regarding claim 1, GAO teaches the method of generating business descriptions (abstract) performed by a server ([0086] processes running on e.g. server), said method comprising: 
receiving a plurality of [invoices], each being associated with a business of a plurality of businesses (FIG 1: (102, 104) web pages and other textual descriptions about multiple companies; see [0019-0020, 0022]); 
extracting a plurality of texts from the plurality of [invoices];
embedding the plurality of texts to a vector space to obtain a plurality of vectors (FIG 1: (106) [0023] forming a vector of term frequencies to represent each company; (108) [0042] Representation Learning techniques are applied by an operation 118 on the TF-IDF or relevance vectors to generalize or project companies in the high dimensional n-gram space into a lower dimensional topic space); 
generating a plurality of clusters in the vector space (FIG 1: (118) generalize companies into topic spaces after performing operations (108-116) to analyze the extracted text (120) segment broad sets of companies into clusters), each cluster of the plurality of clusters comprising at least one vector of the plurality of vectors ([0042] relevance vectors; [0043] similar companies are close together in the topic vector space, a clustering algorithm is also applied at an operation 120 to automatically segment a broad set of companies into subsets or groups of companies that are similar to each other);
generating a description for a cluster (FIG 1 (122) business tags), the description for the cluster representing all vectors assigned to the cluster ([0044] For a cluster or any set of companies, the constituent companies' n-gram vector representations are merged into one n-gram vector via an aggregating function, a simple example of which is just the vector sum. From this merged n-gram vector…used to tag each cluster; FIG 7 illustrates detailed view of clusters with various business tags) wherein generating the description comprises:
calculating a term frequency-inverse document frequency (TF-IDF) value for a plurality of words in the cluster (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF)inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores);
identifying a selection of words based on the calculated TF-IDF values (GAO [0037] the top-ranking keyword phrases can be used as business tags); and
generating the description (e.g. the business tags) for the cluster based on the selection of words ([0044] For a cluster or any set of companies, the constituent companies' n-gram vector representations are merged into one n-gram vector via an aggregating function, a simple example of which is just the vector sum. From this merged n-gram vector…used to tag each cluster);
for each business of the plurality of businesses that has at least one vector assigned to the cluster, associating the business with the description (necessarily required for FIG 1: (122) cluster visualization of companies with business tags, where the tags are determined as in [0044]; also required to [0037] support lookups by company to find the company's descriptive tags (as shown below in FIG. 2 by operation 222), and reverse lookups by business tag to retrieve all companies that specialize in that tag (as shown below in FIG. 2 by operation 220)); and 
indexing the plurality of businesses within a database by the generated descriptions (interpreting “indexing…within a database” as storing the associated tags; necessarily required in order to [0037] support lookups by company to find the company's descriptive tags (as shown below in FIG. 2 by operation 222), and reverse lookups by business tag to retrieve all companies that specialize in that tag (as shown below in FIG. 2 by operation 220); see also [0057], which provides a search engine for companies based on the most relevant business tags).
As noted above, GAO does not expressly disclose that the electronic documents which are extracted, vectorized, and clustered in order to assign descriptions (tags) to the companies are invoices. Further, FIG 1 of GAO shows that TF-IDF analysis is done at steps 106-110, prior to any clustering of companies performed in step 120; thus, FIG 1 on its own does not explicitly state that the TF-IDF analysis used to generate the description (tags) is done in response to generating the plurality of clusters of vectors (the generalize-companies-into-topic-spaces operation 118 and the segment-into-clusters operation 120).
Note that the only support for this limitation in the disclosure as originally filed is at [0075] … Cluster analysis module 116 may calculate a TFIDF value for each word within the cluster. In some embodiments, the TFIDF value may be calculated in relation to the entire cluster, as opposed to the TFIDF value being calculated in relation to the line item that contains the word.
BROSAMER is broadly directed to (abstract) industry vertical classification of merchants using merchant signals, based, in part, on data obtained from payment activity… identifying one or more clusters using the data associated with the merchant and classifying and/or reclassifying the merchant within a cluster and then a class using the one or more identified clusters. An “invoice” is properly considered part of “payment activity” (note BROSAMER (col 3, line 11): “payment products used (invoices, mobile reader)”).
BROSAMER states (col 3 line 62) Deep learning models have been used for encoding contextual similarity between a variety of objects like words, images, drawings, and even 3D shapes. These models encode the complexity as a high-dimensional vector such as word2vec. Those vectors are extremely useful in that they define an immediately useful concept of similarity between objects. Things like 'coffee'-'cafe' or 'image of bicycle'-'image of unicycle.' In some implementations, deep learning models can be used to learn a vector representation of each merchant.
It is acknowledged that BROSAMER may also use a significant amount of other historical data (see e.g. (col 4 lines 22-30) in order to cluster merchants, collect merchant signals which include reported data, collected data, training data, and third party data associated with the merchant). However, note (col 5 line 25) In one implementation, in capturing the contextual features of a merchant, the methods and systems train an LDA topic model using sellers' itemization text to cluster sellers who provide similar goods and services and (col 5 line 44) For another example, the payment processing system can identify one or more class profiles based on the one or more class profiles including transactional information that is similar to the transactional information of the merchant. 
BROSAMER states (col 8 lines 24-30) The methods and systems disclosed herein describe an automated and more accurate classification within the merchant database thereby reducing the computing steps necessary for a merchant to classify their own business (e.g., multiple classification attempts) and the likelihood of an incorrect classification that may result in higher processing fee or violation of other legal requirements.
In FIG 6, BROSAMER shows (col 20 line 7) a flow diagram of a process 600 for a payment processing system obtaining merchant insights and applying in one or more use cases, for example for classifying a merchant within a class using reported data, training data, and collected data from the merchant, which has a significant amount of overlap with the cited features of GAO. Of particular interest is step 616 “compares and correlates the clusters to merchant classification”, which makes clear that the correlation is done in response to the generation of the clusters of the contextual features (e.g. vectors) regarding the merchant, where (col 20 line 43) FIG. 7 illustrates graphical plots to illustrate some aspects of the clustering or classification of merchants. In one implementation, as shown in A, an LDA topic model can be used to tokenize the payment transactions. As shown, the topics for a bakery include cake, cookie, cupcake, bar, and so on. The graph shows the estimated term frequency within the selected topic versus the overall term frequency (thus teaching, in response to the generating of the clusters, performing some term frequency analysis with respect to the overall term frequency in order to identify the classifications for merchants (e.g. another mechanism for describing merchants similar to tags)).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of GAO and BROSAMER before them, to have combined GAO and BROSAMER by using the invoice information (e.g. with itemization text) of BROSAMER as (or along with) the electronic documents analyzed in GAO, as each is used for the same purpose (to assist in classifying businesses), such that any TF-IDF analysis to generate the description (tags) is done in response to generating the plurality of clusters of vectors, the result of the combination being reasonable and expected. The combination is motivated at least by the improvement described in BROSAMER (col 8 lines 24-30).
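For illustration only, the cluster-level TF-IDF described at [0075] of the instant disclosure (TF-IDF "calculated in relation to the entire cluster, as opposed to ... the line item that contains the word"), performed after the clusters exist, may be sketched as follows. This is a minimal sketch assuming that each cluster's member texts are pooled into a single document and that the description is simply the n highest-scoring words; the function name, the pooling, and the tf/idf formulas are hypothetical and are not drawn from GAO, BROSAMER, or the instant disclosure.

```python
import math
from collections import Counter


def cluster_descriptions(
    clusters: dict[str, list[list[str]]], n_words: int
) -> dict[str, list[str]]:
    """Describe each cluster by its n highest cluster-level TF-IDF words.

    Assumption: all member texts of a cluster are pooled into one
    "document", so tf and idf are computed per cluster, not per line item.
    """
    pooled = {
        cid: Counter(word for text in texts for word in text)
        for cid, texts in clusters.items()
    }
    n_clusters = len(pooled)
    df = Counter(word for counts in pooled.values() for word in counts)
    descriptions = {}
    for cid, counts in pooled.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_clusters / df[word])
            for word, count in counts.items()
        }
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        descriptions[cid] = ranked[:n_words]
    return descriptions


# Hypothetical clusters that were already generated from invoice vectors.
clusters = {
    "a": [["coffee", "latte"], ["coffee", "tax"]],
    "b": [["bagel", "tax"], ["bagel", "cream"]],
}
desc = cluster_descriptions(clusters, n_words=2)
```

Because the word "tax" occurs in every cluster, it receives a zero cluster-level score and does not appear in either description.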
Regarding dependent claim 2, incorporating the rejection of claim 1, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches
detecting stop words in the plurality of texts from a pre-defined list of stop words; removing the detected stop words from the extracted plurality of texts (GAO [0020] the operation 102 may determine that it does not need to extract words such as "the," "best," etc., as they do not necessarily describe products and services of the company); 
lemmatizing (stemming) words within the extracted plurality of texts to obtain lemmatized texts (see GAO [0024] descriptive phrases are also referred to as n-grams [0032] the topics can be selected to be the individual stemmed works that make up the n-grams in order, where stemming refers to reduction of words to their word stem, base, or root form; note instant application [0047] a lemmatization process may be used to remove inflectional prefixes or suffices of a word and reduce the word to its "base" (i.e., the lemma)); and 
embedding the lemmatized texts to obtain the plurality of invoice vectors (GAO [0024] descriptive phrases are also referred to as n-grams; these are used for generating the relevance vectors which are subsequently used for clustering the businesses; relying on BROSAMER as in claim 1 to teach analyzing invoices).
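For illustration only, the claim 2 preprocessing (removing stop words from a pre-defined list, then reducing each remaining word to a base form) may be sketched as follows. This is a minimal sketch under loudly stated assumptions: the stop-word list (seeded with GAO's "the" and "best" examples) is hypothetical, and the suffix-stripping function is a deliberately crude stand-in; it is not a dictionary-based lemmatizer of the kind contemplated at [0047] of the instant disclosure, nor the stemming used in GAO.

```python
# Hypothetical pre-defined stop-word list (includes GAO's "the" and "best").
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "for", "best"})


def crude_lemma(word: str) -> str:
    """Very rough suffix stripping; a hypothetical stand-in for a lemmatizer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # pastries -> pastry
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # cupcakes -> cupcake
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop stop words, then reduce each word to a base form."""
    tokens = text.lower().split()
    return [crude_lemma(t) for t in tokens if t not in STOP_WORDS]


lemmatized = preprocess("The best cupcakes and pastries")
```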
Regarding dependent claim 3, incorporating the rejection of claim 2, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches
calculating a term frequency-inverse document frequency (TFIDF) value for each word of the lemmatized texts (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF)inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores) ; 
ranking the calculated TFIDF values (GAO [0024] an n-gram having a higher count is ranked higher; [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF)inverse-document-frequency (IDF) (TF-IDF) transformation); 
identifying a top pre-defined number of words with a highest rank (GAO [0037] the top-ranking keyword phrases can be used as business tags); and 
embedding the identified words to obtain the plurality of invoice vectors (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF)inverse-document-frequency (IDF) (TF-IDF) transformation; as the highest-ranked words are ultimately used for the business tags (see GAO [0037]), they must have been embedded in the relevance vectors for the documents of GAO, where BROSAMER is relied upon to teach “invoice” documents as explained in claim 1; note that the claim does not require that the invoice vectors embed only the identified words, thus it is sufficient that the highest-ranking words exist in the vectors in order to be used for labeling (i.e. there may be other lemmatized words in the vectors per parent claim 2, and other words in the vectors per parent claim 1)).
Regarding dependent claim 7, incorporating the rejection of claim 3, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches generating an invoice line vector for each line of the plurality of invoices; generating the plurality of clusters, wherein at least one invoice line vector is assigned to each cluster; and generating the description for a cluster, the description for the cluster representing all invoice line vectors assigned to the cluster (applying the same clustering operations of GAO to the invoice documents of BROSAMER, where BROSAMER describes identifying class profiles based on clustering items offered by the merchant (col 5 lines 44-51), under the assumption that an “invoice line” includes at least the items that were part of the invoice transaction, as would be understood in the art).
Regarding dependent claim 8 – canceled.
Regarding dependent claim 9, incorporating the rejection of claim 1, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches wherein generating the plurality of clusters comprises clustering the plurality of invoice vectors in the vector space with at least one of mean-shift clustering, k-means clustering, affinity propagation clustering, spectral clustering, hierarchical clustering, density-based spatial cluster of applications with noise (DBSCAN), ordering points to identify a clustering structure (OPTICS), Gaussian mixture modeling, or Birch clustering (only one must be shown in the art when recited in the alternative; relying on BROSAMER to teach the vectors relate to invoice documents as in claim 1; relying on GAO [0043] Such clustering techniques include, but are not limited to K-Means, Spectral Clustering, DBSCAN, OPTICS, Hierarchical Clustering, and Affinity Propagation).
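For illustration only, k-means (one of the alternatives recited in claim 9 and listed in GAO [0043]) and the Euclidean nearest-cluster selection recited in claim 11 (cf. GAO [0042], Euclidean distance in the topic space) may be sketched together as follows. This is a minimal sketch of plain Lloyd's k-means assuming caller-supplied initial centroids and a fixed iteration count; the function names and those parameters are hypothetical and are not taken from GAO or the instant disclosure.

```python
import math


def nearest_cluster(vector: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid with the smallest Euclidean distance to `vector`."""
    distances = [math.dist(vector, c) for c in centroids]
    return min(range(len(centroids)), key=distances.__getitem__)


def kmeans(
    vectors: list[list[float]],
    initial_centroids: list[list[float]],
    iterations: int = 10,
) -> tuple[list[int], list[list[float]]]:
    """Plain Lloyd's k-means: assign to nearest centroid, then recompute means."""
    centroids = [list(c) for c in initial_centroids]
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its closest centroid.
        labels = [nearest_cluster(v, centroids) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D invoice vectors.
vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(vectors, initial_centroids=[[0, 0], [10, 10]])
```

Once the clusters are fixed, `nearest_cluster` alone performs the per-vector selection of the closest pre-defined cluster.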
Regarding dependent claim 10, incorporating the rejection of claim 1, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches 
receiving a plurality of bank transactions, each bank transaction being associated with a business of the plurality of businesses (other transactional data as taught in BROSAMER, in addition to the invoice data; e.g. (col 5 line 44) For another example, the payment processing system can identify one or more class profiles based on the one or more class profiles including transactional information that is similar to the transactional information of the merchant); 
embedding text from the plurality of bank transactions to the vector space to obtain a plurality of transaction vectors (applying the technique taught in GAO as rejected in claim 1 to the additional transaction data); 
generating the plurality of clusters in the vector space, each cluster of the plurality of clusters comprising at least one of an invoice vector of the plurality of invoice vectors or a transaction vector of the plurality of transaction vectors (applying the technique taught in GAO as rejected in claim 1);
generating the description for a cluster, the description for the cluster representing all invoice vectors and transaction vectors assigned to the cluster (applying the technique taught in GAO as rejected in claim 1); and 
for each business of the plurality of businesses that has at least one invoice vector or transaction vector assigned to the cluster, associating the business with the description (applying the technique taught in GAO as rejected in claim 1).
Regarding claim 11, GAO in view of BROSAMER, combined at least for the reasons discussed in claim 1, similarly teaches the method for determining a list of offerings for a business performed by a server, said method comprising: 
receiving a plurality of invoices associated with the business from one or more devices associated with the business (relying on BROSAMER to teach “invoice” documents as explained in claim 1, similarly used for clustering companies; relying on GAO FIG 1: (102, 104) web pages and other textual descriptions about multiple companies; see [0019-0020, 0022] to receive documents generally for analysis); 
extracting a plurality of texts from the plurality of invoices (GAO: FIG 1: (102, 104) extract words (106) extract keyword phrases; see [0019-0020, 0022, 0023], where BROSAMER is relied upon to teach “invoice” documents);
calculating a term frequency-inverse document frequency (TF-IDF) value for a plurality of words of the plurality of texts (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF)inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores; the TFIDF is calculated before clustering);
ranking the plurality of words based on the calculated TF-IDF values; selecting, from the plurality of words, a pre-defined number of highest ranking words (GAO [0037] the top-ranking keyword phrases can be used as business tags; will be used in the clustering process as shown in method of FIG 1);
for each of the plurality of texts, embedding only the pre-defined number of words found in the text to a vector space to obtain an invoice vector of a plurality of invoice vectors (GAO: FIG 1: (106) [0023] forming a vector of term frequencies to represent each company; (108) [0042] Representation Learning techniques are applied by an operation 118 on the TF-IDF or relevance vectors to generalize or project companies in the high dimensional n-gram space into a lower dimensional topic space; where BROSAMER is relied upon to teach “invoice” documents; note that while GAO does not explicitly state “only” the pre-defined number of words are embedded, this is presumed to be a simple optimization for limiting the size of the vector representations); 
{clustering in order to tag the companies associated with the documents: GAO [0041-0044]}
for each invoice vector of the plurality of invoice vectors: 
calculating a plurality of Euclidean distances between the invoice vector and a plurality of points, the plurality of points comprising a point from each of a pre-defined plurality of clusters (GAO [0042] Companies that are close together in the topic space, e.g. according to Euclidean or Cosine distance, are effectively similar to each other in terms of their business, product offerings, solutions or services);
selecting, based on the plurality of Euclidean distances, a pre-defined cluster closest to the invoice vector (GAO [0042] Companies that are close together in the topic space, e.g. according to Euclidean or Cosine distance, are effectively similar to each other in terms of their business, product offerings, solutions or services; thus “clustering” will result in placing the invoice (relevance) vector in its appropriate cluster); and 
assigning the invoice vector to the selected pre-defined cluster (the result of clustering);
creating the list of offerings for the business, the list of offerings comprising a description from each selected pre-defined cluster (GAO: [0037] top-ranking key phrases by relevance score can be used as business tags that succinctly describe a company’s business or products);
receiving a request from a device of the one or more devices associated with the business (GAO [0037] dataset supports lookups by company to find the company's descriptive tags (as shown below in FIG. 2 by operation 222), and reverse lookups by business tag to retrieve all companies that specialize in that tag (as shown below in FIG. 2 by operation 220)); and 
causing the list of offerings to be displayed on the device (using the look-up features of GAO to identify the descriptive tags for the company, where the tags include products; note also BROSAMER (col 5 lines 4-15) compare data in business profile of merchant to a collection of business profiles; data includes cluster of items associated with the class (e.g. items offered by the merchants assigned to the class, items acquired, etc.)).
While not explicitly relied upon in the rejection above, Applicant’s attention is also drawn to BROSAMER FIG 6 (col 20 lines 8-42) which has significant overlap with GAO FIG 1.
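For illustration only, the TF-IDF ranking, top-k word selection, embedding, and nearest-cluster assignment steps recited in claim 11 can be sketched in Python as below. This is a minimal sketch under stated assumptions and is not presented as the implementation of GAO, BROSAMER, or Applicant: the whitespace tokenizer, the use of the maximum per-text TF-IDF score as each word's overall value, the count-based stand-in for an embedding, and all function names and cluster descriptions are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(texts):
    """Return one TF-IDF value per word across all texts.

    tf = count / len(text) and idf = log(N / df). Keeping the maximum
    per-text score for each word is an illustrative assumption.
    """
    docs = [text.lower().split() for text in texts]
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    scores = {}
    for doc in docs:
        if not doc:
            continue  # skip empty texts to avoid dividing by zero
        for word, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[word])
            scores[word] = max(scores.get(word, 0.0), score)
    return scores


def top_k_words(scores, k):
    """Rank words by descending TF-IDF (ties alphabetical) and keep k."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def embed(text, vocabulary):
    """Hypothetical stand-in for an embedding: counts of the selected words only."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def nearest_cluster(vector, centroids):
    """Index of the pre-defined centroid at the smallest Euclidean distance."""
    distances = [math.dist(vector, centroid) for centroid in centroids]
    return distances.index(min(distances))


def list_of_offerings(invoice_vectors, centroids, descriptions):
    """Collect the description of each selected cluster, without duplicates."""
    offerings = []
    for vector in invoice_vectors:
        label = descriptions[nearest_cluster(vector, centroids)]
        if label not in offerings:
            offerings.append(label)
    return offerings
```

Ties in TF-IDF score are broken alphabetically only so that the sketch is deterministic; the claims do not specify a tie-breaking rule.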
Regarding dependent claim 12, incorporating the rejection of claim 11, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches 
detecting stop words in the extracted plurality of texts from a pre-defined list of stop words; removing the detected stop words from the extracted plurality of texts (GAO [0020] the operation 102 may determine that it does not need to extract words such as "the," "best," etc., as they do not necessarily describe products and services of the company); 
lemmatizing (stemming) words within the extracted plurality of texts to obtain lemmatized texts (see GAO [0024] descriptive phrases are also referred to as n-grams; [0032] the topics can be selected to be the individual stemmed words that make up the n-grams in order, where stemming refers to reduction of words to their word stem, base, or root form; note instant application [0047] a lemmatization process may be used to remove inflectional prefixes or suffices of a word and reduce the word to its "base" (i.e., the lemma)); and 
embedding the lemmatized texts to obtain the plurality of invoice vectors (GAO [0024] descriptive phrases are also referred to as n-grams; these are used for generating the relevance vectors which are subsequently used for clustering the businesses; relying on BROSAMER as in claim 1 to teach analyzing invoices).
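The stop-word removal and stemming steps discussed for claim 12 can be illustrated with the minimal Python sketch below. It is hypothetical: the stop-word list, the suffix list, and the function name are assumptions, and the suffix stripping is a crude stemmer rather than true lemmatization (which maps a word to its dictionary form).

```python
# Hypothetical stop-word list (GAO [0020] gives "the" and "best" as examples).
STOP_WORDS = {"the", "best", "a", "an", "and", "for", "of"}
# Crude, hypothetical suffix list; a real system would use a proper
# stemmer or a dictionary-based lemmatizer instead.
SUFFIXES = ("ing", "ed", "s")


def normalize(text):
    """Remove stop words, then strip at most one known suffix per word."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    stems = []
    for token in tokens:
        for suffix in SUFFIXES:
            # Keep at least three characters so short words are not mangled.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems
```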
Regarding dependent claim 13 – canceled.
Regarding dependent claim 16, incorporating the rejection of claim 11, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches 
receiving a plurality of bank transactions, each bank transaction being associated with a business of the plurality of businesses (other transactional data as taught in BROSAMER, in addition to the invoice data; e.g. (col 5 line 44) For another example, the payment processing system can identify one or more class profiles based on the one or more class profiles including transactional information that is similar to the transactional information of the merchant); 
embedding text from the plurality of bank transactions to the vector space to obtain a plurality of transaction vectors (applying the technique taught in GAO as rejected in claim 1 to the additional transaction data); 
generating the plurality of clusters in the vector space, each cluster of the plurality of clusters comprising at least one an invoice vector of the plurality of invoice vectors or a transaction vector of the plurality of transaction vectors (applying the technique taught in GAO as rejected in claim 1);
generating the description for a cluster, the description for the cluster representing all invoice vectors and transaction vectors assigned to the cluster (applying the technique taught in GAO as rejected in claim 1); and 
for each business of the plurality of businesses that has at least one invoice vector or transaction vector assigned to the cluster, associating the business with the description (applying the technique taught in GAO as rejected in claim 1).
Regarding dependent claim 17, incorporating the rejection of claim 11, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches wherein the description from each selected pre-defined cluster is determined by: 
calculating a term frequency-inverse document frequency (TFIDF) value for each word in a cluster (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF) inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores; the TFIDF is calculated before clustering as can be seen in the method of FIG 1); 
ranking the calculated TFIDF values; identifying a top pre-defined number of words with a highest rank; and generating the description for the cluster with the identified words (GAO [0037] the top-ranking keyword phrases can be used as business tags).
Regarding claim 18, GAO in view of BROSAMER, combined at least for the reasons discussed above in claim 1, similarly teaches the method for training a model to predict a category for a business performed by a server, said method comprising: 
receiving a plurality of invoices associated with the business (relying on BROSAMER to teach “invoice” documents as explained in claim 1, similarly used for clustering companies; relying on GAO FIG 1: (102, 104) web pages and other textual descriptions about multiple companies; see [0019-0020, 0022] to receive documents generally for analysis); 
extracting a plurality of texts from the plurality of invoices, the plurality of texts comprising a text from each of the plurality of invoices (GAO: FIG 1: (102, 104) extract words (106) extract keyword phrases; see [0019-0020, 0022, 0023], where BROSAMER is relied upon to teach “invoice” documents); 
calculating a term frequency-inverse document frequency (TF-IDF) value for a plurality of words of the plurality of texts (GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF) inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores; the TFIDF is calculated before clustering);
ranking the plurality of words based on the calculated TF-IDF values; selecting, from the plurality of words, a pre-defined number of highest ranking words (GAO [0037] the top-ranking keyword phrases can be used as business tags; will be used in the clustering process as shown in method of FIG 1);
for each of the plurality of texts, embedding only the pre-defined number of words found in the text to a vector space to obtain an invoice vector of a plurality of invoice vectors (GAO: FIG 1: (106) [0023] forming a vector of term frequencies to represent each company; (108) [0042] Representation Learning techniques are applied by an operation 118 on the TF-IDF or relevance vectors to generalize or project companies in the high dimensional n-gram space into a lower dimensional topic space; where BROSAMER is relied upon to teach “invoice” documents; note that while GAO does not explicitly state “only” the pre-defined number of words are embedded, this is presumed to be a simple optimization for limiting the size of the vector representations); 
receiving a category associated with the business (relying on FIG 4A-4B of BROSAMER; (402) receives a selected classification from a merchant via a point-of-sale (POS) merchant device of the merchant (col 17 lines 35-46)); 
generating, based on the plurality of invoice vectors, a business vector representing the business (using the document vectors generated by GAO as part of the profile information generated in BROSAMER: (406) (col 17 lines 53-64), as well as the training data (411) for payment activity (col 18 lines 15-37)); and 
training a neural network to predict the category based on the business vector (BROSAMER (412) executes one or more machine learning models or heuristic models on the training data, the reported data and/or the collected data for the merchant to the plurality of profiles to identify a second class [for when a merchant needs to be reclassified] (col 18 lines 37-50), where BROSAMER makes clear that machine learning algorithms include (col 18 lines 25-32) supervised learning algorithms (e.g. neural networks …) as well as unsupervised learning algorithms (e.g. neural networks…), semi-supervised learning algorithms, and deep learning algorithms).
While not explicitly relied upon in the rejection above, Applicant’s attention is also drawn to BROSAMER FIG 6 (col 20 lines 8-42) which has significant overlap with BROSAMER FIG 4 and GAO FIG 1.
Claims 5-6, 14-15, and 19 are rejected under 35 USC 103 as unpatentable over GAO in view of BROSAMER, further in view of MANERIKER et al (US 2020/0081964 A1).
Regarding dependent claim 5 (14), incorporating the rejection of claim 1 (11), GAO in view of BROSAMER, combined at least for the reasons discussed above, further suggests wherein embedding the plurality of texts comprises: applying a word2vec model, the applying comprising, for each text of an invoice (GAO [0075], other word embedding techniques include CBOW Word2Vec; relying on BROSAMER to teach the electronic document text could be an “invoice”; “CBOW” is an acronym known in the art which stands for “continuous bag of words” model, see MIKOLOV page ). However, GAO in view of BROSAMER fails to provide any details of this embedding technique; thus, GAO cannot be relied upon to expressly disclose creating a bag-of-words representing the text, the bag-of-words including each word in the text and an associated multiplicity of each word in the text; and converting each word in the text into a vector based on the bag of words and not based on grammar and word order.
MANERIKER is broadly directed to (abstract) techniques for analyzing documents for fact replacement and style consistency. Of relevance is, when detecting changes in a document, [0045] trailing change module 204, for instance, may be configured to perform natural language processing using machine learning.
[0046, emphasis added] In a first example, a bag-of-words technique is employed by the trailing change module 204 to generate the internal representation as a bag-of-words model, also known as a vector space model. In this model, a multiset (i.e., bag) of words is used that may disregard grammar and word order but may address multiplicity. In a second example, the trailing change module 204 implements an embedding-based technique (e.g., word2vec) which refers to a set of language model and feature learning techniques in NLP in which words from a vocabulary are mapped to vectors of real numbers to generate the internal representation. In a third example, the trailing change module 204 implements machine learning techniques to generate the internal representation from features that are learned from words of the initial portion of the digital document 112 and/or candidate document portions. In these examples, words may include individual words as well as named entities. Once the internal representations are generated [0047], they are passed as input to a portion representation model to generate a portion representation of the digital document (e.g. using an aggregation operation… concatenation or averaging across the internal representations is used to generate the portion representation… or other machine learning techniques including use of a recurrent neural network, a long short-term memory (“LSTM”) network, and so forth).
It is clear from MANERIKER that the use of both bag-of-words and word2vec is known for generating a vector representation (embedding) of words from an initial text so that those vector representations may be used during the analysis of the text, and further that these vector representations may be combined using an aggregation operation such as concatenation or averaging.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of GAO in view of BROSAMER and MANERIKER before them, to have combined GAO in view of BROSAMER and MANERIKER and arrived at the claimed invention, the combination motivated by the suggestion in GAO [0075].
Regarding dependent claim 6 (15), incorporating the rejection of claim 5 (14), GAO in view of BROSAMER, further in view of MANERIKER, combined at least for the reasons discussed above, further teaches wherein embedding the plurality of texts comprises, for a line of an invoice comprising a plurality of words: embedding each word of the invoice line using the word2vec model to obtain a plurality of word vectors; feeding the plurality of word vectors to a long short-term memory (LSTM) layer; and combining, via the LSTM layer, the plurality of word vectors to obtain a single vector representing the invoice line, wherein combining includes at least one of vector addition, vector subtraction, scalar multiplication, sigmoid function multiplication, or hyperbolic function multiplication (applying an aggregation operation of MANERIKER [0047] on the word vectors generated in MANERIKER [0046], e.g. using an RNN or LSTM; where the aggregation includes “vector addition” (e.g. concatenation) or any other known RNN/LSTM machine learning technique; note that only one mechanism must be shown in the art when recited in the alternative).
Regarding dependent claim 19, incorporating the rejection of claim 18, neither GAO nor BROSAMER may be relied upon to expressly disclose wherein generating the business vector comprises averaging all invoice vectors associated with the business because, at best, BROSAMER teaches combining the extracted information from invoices and other transactional information for training purposes, where GAO is relied upon to teach relevant vector representations of documents to be used for classification. However, when considering the teachings of MANERIKER [0045-0047], for at least the reasons discussed above, once a set of vectors (whether for words or collections of words) has been determined for a particular document during the analysis phase, these vectors [0047] are passed as input to a portion representation model to generate a portion representation of the digital document (e.g. using an aggregation operation… concatenation or averaging across the internal representations is used to generate the portion representation, where a portion can be a sub-part of the document or the entirety of the document). Thus, “averaging vectors” in order to create a single vector is a known technique which may be applied during document analysis.
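The two techniques relied upon from MANERIKER [0046-0047], a bag-of-words representation that keeps each word's multiplicity while disregarding grammar and word order, and averaging several vectors into a single vector (as recited in claim 19 for the business vector), can be illustrated with the minimal Python sketch below. The function names are hypothetical, and the sketch is not presented as MANERIKER's implementation.

```python
from collections import Counter


def bag_of_words(text):
    """Multiset of words: keeps multiplicity, discards grammar and word order."""
    return Counter(text.lower().split())


def average_vectors(vectors):
    """Element-wise mean of equal-length vectors (e.g. invoice vectors)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(components) / len(vectors) for components in zip(*vectors)]
```

Because the bag is an unordered multiset, two texts containing the same words the same number of times receive the same representation regardless of word order.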
Dependent claim 4 is rejected under 35 USC 103 as unpatentable over GAO in view of BROSAMER, further in view of GHOSHAL et al. (US 2020/0125575 A1).
Regarding dependent claim 4, incorporating the rejection of claim 2, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches 
calculating a TFIDF value for each word of the lemmatized texts (see e.g. GAO [0026] To reduce the contribution of phrases that are very common, an operation 110 applies a term-frequency (TF) inverse-document-frequency (IDF) (TF-IDF) transformation; [0030] (114) applies relevance transform to boost the TF-IDF… the relevance scores for each n-gram can be calculated as the n-gram strength times the weighted sum of the associated topic strengths; [0035] similarities between n-grams may also be used by a computer to determine the relevance scores). 
However, GAO in view of BROSAMER may not be relied upon to expressly disclose identifying a pre-defined number of words with TFIDF values above a certain threshold; and embedding the identified words to obtain the plurality of invoice vectors, although GAO makes clear in [0037] that the highest-ranked words are used for the classification, and thus the words must exist in the vectors; BROSAMER is relied upon to teach that the electronic documents used for business classification may be “invoices”. Note that the claim does not require that only the identified words be embedded (i.e. per parent claim 2 there may be other lemmatized words in the vectors, and per parent claim 1 there may be other words in the vectors). 
GHOSHAL is similarly in the field of natural language processing (e.g. content classification and recommendation; (abstract) analyze text and/or visual input, extract keywords or topics from the input, classify and tag the input content, and store the classified/tagged content in one or more content repositories; solely for informational purposes, note FIG. 8 is an example of a text document illustrating a keyword extraction process; FIG. 12 is a flowchart illustrating another process for comparing feature vectors and identifying related content within a content repository; FIG. 20 is an example of a keyword analysis within a 3D word vector space; FIG. 21 is a diagram illustrating a keyword-to-tag vector space analysis). GHOSHAL may be relied upon to teach identifying a pre-defined number of words with TFIDF values above a certain threshold; and embedding the identified words to obtain the plurality of vectors (see [0095] which explains how text classification/relatedness techniques determine vectors of input text by calculating TFIDF scheme to quantify weights, then using an inverted index to both map words to a list of concepts as well as discard insignificant associations by removing concepts whose weights are below a certain threshold; interpreting “pre-defined number” as the number of entries in the inverted index).
Thus, GHOSHAL clearly teaches a technique, applicable during document classification, which reduces the number of concept words to those which are the most relevant based on TFIDF weights prior to creating a vector representation of the document.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of GAO in view of BROSAMER and GHOSHAL before them, to have used the technique of GHOSHAL when generating the vector embeddings of the invoice documents and arrived at the claimed invention with a reasonable expectation of success, the combination motivated by the goal of the technique, which is to reduce the number of words in the vectors by discarding those with insignificant associations.
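The selection step of claim 4 as read on GHOSHAL [0095], discarding words whose TF-IDF weight falls below a threshold and keeping a pre-defined number of the remaining words, can be sketched in Python as follows. The threshold, the cap, the example weights, and the function name are hypothetical, and the weights are assumed to have been computed beforehand.

```python
def select_words(weights, threshold, max_words):
    """Keep at most max_words words whose TF-IDF weight exceeds threshold.

    weights maps each word to a precomputed TF-IDF weight. Ties are broken
    alphabetically so the result is deterministic.
    """
    above = [(word, w) for word, w in weights.items() if w > threshold]
    above.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in above[:max_words]]
```

Only the surviving words would then be passed to the embedding step; a word such as "the", with a weight near zero, never reaches the vector.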
Claim 20 is rejected under 35 USC 103 as unpatentable over GAO in view of BROSAMER, further in view of RALHAN (US 2019/0354809 A1).
Regarding dependent claim 20, incorporating the rejection of claim 18, GAO in view of BROSAMER, combined at least for the reasons discussed above, further teaches or suggests:
(Deep learning models have been used for encoding contextual similarity between a variety of objects like words, images, drawings, and even 3D shapes. These models encode the complexity as a high-dimensional vector such as word2vec. Those vectors are extremely useful in that they define an immediately useful concept of similarity objects; where deep learning, as is known in the art, may use several neural networks and/or neural network layers) and 
training the neural network comprises: 
receiving a plurality of business pairs as labelled training data, each business pair including a business vector and a label indicating a category that the business vector belongs to (suggested at least at BROSAMER (col 18 line 25) supervised learning algorithms for neural network; as well as (col 18 line 31) deep learning algorithms; as well as (col 4 lines 5-20) deep learning models can be used to predict merchant type using vector representations of each merchant; “supervised learning” for classification in a neural network is a known technique using <input pattern, known classification> labelled pairs); 

training a softmax layer within the neural network to map category predictions of the neural network to probabilities (see BROSAMER (col 20 line 20) process 600 can apply Soft Max based classification; see also (col 4 lines 5-15) deep learning models can use soft max when learning vector representations for classification; note (col 26 lines 1-5) predicting the classification based on clustering includes probability).
Any deficiencies in GAO in view of BROSAMER may be cured by RALHAN, which is broadly directed to (abstract) computational model management …, an apparatus may include a processor and a memory storing instructions, which, when executed by the processor, cause the processor to generate a candidate computational model for a model function based on an objective and function data, and provide the candidate computational model to a model assessment process; and further explains at least one use of the computational model for classifying business documents (see [0123]); thus, RALHAN is analogous art.
RALHAN explains in [0028] that a computational model process may operate to manage a machine learning (ML) computational model lifecycle. In various embodiments, the models may include various computational models, including, without limitation, artificial intelligence (AI) frameworks, neural network models, and/or the like. Non-limiting examples of a computational model may include an ML model, an artificial intelligence (AI) model, a neural network (NN), an artificial neural network (ANN), a convolutional neural networks (CNN), a deep learning (DL) network, a deep neural network (DNN), a recurrent neural network (RNNs), combinations thereof, variations thereof, and/or the like.
A specific example of a machine-learning model may be seen in FIG 6, in which [0079, emphasis added] neural network 600 is represented as multiple layers of interconnected neurons, such as neuron 608, that can exchange data between one another. The layers include an input layer 602 for receiving input data, a hidden layer 604, and an output layer 606 for providing a result (three layers).
Neural network 600 [0080] can be trained using training data provided to the input layer, for example using [0081] back propagation, as well as [0083] forward and backward propagation (when neural network 600 is a recurrent neural network). [0084] The neural network 600 operates by receiving a vector of numbers from one layer, transforming the vector into a new vector, and providing the vector to a subsequent layer.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of GAO in view of BROSAMER and RALHAN before them, to have combined GAO in view of BROSAMER (using business document vectors to train a neural network for classification of a business) and RALHAN (providing specific examples of the structural details of a neural network, how it may be trained, and a use case of classifying business documents) by using the details of RALHAN to implement the neural network taught in BROSAMER (albeit without structural detail), motivated at least in part by the teachings of BROSAMER of using any of a number of different machine learning mechanisms including neural networks as explained above.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).
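The softmax layer recited in claim 20 maps a neural network's raw category scores (logits) to probabilities that are positive and sum to one. A minimal Python sketch follows; the single linear output layer, its weights, and the function names are hypothetical, and training (e.g. by back propagation on labelled business vectors) is not shown.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(business_vector, weight_rows, biases):
    """One linear output layer (one row per category) followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, business_vector)) + b
        for row, b in zip(weight_rows, biases)
    ]
    return softmax(logits)
```

The predicted category is simply the index of the largest probability; subtracting the maximum logit changes no probability, only the numerical range of the exponentials.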



CONCLUSION
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 2010/0145678 A1 (CSOMAI) method and system for keyword extraction
US 2019/0213407 A1 (TOIVANEN) tokenizing and analyzing input documents

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is (571)270-3771. The examiner can normally be reached Mon-Fri 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU can be reached on (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Amy M Levy/Primary Examiner, Art Unit 2173