DETAILED ACTION
Drawings
The drawings are objected to because they contain color. Specifically, Figures 6B and 9B (see the similarity columns) contain grey coloration. The grey coloration in these figures does not follow the requirements for lines (See 37 CFR 1.84(l)) or for shaded objects (See 37 CFR 1.84(m)).
Drawings are normally required to be submitted in black ink on white paper.  Color photographs and color drawings are not accepted unless a petition filed under 37 CFR 1.84(a)(2) is granted. Any such petition must be accompanied by the appropriate fee set forth in 37 CFR 1.17(h), three sets of color drawings or color photographs, as appropriate, and, unless already present, an amendment to include the following language as the first paragraph of the brief description of the drawings section of the specification:
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Color photographs will be accepted if the conditions for accepting color drawings and black and white photographs have been satisfied. See 37 CFR 1.84(b)(2). Note that the requirement for three sets of color drawings under 37 CFR 1.84(a)(2)(ii) is not applicable to color drawings submitted via EFS-Web. Therefore, only one set of such color drawings is necessary when filing via EFS-Web.

Claim Interpretation
The specification includes the following recitations. Each has been identified as a suggestion of term scope (i.e., an example) rather than an explicit definition, because each is recited with the permissive term “may”.
Paragraph [0002] of the specification recites “As used herein, the term “string” may refer to any series of characters such as words, sentences, paragraphs, codes, etc.” (emphasis added)
Paragraph [0024] of the specification recites “As used herein, the term “automated” may refer to, for example, actions that can be performed with little (or no) intervention by a human”. (emphasis added) Please note that “little” is a relative term that provides no clear standard for how much human intervention would prevent an action from being considered automated. Because this recitation has been identified as a suggestion of scope, no ambiguity or vagueness issues are raised.
Paragraph [0029] of the specification recites “As used herein, the phrase “stop words” may refer to words which typically do not carry much significance and therefore can be filtered out before processing natural language (e.g., text strings).” (emphasis added) Please note that the phrase ‘typically do not carry much’ is a relative phrase that provides no clear standard for how much significance a word must carry to avoid being identified as a stop word. Because this recitation has been identified as a suggestion of scope, no ambiguity or vagueness issues are raised.

Claim Objections
Claims 1-9 are objected to because of the following informalities.  Appropriate correction is required.

With regard to claim 1, the claim uses commas and indentations to separate the receiving, storing, computing, constructing, analyzing, and arranging steps performed by the computer processor. Although plural indentations may be used to further segregate subcombinations or related steps, “[i]n general, the printed patent copies will follow the format used but printing difficulties or expense may prevent the duplication of unduly complex claim formats” (MPEP 608.01(m)).
This means that the indentations present in the claims may not be preserved when the patent is printed. It is therefore suggested that the claims be amended to use semicolons to separate these steps. Separating the steps performed by the processor with semicolons would ensure that the structure of the claim is maintained even if the indentations are not reproduced during printing.

With regard to claim 1, the claim recites "(b) the back-end application computer server, coupled to the input data store, including: … ". As structured, the claim language may reasonably be interpreted to mean that either the back-end application computer server or the input data store includes the recited elements. It is suggested that the claims be amended to make clear which device includes the recited elements. For examination purposes, this claim limitation has been construed to mean --(b) the back-end application computer server coupled to the input data store; the back-end computer server including: ... --.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-22 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the enablement requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Claim 1 recites “construct a two-column result table via a self-join on the single column, with shorter strings being kept in a first column of the result table.” Claims 10 and 19 appear to recite substantially similar limitations and are rejected based upon the same rationale. There is not sufficient description in the specification to enable one of ordinary skill in the art to make or use a device that performs the claimed functionality without undue experimentation.
The following factors have been evaluated to come to this determination:
(A) The breadth of the claims (See MPEP 2164.08): The instant claim limitation explicitly recites that the result table is constructed as a ‘two-column result table’. The instant claims do not describe how the result table is generated beyond stating that a self-join is performed on the single column. The claims provide no further description or detail of this operation.
(B) The nature of the invention (See MPEP 2164.05(a)):  The field of art is directed to computer instructions, and one of ordinary skill in the art would understand that a SQL query may be generated to perform this function. 
(C) The state of the prior art, (D) the level of one of ordinary skill (See MPEP 2164.05(b)), and (E) the level of predictability in the art (See MPEP 2164.03): The operations within SQL are standardized within the field of art and are highly predictable. w3schools.com describes standard SQL, including the SQL self join (See w3schools.com [SQL Self Join]). One of ordinary skill in the art would recognize that table aliases may be used to refer to the same table twice, making the join a self-join operation. The difficulty one of ordinary skill in the art would face is determining the specific WHERE conditions needed to generate the two-column result table with the shorter strings being kept in a first column, as claimed.
(F) The amount of direction provided by inventor; (G) The existence of working examples, and (H) The quantity of experimentation needed to make or use the invention based on the content of the disclosure:  
The claimed construction of a two-column table via a self-join on a single-column table is not commensurate with the disclosed figures and the description in the specification.
Within the drawings, multiple examples of input tables are provided. Figure 1 depicts table 112, which includes at least three distinct columns. Figure 6A is described as a “one-column table” in Paragraph [0034], yet is depicted in the drawings as a two-column table. Figure 9A is described as representing input data 900, yet the drawing of table 900 shows a two-column table, not a one-column table. Each of the example input tables is referred to as a one-column table, yet the figures clearly show tables having more than one column.
Similarly, multiple examples of result tables are provided within the drawings, none of which has two columns. Figure 6B element 610 is described as “results” in Paragraph [0035], yet this table clearly contains seven distinct columns. Figure 9B element 910 is described as a “result table database” in Paragraph [0049], yet this result table also clearly contains seven distinct columns.
The claimed subject matter is not commensurate with the disclosed subject matter and the drawings provided. Neither the disclosure nor the drawings detail using a single-column table within a self-join to generate a resulting two-column table. One of ordinary skill in the art has not been provided with the details needed to build the claimed device, but has instead been provided with contradictory details that do not appear to describe the claimed device.
No details are provided regarding how the result tables are actually formed beyond the use of a ‘self-join’ for combining columns from one table by using common values (See Paragraph [0029] of the specification). There is no detail regarding how to formulate the self-join so as to enable one of ordinary skill to construct the functions necessary to generate the result table. The general statement of a WHERE condition to ‘keep shorter strings to the left in the result table’ does not detail how to generate the condition or the result table. The claims state that the two-column result table is constructed via a self-join on the single column, yet neither the claims nor the specification recites the use of the string length in the join operation at all. There is no description or detail regarding how this self-join operation is to be formulated or performed, or regarding the inputs or conditions of the join operation. One of ordinary skill in the art would be required to determine all of these critical conditions and details themselves. Nowhere in the specification is the condition of the WHERE clause described or detailed, nor is there any suggestion or explanation of how it is to be formulated.
Within the field of the art, if a WHERE condition is applied based on the length of the strings, the system would be expected to generate a result list of the strings that satisfy the length threshold condition. This does not generate two columns; it is instead expected to generate a single result list of strings that satisfy the condition. If the reader assumes that the system divides the strings based on a length threshold, then one may determine that strings with lengths less than or equal to the threshold may be placed in a first result list, and strings with lengths greater than the threshold may be placed in a second result list. Alternatively, strings shorter than the threshold may be placed in one list and strings of equal or greater length in a second list. Yet the examples provided for the result table in Figure 9B show that this is clearly not the case within the described invention.
As to the lengths of the strings depicted in Figure 9B, strings 1001-1006 appear to have been duplicated and divided in an unexplained manner to produce a result table in which each of strings 1001-1006 appears multiple times, divided into two groups: one group with strings of length 13-27 and the other group with strings of length 24-41. The claims recite that the result table comprises a first column and a second column, ‘with shorter strings being kept in a first column of the result table’. It is unclear what is expected to be identified as ‘a first column’ and ‘a second column’ within Figure 9B, element 910. It is clear, however, that the ‘shorter’ strings are not fully contained within a single column, as strings of length 24-27 exist in multiple columns. Figure 6B suffers from similar issues. Again, the strings are duplicated with no explanation of why, and the string lengths overlap, with the “alpha_lengths” ranging from 48-78 and the “beta_lengths” ranging from 51-78, meaning that strings of length 51-78 appear in both columns.
The details needed to generate the result tables in Figures 6B and 9B from the input tables in Figures 6A and 9A are not provided within the specification in a manner that would enable one of ordinary skill to build a device that performs this function.
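For illustration only, the two length-based readings discussed above, and the overlap between the length ranges reported above for Figures 9B and 6B, can be sketched in Python using the standard-library `sqlite3` module. The table name `strings`, the column `s`, the threshold, the sample strings, and the function names are hypothetical; the sketch is not drawn from the application and is not asserted to be the applicant's method. It shows that a length-based WHERE condition on a single column returns a single list of strings (or two disjoint lists under the partitioning reading), whereas the reported length ranges for the result tables share lengths.

```python
import sqlite3


def length_threshold_lists(values, threshold):
    """Apply length-based WHERE conditions to a one-column table, with no join.

    Returns the strings whose length is at or below the threshold and, for the
    partitioning reading, the strings whose length exceeds it. Each query yields
    a single list of strings rather than a two-column pairing.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE strings (s TEXT)")  # hypothetical one-column table
    conn.executemany("INSERT INTO strings (s) VALUES (?)", [(v,) for v in values])
    at_or_below = [row[0] for row in conn.execute(
        "SELECT s FROM strings WHERE LENGTH(s) <= ? ORDER BY rowid", (threshold,))]
    above = [row[0] for row in conn.execute(
        "SELECT s FROM strings WHERE LENGTH(s) > ? ORDER BY rowid", (threshold,))]
    conn.close()
    return at_or_below, above


def length_overlap(range_a, range_b):
    """Return the inclusive (low, high) length interval shared by two ranges, or None."""
    low, high = max(range_a[0], range_b[0]), min(range_a[1], range_b[1])
    return (low, high) if low <= high else None


# Hypothetical strings and threshold: the two lists are disjoint by construction.
print(length_threshold_lists(["ab", "abcd", "abcdef"], 4))  # (['ab', 'abcd'], ['abcdef'])
# Length ranges stated above for the two groups of Figure 9B and for Figure 6B.
print(length_overlap((13, 27), (24, 41)))  # (24, 27)
print(length_overlap((48, 78), (51, 78)))  # (51, 78)
```

Under either reading no string falls into both lists, while the stated ranges overlap at lengths 24-27 (Figure 9B) and 51-78 (Figure 6B).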

35 USC § 101
With regard to claim 10, please note that the term "back-end application computer server" has been understood to refer to a computer including a processor.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-15, 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Maan [2020/0272692] in view of Khayyat [2017/0060944] and Kumaresan [2002/0171800].

With regard to claim 1, Maan teaches An alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) similarity analysis system as identifying pairings of words to substitute words (Maan, ¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.) implemented via a back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106), comprising: 
(a) an input data store as memory (Maan, Figure 7, 730) that contains electronic records as patents (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”), each electronic record as a patent (Id)  being associated with enterprise data as the patent is associated with chemical or electronical engineering data (Maan, ¶45 “by way of an example, for a chemical patent… in a patent from the field of electronical engineering”) and including an electronic record identifier as each patent has an application number (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”) and an alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”); 
(b) the back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106; ¶59 “a general-purpose computer system, such as a personal computer (PC) or server computer”), coupled to the input data store (Maan, Figure 7, see that the memory 730 is part of the computer system 702), including: 
a computer processor (Maan, Figure 7, 704; Figure 1, 110), and 
a computer memory (Maan, Figure 7, see 726, 728; Figure 1, 112), coupled to the computer processor (Maan, Figure 7, storage interface 724 connecting processor 704 to the RAM 726 and ROM 728), storing instructions (Maan, ¶22 “The memory 112 may store instructions that, when executed by the processor 110, cause the processor to…”) that, when executed by the computer processor cause the back-end application computer server (Id) to: 
receive, from the input data store, information about electronic records to be analyzed, including alphanumeric strings as generating an array of words from the patent document (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”), 
store the alphanumeric strings in a single column as an array (Id), 
compute a length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) in the single column as the array of words (Maan, ¶28), 
…, with shorter strings (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”)…, 
automatically analyze the result table using … to generate a similarity score as identifying pairings of words to substitute words (¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.) for an alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) …, and 
arrange to output indications of the similarity scores as the replacement of a word with its substitute word, which indicates that the terms are similar (Maan, Figure 5B, see 508; ¶49 “A text box 508 in the FIG. 5B illustrates the patent document text, such that, the identified one or more word-sequences have been replaced with associated substitute word-sequences”); and 
(c) a communication port (Maan, Figure 1, 106; Figure 7, 714) coupled to the back-end application computer server (Maan, Figure 1, 102; Figure 7, 702) to facilitate a transmission of data as sending the data to be displayed (Maan, ¶42 “the original word-sequence may be displayed to the user in a display window”; ¶44 “a drop-down menu may be provided that may include a list of alternative word-sequence suggestions”) with remote user devices to support interactive user interface displays (Maan, ¶41 “a predefined action may be received through an external interface on a highlighted attribute of a substitute word-sequence.  By way of example, the external interface may include a mouse, a touch screen, or a keyboard.  By way of example, the predefined action may include at least one of clicking on the highlighted attribute, hovering a mouse pointer over the highlighted attribute, clicking on the highlighted attribute for a predefined duration through a mouse, or performing a right-click on the highlighted attribute”), including the similarity scores, via a distributed communication network (Maan, Figure 1, 106; Figure 7, 716). 
Maan does not explicitly teach to construct a two-column result table via a self-join on the single column, with [condition] being kept in a first column of the result table… in the first column and a corresponding string in a second column of the result table.
Khayyat teaches construct a two-column result table as the pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) via a self-join on the single column as the query when t_id is the same, such as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”), with [condition] as the time being more than, see the WHERE condition (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) being kept in a first column of the result table as the leftmost result in each pair (Id) … in the first column as the leftmost result in each pair (Id) and a corresponding string in a second column of the result table as the rightmost result in each pair (Id).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the device taught by Maan to determine candidate pairings between words using the techniques taught by Khayyat, to yield the predictable result of generating a set of word-sequence pairings that pair relatively shorter words with relatively longer words within the text document. Please note that within the proposed combination, Maan takes an array of terms (¶47, Figure 5, 504) and generates a list of paired terms (Figure 5, 506), wherein the system is able to identify which term in each pairing is the shorter term (Maan, ¶32, ¶52). Maan does not explicitly state how this functionality is performed. The techniques taught by Khayyat provide a means of generating term pairings in which the system is able to identify which term is the shorter term within each pairing.
Khayyat does not explicitly teach automatically analyze the result table using cosine similarity to generate a similarity score … arrange to output indications of the similarity scores… displays, including the similarity scores. Maan teaches determining substitute words based on a mapping dictionary (Maan, ¶52) but does not state how the mapping dictionary is used to determine such substitute words.
Kumaresan teaches automatically analyze the result table using cosine similarity to generate a similarity score as the cosine similarity assigned to records within the dictionary (Kumaresan, ¶98 “a dictionary token may have a greater cosine similarity to another vector in the vector space”; ¶99 “the dictionary of word embeddings may assign records to the same group … similarity may be determined based on the Euclidean distance or cosine similarity between different embeddings”)
arrange to output indications (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”) of the similarity scores as the links (Id) which include the cosine similarity (¶98 “cosine similarity”)… displays, including the similarity scores as displaying links, i.e. cosine similarity, between clusters of words (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”; ¶98 “cosine similarity”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the dictionary mapping within the proposed combination using the word clustering techniques taught by Kumaresan to build the dictionary. The proposed combination yields the predictable result of providing a means of generating a dictionary that is capable of relating words based on similar meaning.
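For illustration only, cosine similarity as relied upon from Kumaresan is the dot product of two vectors divided by the product of their Euclidean norms. The Python sketch below applies it to each row of a two-column result table. As an assumption made solely for this sketch, simple word-count vectors stand in for the word embeddings Kumaresan describes, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {dimension: weight} mappings."""
    dot = sum(weight * b.get(dim, 0) for dim, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def score_pairs(rows):
    """Attach a similarity score to each (first column, second column) row.

    Word-count vectors are used here only as a stand-in for word embeddings.
    """
    return [
        (left, right, cosine_similarity(Counter(left.split()), Counter(right.split())))
        for left, right in rows
    ]


print(score_pairs([("message", "electronic message")]))
```

For the pairing “message” / “electronic message” (Maan, Figure 5A), the two count vectors share one of two dimensions, giving a score of 1/√2, about 0.707.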

With regard to claims 2 and 20, the proposed combination further teaches wherein the self-join (Khayyat, ¶33 “self-join query”) is associated with a Structured Query Language ("SQL")-type (Khayyat, ¶70 “Spark SQL allows users to query structured data on top of Spark”) WHERE condition (Khayyat, ¶36 “WHERE s1.time>s2.time;”) to keep shorter strings to the left in the result table as the results in which si takes more time (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”). One of ordinary skill in the art would recognize that the condition (Khayyat, ¶36 “s1.time>s2.time”) can be replaced with a condition that determines which string is shorter (Maan, ¶32 “replacing one or more occurrences of word-sequences with relatively shorter word-sequences”; ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences based on a mapping dictionary”) in order to facilitate the combination put forth above.
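For illustration only, the substitution described above can be sketched with Python's standard-library `sqlite3` module. The sketch assumes that a time comparison such as `s1.time > s2.time` is replaced with a comparison of string lengths using SQLite's `LENGTH()` function. The table name, column name, function name, and the string "msg" are hypothetical; "message" and "electronic message" echo the pairing shown in Maan, Figure 5A. Excluding pairs of equal length is an assumption of the sketch, and neither reference is asserted to contain this code.

```python
import sqlite3


def shorter_left_pairs(values):
    """Self-join a one-column table, keeping the shorter string of each pair on the left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE strings (s TEXT)")  # hypothetical one-column table
    conn.executemany("INSERT INTO strings (s) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        # The same table is referenced twice (aliases s1 and s2), as in a self-join.
        "SELECT s1.s AS shorter, s2.s AS longer "
        "FROM strings AS s1, strings AS s2 "
        # Length comparison substituted for a condition such as s1.time > s2.time;
        # pairs of equal length are excluded in this sketch.
        "WHERE LENGTH(s1.s) < LENGTH(s2.s) "
        "ORDER BY LENGTH(s1.s), LENGTH(s2.s), s1.s, s2.s"
    ).fetchall()
    conn.close()
    return rows


print(shorter_left_pairs(["message", "electronic message", "msg"]))
# [('msg', 'message'), ('msg', 'electronic message'), ('message', 'electronic message')]
```

In this sketch the shorter member of each pair lands in the first column, but the same string (here "message") can still appear in both columns, because the condition orders each pair rather than partitioning the strings.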

With regard to claims 3, 12 and 21, the proposed combination further teaches wherein said analysis includes comparing the similarity scores to a pre-defined threshold value (Kumaresan, ¶99 “the semantic meaning conveyed by the keywords may be substantially similar (e.g., within a threshold Euclidean distance or cosine similarity) of the word embeddings included in the matched records”) and automatically creating families (Kumaresan, ¶99 “During clustering operations, the dictionary of word embeddings may assign records to the same group even though the keywords may not exactly match if the semantic meanings of the keywords are similar”) of alphanumeric strings as the keywords (Id) based at least in part on said comparisons as the cosine similarity being within a threshold (Id).
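For illustration only, one simple way to form families from threshold comparisons is to link any two strings whose similarity score meets the threshold and take the connected groups, sketched below with a union-find structure. This grouping technique, the threshold value, the sample scores, and the function name are assumptions for the sketch; Kumaresan describes clustering of word embeddings, which is not reproduced here.

```python
def build_families(scored_pairs, threshold):
    """Group strings into families: pairs scoring at or above threshold are linked.

    scored_pairs holds (string_a, string_b, similarity_score) rows; families are
    the connected groups of linked strings (a union-find sketch).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        root_a, root_b = find(a), find(b)  # registers both strings
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    families = {}
    for s in list(parent):
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())


# Hypothetical scores; "car"/"vehicle" falls below the threshold and stays unlinked.
pairs = [("msg", "message", 0.8), ("message", "electronic message", 0.71), ("car", "vehicle", 0.2)]
print(build_families(pairs, threshold=0.7))
# [['car'], ['electronic message', 'message', 'msg'], ['vehicle']]
```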

With regard to claims 4 and 13, the proposed combination further teaches wherein the back-end application computer server removes stop words from the alphanumeric strings as removing the stop words during the array generation (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text… In some embodiments, the plurality of tuples may be identified by removing one or more stop words in the data repository from the patent document text”) before computing the length of each alphanumeric string, as the replacement is done based on the word length, which is required to be done after the array is generated (Maan, ¶30 “replace each of the identified one or more word-sequences with an associated substitute word-sequence”; ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”).
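For illustration only, the ordering recited in claims 4 and 13 (removing stop words and only then computing string lengths) can be sketched as follows. The stop-word list and function name are hypothetical; the sketch drops listed words case-insensitively and measures what remains in characters.

```python
# Hypothetical stop-word list; a real system would use its own list.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def lengths_after_stop_word_removal(strings, stop_words=STOP_WORDS):
    """Drop stop words from each string first, then compute the remaining length."""
    results = []
    for s in strings:
        kept = " ".join(w for w in s.split() if w.lower() not in stop_words)
        results.append((kept, len(kept)))  # length counted in characters, spaces included
    return results


print(lengths_after_stop_word_removal(["the electronic message", "a message"]))
# [('electronic message', 18), ('message', 7)]
```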

With regard to claims 5 and 14, the proposed combination further teaches wherein the back-end application computer server automatically replaces longer alphanumeric strings with shorter versions from the same family as replacing the longer words with shorter words that are mapped to each other (Maan, ¶30 “replace each of the identified one or more word-sequences with an associated substitute word-sequence”; ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences based on a mapping dictionary”).
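For illustration only, replacing each longer string with the shortest member of its family, as mapped above to Maan's substitute word-sequences, can be sketched as follows. The family list is a hypothetical input (the "electronic message" / "message" pairing echoes Maan, Figure 5A), the function name is hypothetical, and breaking ties between equally short members alphabetically is an assumption.

```python
def replace_with_shortest(strings, families):
    """Replace each string with the shortest member of its family; others are unchanged."""
    shortest = {}
    for family in families:
        # Shortest member wins; ties are broken alphabetically (an assumption of this sketch).
        representative = min(family, key=lambda member: (len(member), member))
        for member in family:
            shortest[member] = representative
    return [shortest.get(s, s) for s in strings]


families = [["electronic message", "message"]]  # hypothetical family
print(replace_with_shortest(["electronic message", "car", "message"], families))
# ['message', 'car', 'message']
```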

With regard to claims 6 and 15, the proposed combination further teaches wherein at least one of the computation of the length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) in the single column as the array of words (Maan, ¶28) and the construction of the two-column result table as the pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) via a self-join on the single column as the query when t_id is the same, such as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”) is associated with a PySpark instruction (Khayyat, ¶60 “Spark SQL”; ¶71 “a new Spark SQL physical join operator”). One of ordinary skill in the art would recognize that PySpark was released to provide a Python API for Spark. The distinction between PySpark and Spark is merely the coding language in which the interface is written, and the two would thus be recognized as substantially equivalent by one of ordinary skill in the art.
  
With regard to claims 8 and 17 the proposed combination further teaches wherein the alphanumeric strings are associated with at least one of: (i) business data of the enterprise as the patent is associated with chemical or electronical engineering data (Maan, ¶45 “by way of an example, for a chemical patent… in a patent from the field of electronical engineering”), (ii) business control statements, (iii) insurance information, (iv) insurance claim descriptions, (v) an industry category, and (vi) medical information.  

With regard to claims 9 and 18, the proposed combination further teaches wherein the output indications of similarity scores as the replacement of a word with its substitute word, which indicates that the terms are similar (Maan, Figure 5B, see 508; ¶49 “A text box 508 in the FIG. 5B illustrates the patent document text, such that, the identified one or more word-sequences have been replaced with associated substitute word-sequences”), are associated with a Hadoop big data Hive table (Khayyat, ¶71 “If the result does not fit in the memory of a single machine, the result is temporarily stored into Hadoop Distributed File System (HDFS)”).

With regard to claim 10 Maan teaches A computerized alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) similarity analysis method as identifying pairings of words to substitute words (¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.) implemented via a back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106), comprising: 
receiving, by a computer processor of the back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106) from the input data store as memory (Maan, Figure 7, 730), information about electronic records to be analyzed as patents (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”), wherein each electronic record as a patent (Id) is associated with enterprise data as the patent is associated with chemical or electronical engineering data (Maan, ¶45 “by way of an example, for a chemical patent… in a patent from the field of electronical engineering”) and includes an electronic record identifier as each patent has an application number (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”) and an alphanumeric string as generating an array of words from the patent document (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”); 
storing the alphanumeric strings in a single column as an array (Id); 
computing a length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) in the single column as the array of words (Maan, ¶28); 
…, with shorter strings (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) …; 
automatically analyzing the result table using … to generate a similarity score as identifying pairings of words to substitute words (¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.)  for an alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) …; and 
arranging to output indications of the similarity scores as the replacement of the word with the substitute word, which indicates that the terms are similar (Maan, Figure 5B, see 508; ¶49 “A text box 508 in the FIG. 5B illustrates the patent document text, such that, the identified one or more word-sequences have been replaced with associated substitute word-sequences”).
Maan does not explicitly teach constructing a two-column result table via a self-join on the single column, with [condition] being kept in a first column of the result table… in the first column and a corresponding string in a second column of the result table.
Khayyat teaches constructing a two-column result table as the pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) via a self-join on the single column as the query when t_id is the same, such as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”), with [condition] as the time being more than, see the WHERE condition (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) being kept in a first column of the result table as the leftmost result in each pair (Id) … in the first column as the leftmost result in each pair (Id) and a corresponding string in a second column of the result table as the rightmost result in each pair (Id).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the device taught by Maan to determine candidate pairings between words using the techniques taught by Khayyat to yield the predictable results of generating a set of word sequences pairing relatively shorter words with relatively longer words within the text document.  Please note that within the proposed combination, Maan takes an array of terms (¶47, Figure 5, 504) and somehow generates a list of paired terms (Figure 5, 506), wherein the system is able to identify which term in the pairing is the shorter term (Maan, ¶32, ¶52).  Maan does not explicitly state how this functionality is performed.  The techniques taught by Khayyat provide a means of generating term pairings, where the system is able to identify which term is the shorter term within each pairing.
Khayyat does not explicitly teach automatically analyzing the result table using cosine similarity to generate a similarity score … arranging to output indications of the similarity scores.  Maan teaches determining substitute words based on a mapping dictionary (Maan, ¶52) but does not state how the mapping dictionary is used to determine such substitute words.
Kumaresan teaches automatically analyzing the result table using cosine similarity to generate a similarity score as the cosine similarity assigned to records within the dictionary (Kumaresan, ¶98 “a dictionary token may have a greater cosine similarity to another vector in the vector space”; ¶99 “the dictionary of word embeddings may assign records to the same group … similarity may be determined based on the Euclidean distance or cosine similarity between different embeddings”); and
arranging to output indications (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”) of the similarity scores as the links (Id) which include the cosine similarity (¶98 “cosine similarity”).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the dictionary mapping within the proposed combination, using the word clustering techniques taught by Kumaresan to build the dictionary.  The proposed combination yields the predictable results of providing a means of generating a dictionary that is capable of relating words based on similar meaning.


With regard to claim 11, the proposed combination further teaches wherein the input data store as the word from the patent document that will be replaced with the generated result substitute word (Maan, Figure 5B, see 508; ¶49 “A text box 508 in the FIG. 5B illustrates the patent document text, such that, the identified one or more word-sequences have been replaced with associated substitute word-sequences”) is associated with a Hadoop big data Hive table (Khayyat, ¶71 “If the result does not fit in the memory of a single machine, the result is temporarily stored into Hadoop Distributed File System (HDFS)”).

With regard to claim 19, Maan teaches A non-transitory, computer-readable medium storing instructions, that, when executed by a processor, cause the processor to perform (Maan, ¶22 “The memory 112 may store instructions that, when executed by the processor 110, cause the processor to…”) an alphanumeric string similarity analysis method as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) implemented via a back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106; ¶59 “a general-purpose computer system, such as a personal computer (PC) or server computer”), the method comprising: 
receiving, by a computer processor of the back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106) from the input data store as memory (Maan, Figure 7, 730), information about electronic records to be analyzed as patents (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”), wherein each electronic record as a patent (Id) is associated with enterprise data as the patent is associated with chemical or electronical engineering data (Maan, ¶45 “by way of an example, for a chemical patent… in a patent from the field of electronical engineering”) and includes an electronic record identifier as each patent has an application number (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”) and an alphanumeric string as generating an array of words from the patent document (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”); 
storing the alphanumeric strings in a single column as an array (Id); 
computing a length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) in the single column as the array of words (Maan, ¶28); 
…, with shorter strings (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) …; 
automatically analyzing the result table using … to generate a similarity score as identifying pairings of words to substitute words (¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.)  for an alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.  It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) …; and 
arranging to output indications of the similarity scores as the replacement of the word with the substitute word, which indicates that the terms are similar (Maan, Figure 5B, see 508; ¶49 “A text box 508 in the FIG. 5B illustrates the patent document text, such that, the identified one or more word-sequences have been replaced with associated substitute word-sequences”).
Maan does not explicitly teach constructing a two-column result table via a self-join on the single column, with [condition] being kept in a first column of the result table… in the first column and a corresponding string in a second column of the result table.
Khayyat teaches constructing a two-column result table as the pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) via a self-join on the single column as the query when t_id is the same, such as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”), with [condition] as the time being more than, see the WHERE condition (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) being kept in a first column of the result table as the leftmost result in each pair (Id) … in the first column as the leftmost result in each pair (Id) and a corresponding string in a second column of the result table as the rightmost result in each pair (Id).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the device taught by Maan to determine candidate pairings between words using the techniques taught by Khayyat to yield the predictable results of generating a set of word sequences pairing relatively shorter words with relatively longer words within the text document.  Please note that within the proposed combination, Maan takes an array of terms (¶47, Figure 5, 504) and somehow generates a list of paired terms (Figure 5, 506), wherein the system is able to identify which term in the pairing is the shorter term (Maan, ¶32, ¶52).  Maan does not explicitly state how this functionality is performed.  The techniques taught by Khayyat provide a means of generating term pairings, where the system is able to identify which term is the shorter term within each pairing.
Khayyat does not explicitly teach automatically analyzing the result table using cosine similarity to generate a similarity score … arranging to output indications of the similarity scores.  Maan teaches determining substitute words based on a mapping dictionary (Maan, ¶52) but does not state how the mapping dictionary is used to determine such substitute words.
Kumaresan teaches automatically analyzing the result table using cosine similarity to generate a similarity score as the cosine similarity assigned to records within the dictionary (Kumaresan, ¶98 “a dictionary token may have a greater cosine similarity to another vector in the vector space”; ¶99 “the dictionary of word embeddings may assign records to the same group … similarity may be determined based on the Euclidean distance or cosine similarity between different embeddings”); and
arranging to output indications (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”) of the similarity scores as the links (Id) which include the cosine similarity (¶98 “cosine similarity”).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the dictionary mapping within the proposed combination, using the word clustering techniques taught by Kumaresan to build the dictionary.  The proposed combination yields the predictable results of providing a means of generating a dictionary that is capable of relating words based on similar meaning.

Claims 7, 16, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Maan in view of Khayyat, Kumaresan, and O’krafka [11327966].

With regard to claims 7 and 16, the proposed combination further teaches wherein the PySpark instruction (Khayyat, ¶60 “Spark SQL”; ¶71 “a new Spark SQL physical join operator”.  One of ordinary skill in the art would recognize that PySpark was released to provide a Python API for Spark.  The distinction between PySpark and Spark is merely the coding language in which the interface is written, and thus the two would be recognized as substantially equivalent by one of ordinary skill in the art) …
Khayyat does not explicitly teach that the PySpark instruction is associated with a user-defined function.  O’krafka teaches wherein the PySpark instruction (O’krafka, Column 12, lines 62-67 “SNE may be implemented as an “execution Fabric” in additional data analysis applications… and artificial intelligence (AI) and machine learning frameworks (e.g. PySpark… and may be integrated into other databases and frameworks”) is associated with a user-defined function (O’krafka, Column 13, lines 1-5 “Execution Fabrics architecture of an embodiment may invoke user defined functions (UDF) for fast native machine learning (ML) kernel invocations”).  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed device using UDFs, as such functions are known to be used within such frameworks and analysis applications, and would be expected to yield the predictable results of providing specially crafted user-defined code to execute the necessary functionality.

With regard to claim 22, the proposed combination further teaches wherein at least one of the computation of the length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) in the single column as the array of words (Maan, ¶28) and the construction of the two-column result table as the pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”) via a self-join on the single column as the query when t_id is the same, such as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”) is associated with a PySpark (Khayyat, ¶60 “Spark SQL”; ¶71 “a new Spark SQL physical join operator”.  One of ordinary skill in the art would recognize that PySpark was released to provide a Python API for Spark.  The distinction between PySpark and Spark is merely the coding language in which the interface is written, and thus the two would be recognized as substantially equivalent by one of ordinary skill in the art) …
Khayyat does not explicitly teach a PySpark user-defined function.  O’krafka teaches a PySpark (O’krafka, Column 12, lines 62-67 “SNE may be implemented as an “execution Fabric” in additional data analysis applications… and artificial intelligence (AI) and machine learning frameworks (e.g. PySpark… and may be integrated into other databases and frameworks”) user-defined function (O’krafka, Column 13, lines 1-5 “Execution Fabrics architecture of an embodiment may invoke user defined functions (UDF) for fast native machine learning (ML) kernel invocations”).  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed device using UDFs, as such functions are known to be used within such frameworks and analysis applications, and would be expected to yield the predictable results of providing specially crafted user-defined code to execute the necessary functionality.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691. The examiner can normally be reached Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached on 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AMANDA L WILLIS/Primary Examiner, Art Unit 2156