DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed on 11/27/2020 in parent Application No. 17/072,340, filed on 10/16/2020.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/05/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
The independent claims 1, 11, and 15 recite:
receiving input text data comprising tokens of a plurality of related text documents;
processing the input text data to identify a corresponding plurality of sentences;
forming one or more topic groups, each comprising a subset of the plurality of sentences selected according to topic-related tokens within each sentence;
generating a machine summary of the related text documents by determining part-of-speech (POS) data for each token in the text, wherein the POS data represents a grammatical function of the token in its corresponding sentence, and whereby the tokens are substituted with token/POS pairs;
constructing, for each topic group, a graph data structure having a plurality of nodes and connecting edges, wherein each node represents a unique token/POS pair, and sequences of edges represent sequences of token/POS pairs comprising sentences of the corresponding topic group, generating, for each topic group, a plurality of ranked candidate summary sentences based upon subgraphs of the graph data structure having initial and final nodes corresponding with valid sentence start and end token/POS pairs, and composing the machine summary by selecting, for each topic group, at least one representative summary sentence from the ranked candidate summary sentences;
generating a natural-language summary corresponding with the machine summary by computing, for each topic group, numerical suitability measures that provide a comparison between the representative summary sentence and sentences of the corresponding topic group, and
composing the natural-language summary by selecting, for each topic group, a preferred summary sentence based on the corresponding numerical suitability measures; and
obtaining summarized text data of the input text data comprising the natural- language summary. 
The limitations of “receiving…”, “processing…”, “forming…”, “generating…”, “constructing…”, “generating…”, “computing…”, “composing…”, and “obtaining…”, as drafted, cover a human organizing of activities.
More specifically, the limitations read on a human performing the following:
receiving data (i.e., a plurality of written texts consisting of a plurality of words); 
processing/analyzing said received data and identifying sentences;
forming/identifying topics from sentences in said received data;
generating a summary of said received data by identifying and labeling each word with its corresponding part-of-speech (POS) (e.g., noun, verb, etc.) and rewriting said data on a piece of paper with each word replaced by a word/POS label;
organizing/drawing/writing a graphical structure or tree for different topics with multiple nodes or categories (i.e., words with corresponding POS labels), creating, for each topic, multiple candidate sentences and ranking them considering the start/end of each sentence (i.e., first/last words/nodes), and writing down a summary by selecting a sentence associated with each topic;
calculating a similarity measure (i.e., how similar the generated summary sentence is to the sentences of the topic);
writing down the summary by selecting preferred sentences based on the calculation of the similarity measure; and
receiving further data (i.e., summarized written text of the data initially received).

This judicial exception is not integrated into a practical application. For example, the independent claims recite a “machine summary”, and paragraph [0035] of the as-filed specification states: “Computing systems may include conventional personal computer architectures, or other general-purpose hardware platforms... the exemplary embodiments are described herein with illustrative reference to single-processor general-purpose computing platforms, commonly available operating system platforms, and/or widely available consumer products, such as desktop PCs, notebook or laptop PCs, smartphones, tablet computers, and so forth.” Therefore, only a general-purpose computer or computing device is described, and it is used mainly to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is described only as a general-purpose computing device, as noted. The claims are not patent eligible.

With respect to claim 2, the claim recites:
wherein each graph data structure comprises positional reference information associated with each node, which tracks sentences of the corresponding topic group that contain the token/POS pair represented by the node, and the candidate summary sentences are ranked using a weighting that depends upon the positional reference information of the nodes comprising the corresponding subgraphs.

This relates to a human organizing of activities. This reads on a human labeling each node (i.e., word) with position information and ranking each sentence based on the position of each node/word. No additional limitations are present. 	
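For orientation only, the graph data structure with positional reference information recited in claim 2 can be pictured with the following minimal sketch. It is not taken from the application or from any cited reference. The function name `build_graph`, the hand-tagged input, and the choice of (sentence index, token index) pairs as the positional references are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(tagged_sentences):
    """Build a token/POS graph for one topic group (illustrative only).

    tagged_sentences: list of sentences, each a list of (token, POS) pairs.
    Returns (positions, edges):
      positions[node] -> list of (sentence_index, token_index) references
      edges[(a, b)]   -> number of times node b directly follows node a
    """
    positions = defaultdict(list)
    edges = defaultdict(int)
    for s_idx, sentence in enumerate(tagged_sentences):
        for t_idx, node in enumerate(sentence):
            # Positional reference: which sentence contains this node, and where.
            positions[node].append((s_idx, t_idx))
            if t_idx > 0:
                edges[(sentence[t_idx - 1], node)] += 1
    return dict(positions), dict(edges)


# Hand-tagged toy topic group; the tags are illustrative assumptions.
group = [
    [("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")],
    [("battery", "NN"), ("lasts", "VBZ"), ("forever", "RB")],
]
positions, edges = build_graph(group)
```

In this sketch each unique token/POS pair appears once as a node, and the stored references identify every sentence of the topic group that contains it.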
With respect to claim 3, the claim recites:
wherein, for each topic group, a single representative summary sentence is selected having a highest-ranking value of the candidate summary sentences.

This relates to a human organizing of activities. This reads on a human selecting a sentence with a highest calculated value. No additional limitations are present. 	
With respect to claim 4, the claim recites:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group comprises a semantic distance measure computed using numerical vector representations of tokens generated using a machine learning training process configured to capture semantic content in a characteristic of the numerical vector representations.

This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., semantic distance) using a list (i.e., numerical vector) of words generated via a set of predetermined criteria (i.e., machine learning training process) to obtain the meaning of the words (i.e., semantic content). No additional limitations are present. 	
With respect to claim 5, the claim recites:
wherein the characteristic of the numerical vector representations is vector direction, and the semantic distance measure is cosine similarity.

This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., cosine similarity), where the list of words (i.e., vector) contains direction information. No additional limitations are present. 	
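As a hedged illustration of the cosine similarity recited in claim 5, the sketch below compares two sentence vectors by direction only. The toy embedding table, the function names, and the choice to average token vectors into a sentence vector are assumptions and are not drawn from the application.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table (an assumption)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


# Hypothetical two-dimensional embeddings, for illustration only.
toy_embeddings = {"battery": [1.0, 0.0], "power": [2.0, 0.0], "screen": [0.0, 1.0]}
same_direction = cosine_similarity(
    sentence_vector(["battery"], toy_embeddings),
    sentence_vector(["power"], toy_embeddings),
)
orthogonal = cosine_similarity(
    sentence_vector(["battery"], toy_embeddings),
    sentence_vector(["screen"], toy_embeddings),
)
```

Because only direction matters, vectors of different length that point the same way compare as identical.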
With respect to claim 6, the claim recites:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group further comprises a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences.

This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., vocabulary distance) based on repeated words in the sentences. No additional limitations are present. 	
With respect to claim 7, the claim recites:
wherein the vocabulary distance measure is Jaccard similarity.

This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., Jaccard similarity). No additional limitations are present. 	
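For reference, Jaccard similarity between two sentences is the number of shared tokens divided by the number of distinct tokens across both. The following is a minimal sketch; the function name and the whitespace-tokenized example sentences are assumptions.

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token intersection divided by the size of the token union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Two shared tokens ("the", "battery") out of four distinct tokens overall.
score = jaccard_similarity(
    "the battery lasts".split(),
    "the battery died".split(),
)
```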
With respect to claims 8 and 13, the claims recite:
wherein the numerical suitability measure is computed according to:

[equation image: media_image1.png]

where (s1, s2) represent first and second sentences, σ_semantic(s1, s2) is the semantic distance measure, and σ_vocabulary(s1, s2) is the vocabulary distance measure.

This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using predetermined mathematical relationships (i.e., combination of semantic distance and vocabulary distance measures). No additional limitations are present. 	
With respect to claim 9, the claim recites:
wherein, for each topic group, the preferred summary sentence is identified having a largest value of the numerical compatibility from all compared sentences of the corresponding topic group.

This relates to a human organizing of activities. This reads on a human selecting a sentence with a highest calculated value.  No additional limitations are present. 	
With respect to claims 10 and 14, the claims recite:
performing a sentiment analysis on sentences within each one of the plurality of related text documents of the input text data, and discarding sentences and/or text documents that do not satisfy a prescribed sentiment criterion.

This relates to a human organizing of activities. This reads on a human interpreting the sentiment/intent/meaning of the sentences and disregarding sentences that do not meet expected/predefined criterion. No additional limitations are present. 	
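The discarding step recited in claims 10 and 14 can be pictured with the following sketch. The claims do not specify the sentiment model or the criterion, so the word lists, the scoring rule, and the threshold `min_score` below are all hypothetical stand-ins chosen only to make the example runnable.

```python
# Hypothetical sentiment lexicon, for illustration only.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def sentiment_score(sentence):
    """Count positive words minus negative words in a whitespace-split sentence."""
    words = sentence.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def filter_by_sentiment(sentences, min_score=1):
    """Keep only sentences whose score meets the assumed criterion."""
    return [s for s in sentences if sentiment_score(s) >= min_score]


kept = filter_by_sentiment(["great battery", "terrible screen", "it has a screen"])
```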
With respect to claim 12, the claim recites:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group comprises:
a semantic distance measure computed using numerical vector representations of tokens generated using a machine learning training process configured to capture semantic content in a characteristic of the numerical vector representations; and
a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences.
This relates to a human organizing of activities. This reads on a human calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., semantic distance) using a list (i.e., numerical vector) of words generated via a set of predetermined criteria (i.e., machine learning training process) to obtain the meaning of the words (i.e., semantic content); and calculating a similarity measure (i.e., numerical suitability measure) between two sentences using a predetermined mathematical relationship (i.e., vocabulary distance) based on repeated words in the sentences. No additional limitations are present. 	

Claim 15 is also drawn to a “signal” per se as recited in the preamble and, as such, is non-statutory subject matter. In paragraphs [0018] and [0041]-[0042] of the as-filed Specification, the scope of the term “computer-readable medium” is not defined. Hence, one of ordinary skill in the art could interpret the term to include transitory signals as well as non-transitory media. It does not appear that a claim reciting a signal encoded with functional descriptive material falls within any of the categories of patentable subject matter set forth in § 101. First, a claimed signal is clearly not a "process" under § 101 because it is not a series of steps. The other three § 101 classes of machine, compositions of matter and manufactures "relate to structural entities and can be grouped as ‘product’ claims in order to contrast them with process claims." 1 D. Chisum, Patents § 1.02 (1994).

The Applicant’s Specification describes the “computer-readable medium” (or “computer readable medium”) broadly enough to include both transitory and non-transitory signals. Paragraphs [0018] and [0041]-[0042] of the as-filed Specification refer to the “computer-readable medium” (or “computer readable medium”) and to “computer readable storage media”. These paragraphs use non-limiting words for the “computer-readable medium” (or “computer readable medium”), such as “other types”. Hence, the claims appear to be drawn to transitory signals, which are not eligible subject matter. To overcome the present rejection, the Applicant is advised to amend the claims to use the following terminology: "non-transitory machine-readable storage medium." Such example terminology also appears in the Official Gazette, 1351 OG 212.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 9 recites the limitation "the numerical compatibility". There is insufficient antecedent basis for this limitation in the claim. It is unclear whether this term is intended to refer to “the numerical suitability measure” recited in previous claims (including claim 8).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 11, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1). 

As to independent claim 1, Haslam et al. teaches:
computer-implemented method for summarising text (see ¶ [0043]: “… The embodiment allows for rich reports, and may display reports of content analysis and comparison [i.e., text summary]. The embodiment may be used to automatically carry out a prior art search on one or many (e.g. thousands) of patents within a patent portfolio.”), comprising:
receiving input text data comprising tokens of a plurality of related text documents (see ¶ [0043]: “One embodiment of the invention includes software for automated search, analysis and comparison of text content. … The embodiment allows for rich reports, and may display reports of content analysis and comparison. The embodiment may be used to automatically carry out a prior art search on one or many (e.g. thousands) of patents within a patent portfolio.”);
processing the input text data to identify a corresponding plurality of sentences (see ¶ [0052-0053 and 0059]: “[0052] … At starting step A 100, a user or another application has identified a subject patent identifier for analysis. At step 102, software receives the subject patent identifier, and in step 104 the application obtains text associated with the subject patent identifier. The subject patent text retrieved in step 104 may comprise some or all of the subject patent specification and subject patent claims. … [0053] In step 108, software parses the subject patent text. If the current subject content comes from a content database, it may already be broken into appropriate fields (e.g. Title, Abstract, Authors, etc). However, if a raw document associated with the current subject content is retrieved, the text content may have to be parsed and broken into corresponding data structures. [0059] One embodiment of the invention includes software for automated search, analysis and comparison of text content.… . The comparison and scoring component 410 may also use the natural language processing component 408 to identify strings or phrases in one text segment that are similar in meaning to strings or phrases in another text segment. 
(i.e., sentences = claims; i.e., tokens = words (e.g., words in claims; claim strings); i.e., input text data = subject/reference patent contents)”);
forming one or more topic groups, each comprising a subset of the plurality of sentences selected according to topic-related tokens within each sentence (see ¶ [0072] and [0123]: “[0072]: The natural language processing system, such as the one described by U.S. Pat. No. 6,871,174, allows lexical and semantic information to be extracted about terms and phrases inside text. This information may then be used to compare text in subject content and reference content for similarity in meaning.” “[0123]: When dividing a claim [i.e., sentence(s), topic group(s)] into claim strings [i.e., tokens], one technique is to parse the claims by punctuation to find groups of words that make up a limitation or element [i.e., topic-related tokens]. For example, semi-colons, commas, colons and periods may be used as separators of claim strings. More refined methods may consider parts of speech in order to determine the end of clauses, and may further employ natural language processing tools to determine distinct claim strings.”);
generating a machine summary of the related text documents by determining part-of-speech (POS) data for each token in the text (see Figure 14 [e.g., “opening a file.” → “open__Verb Tobj file__Noun”] and ¶ [0053]: “…Criteria by which file history strings are associated with claim strings may include finding one or more matching terms in the strings being compared, finding one or more terms that have a certain part of speech (e.g. noun, verb, etc) within strings being compared, or having a similarity score higher than a threshold value, or using any similar text matching algorithms implemented by natural language processing systems.”), 
wherein the POS data represents a grammatical function of the token in its corresponding sentence, and whereby the tokens are substituted with token/POS pairs (see Figure 14 [e.g., “opening a file.” → “open__Verb Tobj file__Noun”] and ¶ [0053] citations as above and [0073]: “…In particular, the natural language processing tool can break text into strings and output logical relations associated with those strings. In the example shown in output 1400, input string 1402 [“Opening a file.”] has one logical relation 1404 [“@@ open__Verb Tobj file__Noun”] associated with it.”);
generating a natural-language summary corresponding with the machine summary by computing, for each topic group, numerical suitability measures that provide a comparison between the representative summary sentence and sentences of the corresponding topic group (see ¶ [0059]: “… The content analysis software 401 may also contain a natural language processing component 408. The natural language processing component 408 may ascertain information about content text, so that a similarity in meaning between text segments may be ascertained.”; ¶ [0072]: “The natural language processing system, such as the one described by U.S. Pat. No. 6,871,174, allows lexical and semantic information to be extracted about terms and phrases inside text. This information may then be used to compare text in subject content and reference content for similarity [i.e., comparison between the representative summary sentence (text in reference content) and sentences of the corresponding topic group (text in subject content)] in meaning.”; Figure 22 and ¶ [0110]: “FIG. 22 illustrates a sample output report showing a limitation mapping chart, similar in nature to the kind of chart that might appear in a Markman hearing, and reports of this type may be automatically generated from an embodiment of content analysis software. Referring to FIG. 22, the chart report 2200 includes one or more claims from subject patent content starting in column 2201, and similar strings found from the subject patent specification and/or the subject patent prosecution history, displayed in column 2207.”; Figure 17 and ¶ [0090]: “Still referring to FIG. 17, step 1712 indicates a calculation of the logical claim string score between the text from the current claim string retrieved from step 1708 and the reference text obtained from the reference that was retrieved in step 1710. The logical claim string score [i.e., numerical suitability measure] calculated in step 1712 may be an indication of how well the claim string retrieved form step 1708 reads on the text retrieved from step 1710.”), and
composing the natural-language summary by selecting, for each topic group, a preferred summary sentence based on the corresponding numerical suitability measures (see Figures 17 (method of finding references similar to subject content for each claim; i.e., the process of composing/selecting summary for each topic group (i.e., each topic group) based on a score/similarity) and 25 (sample output; i.e., composed summary) and ¶ [0091] and [0114]: “[0091] Referring still to FIG. 17, conditional step 1714 tests if the temporary score is greater than csScore, and if so, the program executes step 1716 which stores the claim score, claim text, reference identifier and similar reference string(s). […] [0114]…Column 2554 may contain strings from the reference item 2512 determined to be similar to (based on logical or centroid scores) each corresponding claim string listed in column 2550.”); and 
obtaining summarized text data of the input text data comprising the natural- language summary (see Figure 25 and ¶ [0114] citation as in previous limitation and Figures 22-24 for more summarized text data (i.e., spec., claims text that best match the subject patent claim strings/limitations (topics)).).

However, Haslam et al. does not explicitly teach, but Delgo et al. does teach:
constructing, for each topic group, a graph data structure having a plurality of nodes and connecting edges (see Figure 4 (text mapped to an ontology (i.e., graph data structure))), 
wherein each node represents a unique token/POS pair, and sequences of edges represent sequences of token/POS pairs comprising sentences of the corresponding topic group (see Figure 4 (text mapped to an ontology (i.e., graph data structure)) and ¶ [0081]: “…FIG. 4 shows an illustration of a process of mapping text to the ontology. In step 1, named entities are identified and are associated with their entries in the knowledge graph, and their type in the ontology. In this case ‘Austin, Tex.’ was identified as an entity with a type of “City”. In step 2, Part-of-speech tags and a dependency tree parse of the sentence are obtained. In step 3, the named entity is associated with a node in the dependency parse—in this case the node would be the one corresponding to the token “Austin,” with a POS tag of “NNP”.”
Here, considering topic group (i.e., each claim as taught in Haslam et al.), being a single sentence, the graph data structure corresponds to a single sentence.), 
Haslam et al. and Delgo et al. are both considered to be analogous to the claimed invention because they are in the same field of endeavor in document (i.e., text) search and retrieval/match of data (i.e., text, information). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al.  to incorporate the teachings of Delgo et al. of constructing, for each topic group, a graph data structure having a plurality of nodes and connecting edges, wherein each node represents a unique token/POS pair, and sequences of edges represent sequences of token/POS pairs comprising sentences of the corresponding topic group which provides the benefit of allowing the system to perform automated matching ([0006] of Delgo et al.).

However, Haslam et al. in combination with Delgo et al. do not explicitly teach, but Srinivasan et al. does teach:
generating, for each topic group, a plurality of ranked candidate summary sentences based upon subgraphs of the graph data structure having initial and final nodes corresponding with valid sentence start and end token/POS pairs (see ¶ [0054]: “At block 410, the sentences from the selection graph are parsed into word tokens. At block 412, the word tokens are mapped to a compression graph (i.e., subgraph). In embodiments, mapped tokenized words are represented using nodular notation, in which each word is represented by a node on the compression graph, and each sentence represents a directed path in the compression graph. Specialized nodes serve as start-of-sentence [i.e., initial node(s)] and end-of-sentence nodes [i.e., final node(s)]. […] At block 416, the mapped word tokens are compressed into candidate sentences to be used in a final content [i.e., generation of ranked candidate sentences]. Such candidate sentence generation can iteratively repeat until all relevant information mapped to the selection graph is exhausted.”), and 
composing the machine summary by selecting, for each topic group, at least one representative summary sentence from the ranked candidate summary sentences (see ¶ [0054]: “[0054] Referring now to FIG. 4, a flow chart diagram is illustrated showing an exemplary method 400 generating candidate sentences [i.e., topic group(s)] using multi-sentence compression, in accordance with embodiments of the present invention. In embodiments, the method of 400 is performed by a content generation engine, such as content generation engine 200 of FIG. 2. Initially, and as indicated at block 402, source content from a corpus is retrieved. At block 404, the source content is parsed into sentences. Referring to block 406, the sentences are then mapped to a selection graph. As described, the mapped sentence tokens are mapped in nodular notation, wherein each node represents a single sentence. At block 408, the mapped sentences are assigned an initial reward (i.e., ranking the sentences/topic group(s)) and an edge weight. In embodiments, the nodes mapped to the selection graph are weighted based on their similarity to the query, and the edges between each pair of nodes are assigned an edge weight based on their information overlap. […] A single node mapped to the compression graph can represent all occurrences of a word within the same POS tag. Referring to block, 414, an edge weight between each pair of word-nodes is assigned. The edge weight can represent the number of times the ordered combination of those node-words occurs across all sentences within set S. The shortest paths (normalized by path length) are identified and the top K generated sentences are used for further processing.”);
Haslam et al. in combination with Delgo et al. and Srinivasan et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al.  to incorporate the teachings of Srinivasan et al.  of generating, for each topic group, a plurality of ranked candidate summary sentences based upon subgraphs of the graph data structure having initial and final nodes corresponding with valid sentence start and end token/POS pairs, and composing the machine summary by selecting, for each topic group, at least one representative summary sentence from the ranked candidate summary sentences which provides the benefit of improving the coherence of the final content ([0003] of Srinivasan et al.).
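To make the candidate-generation limitation easier to follow, the sketch below builds a small word graph with start-of-sentence and end-of-sentence nodes and ranks every start-to-end path. It is a simplified illustration and does not reproduce the method of Srinivasan et al. or of the application. The inverse-count edge cost, the exhaustive path enumeration (in place of a true k-shortest-path search), the hand-tagged input, and all names are assumptions.

```python
from collections import defaultdict

# Special nodes marking valid sentence starts and ends (names are assumptions).
START, END = ("<s>", "START"), ("</s>", "END")


def build_word_graph(tagged_sentences):
    """Count how often each token/POS node directly follows another."""
    counts = defaultdict(int)
    for sentence in tagged_sentences:
        path = [START] + list(sentence) + [END]
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    return counts


def ranked_candidates(tagged_sentences, top_k=3):
    """Enumerate start-to-end paths and rank them by length-normalized cost."""
    counts = build_word_graph(tagged_sentences)
    successors = defaultdict(list)
    for a, b in counts:
        successors[a].append(b)

    results = []

    def walk(node, path, cost):
        if node == END:
            words = [tok for tok, _ in path[1:-1]]
            # Normalize by the number of nodes on the path, START and END included.
            results.append((cost / len(path), " ".join(words)))
            return
        for nxt in successors[node]:
            if nxt not in path:  # never revisit a node, so no cycles
                # Assumed cost: frequent transitions are cheaper.
                walk(nxt, path + [nxt], cost + 1.0 / counts[(node, nxt)])

    walk(START, [START], 0.0)
    results.sort()  # lowest normalized cost first
    return [text for _, text in results[:top_k]]


# Hand-tagged toy topic group; the tags are illustrative assumptions.
group = [
    [("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")],
    [("battery", "NN"), ("lasts", "VBZ"), ("forever", "RB")],
    [("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")],
]
candidates = ranked_candidates(group)
```

Exhaustive enumeration is only practical for very small graphs; it is used here so that the ranking logic stays visible.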

As to independent claim 11, Haslam et al. in combination with Delgo et al. and Srinivasan et al.  teach all of the limitations as in claim 1.
Haslam et al. further teaches:
A computing system for summarising text comprising:
a processor (see ¶ [0058]: “Referring to FIG. 4, an overview of a system for search, analysis and comparison of content is shown. Computer 400 runs content analysis software 401. Content analysis software 401 comprises user interface 402 for allowing users to enter input, such as subject document identifiers for references to be searched, analyzed or compared, and/or directories where content may be found, analyzed or compared.”);
at least one memory device accessible by the processor (see ¶ [0058] citation as in limitation above); and
at least one text data source, wherein the memory device contains a body of program instructions (see ¶ [0058] citation as in limitation above and [0054]: “… The user of computer program”) which, when executed by the processor, cause the computing system to:
[perform the limitations as in claim 1].
As to independent claim 15, Haslam et al. in combination with Delgo et al. and Srinivasan et al.  teach all of the limitations as in claim 1.
Haslam et al. further teaches:
A computer program product comprising a tangible computer-readable medium having instructions stored thereon (see ¶ [0054]: “… The user of computer program may specify the subject content strings by specifying the location of the subject content, giving the program an identifier by which the program may find the content (e.g. a patent document number), or by directly typing or providing subject content text.”) which, when executed by a processor configure the processor to:
[perform the limitations as in claim 1].

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) as applied to claim 1, above and further in view of Newman (US 20040117736 A1). 

Regarding claim 2, Haslam et al. in combination with Delgo et al. and Srinivasan et al. teach all of the limitations as in claim 1, above.

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. do not explicitly teach, but Newman does teach:
wherein each graph data structure comprises positional reference information associated with each node, which tracks sentences of the corresponding topic group (see ¶ [0057]: “After sentences [i.e., topic group(s)] are extracted to represent a node they are sorted [i.e., weighted] based on their position [i.e., positional reference information] in the text associated with a node, duplicates are eliminated, and the remaining sentences are concatenated into a string representing the node in the overview, with intervening gap indicators such as ellipsis symbols " . . . " to represent sentences in the text not included in the overview. These strings are then arranged in the tree order of the cluster to form the cluster overview, using either a reduced width tree, such as shown in FIGS. 1 and 2, representing a completed overview for a discussion, or an indented tree, such as shown in FIG. 3, representing the part of the summary (see below) associated with group 3 of the same discussion.”
i.e., token/POS pairs as already taught by Haslam et al. as discussed in claim 1).
Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Newman are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Newman of wherein each graph data structure comprises positional reference information associated with each node, which tracks sentences of the corresponding topic group represented by the node, and the candidate summary sentences are ranked using a weighting that depends upon the positional reference information of the nodes comprising the corresponding subgraphs which provides the benefit of permitting significant text to be presented for each node ([0029] of Newman).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) as applied to claim 1, above and further in view of Guo et al. (US 20020052901 A1). 

Regarding claim 3, Haslam et al. in combination with Delgo et al. and Srinivasan et al. teach all of the limitations as in claim 1, above.

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. do not explicitly teach, but Guo et al. does teach:
wherein, for each topic group, a single representative summary sentence is selected having a highest-ranking value of the candidate summary sentences (see ¶ [0016]: “Outputting the top-ranked sentences [i.e., topic group(s)] as the summary of the set of documents, the top-ranked words as keywords list of the set of documents.”).
Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Guo et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Guo et al. of wherein, for each topic group, a single representative summary sentence is selected having a highest-ranking value of the candidate summary sentences which provides the benefit of not only generating summaries for respective documents, but also generating a comprehensive prompt for the important ideas of the documents ([0009] of Guo et al.).
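For illustration only, the claim 3 limitation of selecting a single highest-ranked candidate per topic group can be sketched as follows. This is a minimal sketch and not the method of Guo et al. or of the applicant; the function name, the (sentence, rank value) pair layout, and the sample data are all hypothetical.

```python
def select_representatives(ranked_candidates):
    """Pick, for each topic group, the one candidate with the highest rank value.

    ranked_candidates maps a topic label to a list of (sentence, rank_value)
    pairs. This layout is an assumption made for the sketch. Topic groups with
    no candidates are skipped.
    """
    return {
        topic: max(candidates, key=lambda pair: pair[1])[0]
        for topic, candidates in ranked_candidates.items()
        if candidates
    }


# Hypothetical ranked candidates for two topic groups.
ranked = {
    "battery": [("lasts long", 0.4), ("battery is great", 0.9)],
    "screen": [("screen is dim", 0.7)],
    "empty": [],
}
representatives = select_representatives(ranked)
```

Under these assumptions, each topic group with at least one candidate contributes exactly one sentence to the summary.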

Claims 4 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) as applied to claims 1 and 11, above and further in view of Mansour et al. (US 20160232157 A1).

Regarding claim 4, Haslam et al. in combination with Delgo et al. and Srinivasan et al. teach all of the limitations as in claim 1, above.

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. do not explicitly teach, but Mansour et al. does teach:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group comprises a semantic distance measure computed using numerical vector representations of tokens generated using a machine learning training process configured to capture semantic content in a characteristic of the numerical vector representations (see ¶ [0036-0037, 0040] and Figure 5: “[0036] A user node topic aggregator takes as input the topics computed for a plurality of documents authored or consumed by the user. The user node topic aggregator may merge, aggregate or select from the topics of the documents authored or consumed by the user [i.e., sentence(s) of the topic group]. In this way the number of topics annotating a node representing a given person is prevented from becoming overly large. For example, topics most frequently identified with respect to documents authored and/or consumed by the person are selected. This process favors overlapping topics between multiple documents. For the rest of non-overlapping documents, semantic similarity distance measures like Jensen-Shannon divergence, cosine similarity or distance between embeddings of the topics [i.e., sentence(s) in the non-overlapping documents being compared/matched/identified based on the topics of the documents authored or consumed by the user (i.e., sentence(s) of the topic group(s))] may be used. [0037] FIG. 5 is a schematic diagram of a method of building a semantic interpreter comprising a weighted inverted index, and of using the semantic interpreter to compute topics from input text. […] [0040] Features of the observed n-grams may be found such as location within the document (such as whether the n-gram is in the title, summary, abstract, conclusion, or in the body of the document), whether capitalization is used, the length of the n-gram in terms of number of characters, or other features. These features, together with the observed n-gram, are input to a trained classifier which classifies the n-grams as being topics or not topics. For example, the classifier may be a neural network [i.e., machine learning process (used to compute 506, below)], a support vector machine, a random decision forest, or other classifier which has been trained to classify n-grams as being topics or not. Figure 5: 506: Weighted Inverted Index [i.e., contains the numerical vector representations (i.e., column of words with associated weighted vector of topics)], 508: Input text [i.e., sentences of corresponding topic group], 510: Semantic Interpreter [i.e., compares text from 506 (weighted inverted index) and 508 (input text)]” [0042]: “FIG. 5 the weighted inverted index 506 is depicted as a column of words, each word having an associated weighted vector (or list) of topics (depicted as nodes connected by lines and shown for only one cell rather than each cell, for clarity). That is, the columns are collapsed into a single column.” [0043] “… The semantic interpreter 510 extracts a word from the input text 508, looks up the word in memory storing the weighted inverted index 506 and finds the weighted vector of topics stored with the indexed word in the memory.”).
Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Mansour et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Mansour et al. of wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group comprises a semantic distance measure computed using numerical vector representations of tokens generated using a machine learning training process configured to capture semantic content in a characteristic of the numerical vector representations which provides the benefit of improving relevance of the output, and enabling more efficient and robust operation for huge document repositories ([0018] of Mansour et al.).
Regarding claim 6, Haslam et al. in combination with Delgo et al., Srinivasan et al. and Mansour et al. teach all of the limitations as in claim 4, above.

Delgo et al. further teaches:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group further comprises a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences (see ¶ [0059]: “3. In both cases, the appropriate k-shingles (character level or word level) are extracted from the user provided text, and are then used to query the appropriate index. Items in the index corresponding to named entity surface forms may be evaluated based on their Jaccard set similarity distance to the k-shingles (i.e., word-level - vocabulary distance measure) extracted from the user text. All matches above a certain similarity threshold T are then returned as candidates, along with the similarity score that was obtained for them.”).
Haslam et al. in combination with Delgo et al., Srinivasan et al., and Mansour et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al., Srinivasan et al., and Mansour et al. to further incorporate the teachings of Delgo et al. of wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group further comprises a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences which provides the benefit of allowing the system to perform automated matching ([0006] of Delgo et al.).

Regarding claim 7, Haslam et al. in combination with Delgo et al., Srinivasan et al., and Mansour et al. teach all of the limitations as in claim 6, above.
Delgo et al. further teaches:
wherein the vocabulary distance measure is Jaccard similarity (see ¶ [0059] as in claim 6, above: “named entity surface forms may be evaluated based on their Jaccard set similarity distance to the k-shingles (i.e., word-level - vocabulary distance measure) extracted from the user text.”).
Haslam et al. in combination with Delgo et al., Srinivasan et al., and Mansour et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al., Srinivasan et al., and Mansour et al. to further incorporate the teachings of Delgo et al. of wherein the vocabulary distance measure is Jaccard similarity which provides the benefit of allowing the system to perform automated matching ([0006] of Delgo et al.).
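For illustration only, the Jaccard set similarity over word-level k-shingles referenced in the cited ¶ [0059] can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of Delgo et al.: the whitespace tokenization, the function names, and the strict "above threshold" comparison are assumptions made here.

```python
def word_shingles(text, k=1):
    """Return the set of word-level k-shingles (runs of k consecutive words).

    Tokenization by lowercasing and splitting on whitespace is an assumption.
    """
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=1):
    """Shared shingles divided by all distinct shingles: |A & B| / |A | B|."""
    set_a, set_b = word_shingles(a, k), word_shingles(b, k)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(set_a & set_b) / len(union)


def candidates_above_threshold(query, items, threshold, k=1):
    """Return (item, score) pairs whose similarity to the query exceeds the threshold."""
    scored = [(item, jaccard_similarity(query, item, k)) for item in items]
    return [(item, score) for item, score in scored if score > threshold]


score = jaccard_similarity("the cat sat", "the cat ran")
matches = candidates_above_threshold(
    "the cat sat", ["the cat ran", "a dog barked"], threshold=0.25
)
```

In this example the two sentences share two of four distinct words, so the score is 0.5, which reflects the "shared and distinct tokens" language of claim 6.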

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) and Mansour et al. (US 20160232157 A1) as applied to claim 4, above and further in view of Cheng et al. (CN 107729403 A). 
Regarding claim 5, Haslam et al. in combination with Delgo et al., Srinivasan et al. and Mansour et al. teach all of the limitations as in claim 4, above.

Mansour et al. further teaches:
the semantic distance measure is cosine similarity (see ¶ [0036] citation as in claim 4: “[0036] …For the rest of non-overlapping documents, semantic similarity distance measures like Jensen-Shannon divergence, cosine similarity or distance between embeddings of the topics may be used.”).
Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Mansour et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Mansour et al. of the semantic distance measure is cosine similarity which provides the benefit of improving relevance of the output, and enabling more efficient and robust operation for huge document repositories ([0018] of Mansour et al.).

However, Haslam et al. in combination with Delgo et al., Srinivasan et al. and Mansour et al. do not explicitly teach, but Cheng et al. does teach:
wherein the characteristic of the numerical vector representations is vector direction (see ¶ 9 of page 3: “In the embodiment of the invention using a word vector reflecting the association relation in the semantic space, the direction of vector expressing specific semantics, the distance between vector reflects the association between vocabularies, using angle cosine value of the two vectors is measured vector distance, the larger the cosine distance is near, the vocabulary correlation degree…”).
Haslam et al. in combination with Delgo et al., Srinivasan et al. and Mansour et al. and Cheng et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al., Srinivasan et al. and Mansour et al. to incorporate the teachings of Cheng et al. of wherein the characteristic of the numerical vector representations is vector direction which provides the benefit of improving the identification, classification ability to complex semantic text information (¶ 6 of page 2 of Cheng et al.).
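For illustration only, the relationship between cosine similarity and vector direction discussed for claim 5 can be sketched as follows. This is a minimal sketch, not code from Mansour et al. or Cheng et al.; the function name and the sample vectors are hypothetical. It shows that the cosine of the angle between two vectors is unchanged when either vector is rescaled, so the measure depends only on direction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical token vectors: b points the same way as a but is twice as long.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
c = [3.0, -1.0, 0.5]

same_direction = cosine_similarity(a, b)
before_scaling = cosine_similarity(a, c)
after_scaling = cosine_similarity([10 * x for x in a], c)
```

Here `same_direction` is 1 up to floating-point rounding, and `before_scaling` and `after_scaling` agree, since multiplying a vector by a positive constant changes its length but not its direction.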

Claims 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) as applied to claims 1 and 11, above, and further in view of Levanon et al. (US 10410224 B1).

Regarding claims 10 and 14, Haslam et al. in combination with Delgo et al. and Srinivasan et al. teach all of the limitations as in claims 1 and 11, above.

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. do not explicitly teach, but Levanon et al. does teach:
further comprising performing a sentiment analysis on sentences within each one of the plurality of related text documents of the input text data, and discarding sentences and/or text documents that do not satisfy a prescribed sentiment criterion (see Col. 40, lines 14-32: “(179) …Those sentences determined to express a neutral sentiment, i.e. those sentences with a non-neutral sentiment score below a particular threshold, may then be discarded from the collection of sentences.”). 
Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Levanon et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Levanon et al. of performing a sentiment analysis on sentences within each one of the plurality of related text documents of the input text data, and discarding sentences and/or text documents that do not satisfy a prescribed sentiment criterion which provides the benefit of improving the accuracy of the classification process (Col. 40, lines 12-13 of Levanon et al.).
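For illustration only, the discarding step quoted from Levanon et al. (sentences whose non-neutral sentiment score falls below a threshold are dropped) can be sketched as follows. This is a minimal sketch and not the reference's implementation: the scores are assumed to have been produced by some sentiment analyzer that is not shown, and the function name, data layout, and sample values are hypothetical.

```python
def discard_neutral(scored_sentences, threshold):
    """Keep only sentences whose non-neutral sentiment score meets the threshold.

    scored_sentences is a list of (sentence, non_neutral_score) pairs, where
    the score is assumed to come from a separate sentiment analyzer. Sentences
    scoring below the threshold are treated as neutral and discarded.
    """
    return [
        sentence
        for sentence, non_neutral_score in scored_sentences
        if non_neutral_score >= threshold
    ]


# Hypothetical pre-scored sentences.
scored = [
    ("The battery life is fantastic.", 0.92),
    ("The box contains a charger.", 0.08),
    ("The screen cracked within a week.", 0.85),
]
kept = discard_neutral(scored, threshold=0.5)
```

Only the two sentences scoring at or above the assumed threshold of 0.5 remain in `kept`.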

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Haslam et al. (US 20080154848 A1) and further in view of Delgo et al. (US 20180232443 A1) and Srinivasan et al. (US 20190197184 A1) as applied to claims 1 and 11, above, and further in view of Kogilavani et al. (Kogilavani, S. V., R. Thangarajan, C. S. Kanimozhiselvi, and S. Malliga. "Detecting Paraphrases in Tamil Language Sentences." (2017); https://www.irjet.net/archives/V4/i5/IRJET-V4I588.pdf) and Cheng et al. (CN 107729403 A).

Regarding claim 12, Haslam et al. in combination with Delgo et al. and Srinivasan et al. teach all of the limitations as in claim 11, above.

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. do not explicitly teach, but Kogilavani et al. does teach:
wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group (see ¶ 3. Related work: “In the proposed method, the statistical and the semantic analysis is used to determine the paraphrases. In this system, the statistical analysis [i.e., numerical suitability measure] is based on word set [i.e., word sets (vocabulary); Jaccard similarity], word vector, word order [i.e., semantic; Word Order similarity], and word distance and semantic analysis is based on word order between sentences”) comprises:
a semantic distance measure computed using numerical vector representations of tokens generated (see ¶ 3. Related work: “…semantic analysis is based on word order between sentences…” and 3.5 Word Order Similarity: “Sentence similarity based on the word order requires constructing the order vectors of the two sentences. If the sentence Sa has words (wa1,wa2,......,wai) and sentence Sb has words (wb1,wb2,......,wbi) [i.e., vector representation of tokens] then word order vectors for Sa and Sb are represented as follows. […] Then the similarity between Sa and Sb can be calculated based on the orders of words”); and
a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences (see ¶ 3.1 Jaccard similarity Measure: “Jaccard similarity is a word set based measure in which the word sets of the two sentences are taken into account for similarity calculation..”).

Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Kogilavani et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. to incorporate the teachings of Kogilavani et al. of wherein the numerical suitability measure between the representative summary sentence and sentences of the corresponding topic group comprises: a semantic distance measure computed using numerical vector representations of tokens in a characteristic of the numerical vector representations; and a vocabulary distance measure based upon the occurrence of shared and distinct tokens between sentences, which provides the benefit of outperforming methods to identify paraphrases (Conclusion of Kogilavani et al.).

However, Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Kogilavani et al. do not explicitly teach, but Cheng et al. does teach:
using a machine learning training process configured to capture semantic content (see ¶ 9 of page 3: “…certain linear correlation exists between any two word vectors in the subsequent neural network model training can automatically find learning and extracting such features. In the embodiment of the invention using a word vector reflecting the association relation in the semantic space, the direction of vector expressing specific semantics, the distance between vector reflects the association between vocabularies, using angle cosine value of the two vectors is measured vector distance, the larger the cosine distance is near, the vocabulary correlation degree...”).

Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Kogilavani et al. and Cheng et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Haslam et al. in combination with Delgo et al. and Srinivasan et al. and Kogilavani et al. to incorporate the teachings of Cheng et al. of using a machine learning training process configured to capture semantic content which provides the benefit of improving the identification, classification ability to complex semantic text information (¶ 6 of page 2 of Cheng et al.).

Allowable Subject Matter
Claims 8 and 13 would be allowable if rewritten to overcome the rejection under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Haslam et al. in combination with Delgo et al., Srinivasan et al., Mansour et al., and Kogilavani et al. teach all of the limitations as in claims 6 and 12, above.
However, with respect to claim 8, Haslam et al. in combination with Delgo et al., Srinivasan et al., Mansour et al., and Kogilavani et al. fail to teach:
“wherein the numerical suitability measure is computed according to:

[equation image: media_image1.png]

where (s1, s2) represent first and second sentences, σ_semantic(s1, s2) is the semantic distance measure, and σ_vocabulary(s1, s2) is the vocabulary distance measure”.
Claim 9 would be allowable because it is dependent on claim 8.

Claim 9 would be allowable if rewritten to overcome the rejection under 35 U.S.C. 101 and 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659



/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659
/Paras D Shah/Primary Examiner, Art Unit 2659
08/09/2022