DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-17 and 21-23 are pending of which claims 1, 11 and 21 are in independent form.  Claims 1-17 and 21-23 are rejected under 35 U.S.C. 103.

Response to Claim Amendments and Arguments
The claim amendments and arguments filed on 11 May 2021 as part of a Request for Continued Examination as they apply to the 35 U.S.C. rejections of the claims have been fully considered.  On pages 14-16 of the remarks Applicant’s representative appears to argue that the Stanton reference fails to disclose the newly amended independent claim limitations in three areas.  First, Stanton fails to disclose, …analyzing…the plurality of terms to determine a statistically improbable phrase (SIP) value for each term by comparing a frequency occurrence of the term in the first plurality of multiple electronic text documents with a corpus frequency occurrence of the term in a text corpus, wherein the first plurality of multiple electronic text documents is from a separate data source than the text corpus…  Rather, Applicant’s representative argues, Stanton discloses comparing a phrase in a single text document to other related texts in a database of documents.

Third, Applicant’s representative argues that the claim limitation specifying …the first plurality of multiple electronic text documents is from a separate source than the text corpus is unlike machine learning approaches that use training and testing data as both the training and testing data typically come from the training set.
Examiner has applied a new reference to address the claims as amended, as detailed in the rejection below.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11-13 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Stanton et al. U.S. Pub. No. 2009/0157714 (hereinafter “Stanton”) in view of Jesensky et al. U.S. Patent No. 9,298,700 (hereinafter “Jesensky”) in view of Marvit et al. U.S. Pub. No. 2009/0094233 (hereinafter “Marvit”) in further view of Evans et al. U.S. Pub. No. 2009/0287668 (hereinafter “Evans”).
Regarding independent claim 1, Stanton discloses:
accessing a first plurality of multiple electronic text documents comprising a plurality of terms (Stanton in the Abstract discloses in part, “A system and method are provided for analyzing elements of text for comparative purposes. Text is provided to the system in an electronic format readable by the system.”  Additionally, Stanton at paragraph [0023] discloses comparing data from one text with a database of reference texts [i.e., a first plurality of multiple electronic text documents].  Lastly, Stanton at paragraph [0067] discloses in part, “system 10…measuring the frequency and occurrence patterns of individual words and combinations of words in texts [i.e., plurality of terms]…”)

While Stanton in paragraph [0068] discloses analyzing a text compared to a plurality of texts using a SIP value, Stanton does not disclose:
analyzing, by at least one processor, the plurality of terms to determine a statistically improbable phrase (SIP) value for each term by comparing a frequency occurrence of the term in the first plurality of multiple electronic text documents with a corpus frequency occurrence of the term in a text corpus, wherein the first plurality of multiple electronic text documents is from a separate data source than the text corpus.
In other words, while Stanton at paragraph [0068] discloses in part, “Statistically Improbable Phrases is another manner in which the system 10 may identify a text's subject matters and attempt to predict the level of a user's interest in those subject matters. Statistically Improbable Phrases are phrases which occur frequently in a given text [i.e., text corpus], but do […] [i.e., a frequency occurrence of the term in the plurality of electronic text documents],” and Stanton at paragraph [0023] discloses comparing data from a text with a database of reference texts, Stanton does not disclose comparing a frequency of the term in the first plurality of multiple electronic text documents with a corpus frequency where the first plurality of multiple electronic text documents is from a separate source than the text corpus.
However, Jesensky at Column 3, Lines 5-11 teaches in part, “To generate these phrases, the described techniques may analyze one or more sources. These sources may include books, magazines, online content, audio content, video content and/or any other source from which words may be extracted or assembled. With use of the sources, the described techniques may extract phrases or may extract words for use in creating phrases.”  Additionally, Jesensky at Column 12, Lines 15-17 teaches, “To find mined phrases, Phrase-mining module 810 includes a contiguous-words module 814 and a statistically-improbable-phrase (SIP) module 816.”
Both the Stanton reference and the Jesensky reference, in the sections cited by the Examiner, are in the field of endeavor of identifying phrases in text documents.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the comparing of a text document to a corpus of text documents and identifying SIP values as disclosed in Stanton with the analyzing of multiple documents and multiple sources and comparing SIP values as taught in Jesensky to facilitate extracting relevant phrases from text (See Jesensky at Column 3, Lines 5-11).

identifying, based on the SIP value determined for each term, a key term from the plurality of terms (Stanton at paragraph [0068] discloses in part, “Since it is statistically […] topic of a current text.”)

determining, from the plurality of terms, one or more related terms associated with the key term based on the one or more related terms meeting a similarity threshold; generating a topic cluster comprising the key term and the one or more related terms associated with the key term (Stanton at paragraph [0067] discloses the following:
It does this by measuring the frequency and occurrence patterns of individual words and combinations of words in texts that the user has consumed. Words and phrases that are encountered more frequently than is statistically likely, or are determined unique or of comparatively more importance by a similar measure, are assigned a greater value. Words and phrases that are encountered less frequently than is statistically likely, or are determined to be uniquely absent or of comparatively less importance, are reduced in value. The idea is that related words and phrases will be used more commonly in texts with similar content. As an example, a book that takes place on the sea is likely to use related words such as sea, sand, water, beach, ship, boat, swimming, and other words or phrases that are associated with the sea. If a user consistently adds texts to their channels that are either about or set near the sea, these words consistently gain in significance as they occur repeatedly across texts and channels. Commonly reoccurring words that share a similar theme can optionally be grouped and labeled into categories based on the frequency and pattern of their appearance.

Examiner is of the position that Stanton at paragraph [0067] discloses a key term of sea and identifying related words such as sand, water, beach, ship, boat and swimming, and grouping or clustering these related words together based on a determination that they occur together with sufficient frequency [i.e., meeting a threshold].  However, Stanton does not explicitly disclose topic clustering or a similarity threshold.

Both the Stanton reference and the Marvit reference are in the field of endeavor of identifying topics of text documents and the grouping of related and key words.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the grouping of related words having a common theme or topic based on frequency and textual analysis as disclosed in Stanton with the topic clustering using an affinity and similarity calculation and threshold as taught in Marvit to facilitate identifying relevant information in large amounts of data (See Marvit at paragraph [0003]).

While Stanton at paragraphs [0066] – [0068] and [0092] discloses determining the subject matter of a text, and a user’s interest in the subject matter using customizable and adjustable user interest values to make future text recommendations, Stanton does not disclose a user adjusting the determined topic of a text.  Additionally, while Marvit teaches document clustering using keywords and clusters representing topics, and Marvit at paragraph [0144] teaches a user, through a user interface, adding or removing document tags related to topics and adjusting their weight, Stanton in view of Marvit does not disclose a user identifying the key term of the topic cluster.  More specifically, Stanton in view of Marvit does not disclose:
providing, for presentation on a client device associated with a user, a graphical user interface comprising a selectable option to identify the key term with the topic cluster as a first topic of interest, designating the key term and the topic cluster as the first topic of interest based on receiving an indication of a user selection of the selectable option: 
However, Evans at paragraph [0009] teaches in part, “Accordingly, the present inventors have determined that a semi-supervised, interactive document clustering method would be desirable, wherein the method can allow the user to preview the most popular coherent topics in the database, guide the clustering process, and then create document clusters only for selected topics.”  Additionally, Evans at Figure 1a, provided below, teaches a graphical user interface to facilitate providing user interaction and user direction in the clustering process (See Evans at paragraph [0015]).

    [Figure: Evans, Figure 1a (media_image1.png)]

Both the Stanton reference and the Evans reference, in the portions cited by the Examiner, are in the field of endeavor of grouping text based in part on identified keywords, topics and user input.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the grouping of documents and the identifying of text subject matter using, in part, SIP and customizable user interest levels as disclosed in Stanton with the GUI for user interaction in document clustering by topic and allowing for topic selection as taught in Evans to facilitate providing user interaction and user direction in the clustering process (See Evans at paragraph [0015]).

receiving a second plurality of multiple electronic text documents; wherein the second plurality of multiple electronic text documents is from a separate data source than the text corpus and providing, to the client device, an electronic text document from the second plurality of multiple electronic text documents that corresponds to the first topic of interest based on the electronic text document relating to the topic cluster (Stanton in the Abstract discloses in part, “The system may use data from one text to identify other texts that a user may like, and present information about the text to the user in various forms.”  Additionally, Jesensky at Column 3, Lines 5-6 teaches in part, “To generate these phrases, the described techniques may analyze one or more sources.”)

Regarding dependent claim 2, all of the particulars of claim 1 have been addressed above.  While Stanton at paragraphs [0066] – [0068] discloses assigning interest value to words and phrases based on keyword ranking, frequency and SIP, Stanton does not explicitly disclose:
ranking the plurality of terms based on the SIP value for each term; and identifying the key term from the plurality of terms by determining that the key term is a highest ranked term from the plurality of terms.
However, Marvit at paragraphs [0124] – [0125] teaches in part, “The method starts at step 410, where terms of the documents of a corpus are ranked using any suitable ranking technique… One or more highly ranked terms are selected as the keywords of the documents at step 414.” 
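
For illustration only, the ranking and key-term selection recited in claim 2 can be sketched as follows. This is a minimal sketch, not code from Stanton, Marvit, or the application; the function names `rank_terms` and `key_term` and the sample SIP values are hypothetical.

```python
def rank_terms(sip_values):
    """Return terms ordered from highest to lowest SIP value.

    `sip_values` is an assumed mapping of term -> SIP value.
    """
    return sorted(sip_values, key=lambda term: sip_values[term], reverse=True)


def key_term(sip_values):
    """Return the highest ranked term (assumes at least one term is present)."""
    return rank_terms(sip_values)[0]


# Hypothetical SIP values for three terms.
example = {"the": 0.8, "sea": 9.5, "ship": 4.0}
print(rank_terms(example))
print(key_term(example))
```

In this sketch ties keep their original insertion order, which is one possible design choice; the claim language does not specify tie handling.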

Regarding dependent claim 3, all of the particulars of claim 1 have been addressed above.  Additionally, Stanton as modified discloses:
the frequency occurrence of the term in the plurality of multiple electronic text documents comprises a number of times the term appears in the plurality of multiple electronic text documents; and the corpus frequency occurrence of the term in the text corpus comprises a number of times the term appears in the text corpus over a total number of words in the text corpus; the first plurality of electronic text documents is not included in the text corpus; and the second plurality of electronic text documents is not included in the text corpus. (Stanton at paragraph [0068] discloses that the statistically improbable phrases are phrases which occur frequently in a given text [i.e., corpus frequency] but do not appear frequently in other texts [i.e., number of times the term appears in the plurality of electronic documents not including the corpus].  Additionally, Stanton at paragraph [0067] discloses frequency analysis and Stanton at paragraph [0023] discloses comparing text to other text and a database of reference texts.  Additionally, Jesensky at Column 3, Lines 5-6 teaches in part, “To generate these phrases, the described techniques may analyze one or more sources.”)

Regarding dependent claim 4, all of the particulars of claims 1 and 3 have been addressed above.  Additionally, Stanton discloses:
determining that the term does not appear in the text corpus; and replacing the corpus frequency occurrence of the term in the text corpus with a default non-zero number when the term does not appear in the text corpus (Stanton at paragraphs [0066] – [0067] discloses methods by which to assign a default interest value to a word based on the frequency of its use.  Additionally, Stanton at paragraph [0067] discloses in part, “Words and phrases that are encountered less frequently than is statistically likely, or are determined to be uniquely absent or of comparatively less importance, are reduced in value.”)
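
For illustration only, the frequency comparison of claim 3 and the default non-zero substitution of claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the claims recite "comparing" the two frequencies without fixing a formula, so the ratio used here, the function name `sip_value`, and the default of 1e-6 are all hypothetical and are not drawn from Stanton, Jesensky, or the application.

```python
from collections import Counter


def sip_value(term, doc_tokens, corpus_tokens, default_corpus_freq=1e-6):
    """Hypothetical SIP score for one term.

    doc_tokens:    all tokens from the first plurality of documents
    corpus_tokens: all tokens from the separate text corpus
    The ratio below is an assumed form of the claimed "comparing".
    """
    # Frequency occurrence: number of times the term appears in the documents.
    doc_count = Counter(doc_tokens)[term]
    # Corpus frequency: appearances in the corpus over total words in the corpus.
    corpus_count = Counter(corpus_tokens)[term]
    if corpus_count == 0:
        # Term absent from the corpus: substitute a default non-zero number
        # so the comparison never divides by zero.
        corpus_freq = default_corpus_freq
    else:
        corpus_freq = corpus_count / len(corpus_tokens)
    return doc_count / corpus_freq


docs = ["sea", "ship", "sea", "the"]
corpus = ["the", "the", "ship", "a"]
for t in ("sea", "ship", "the"):
    print(t, sip_value(t, docs, corpus))
```

Under these assumptions a term that is frequent in the documents but absent from the corpus ("sea") scores far higher than a term common to both ("the").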

Regarding dependent claim 5, all of the particulars of claim 1 have been addressed above.  Additionally, Stanton as modified discloses:
determining the corpus frequency occurrence of the term by determining the corpus frequency occurrence of the term from a subset of documents in the text corpus (Stanton at paragraph [0068] discloses determining a text’s subject matter by comparing SIP terms to that of other texts.  Additionally, Jesensky at Column 3, Lines 5-6 teaches in part, “To generate these phrases, the described techniques may analyze one or more sources.”  Examiner is of the position that Jesensky as cited above, in teaching the analysis of a single source out of multiple sources, reads on a subset of documents in the text corpus.)

Regarding dependent claim 6, all of the particulars of claim 1 have been addressed above.  While Stanton at paragraph [0104] discloses using a parts of speech tagger to determine whether certain emotional modifying words such as happy occur with a certain frequency and within a certain distance of target words so as to modify the score of the target word, Stanton does not explicitly disclose:
wherein the one or more related terms associated with the key term are located proximate the key term when one or more key terms are located within a threshold distance from the key term in n-dimensional vector space.
However, Marvit at paragraph [0085] teaches in part, “Affinity vectors may be similar if one affinity vector is proximate to the other affinity vector as determined by a suitable distance function.”  Additionally, Marvit at paragraph [0072] teaches affinity thresholds.
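
For illustration only, the threshold-distance test recited in claim 6 can be sketched as follows. This is a minimal sketch, not Marvit's affinity computation: the Euclidean distance function, the function name `related_terms`, and the two-dimensional sample vectors are assumptions chosen for brevity.

```python
import math


def related_terms(key, vectors, threshold):
    """Return terms whose vectors lie within `threshold` of the key term.

    `vectors` is an assumed mapping of term -> coordinates in n-dimensional
    space. Euclidean distance stands in for "a suitable distance function".
    """
    key_vec = vectors[key]
    return [
        term
        for term, vec in vectors.items()
        if term != key and math.dist(key_vec, vec) <= threshold
    ]


# Hypothetical 2-dimensional vectors.
vectors = {"sea": (0, 0), "ship": (1, 0), "beach": (0, 2), "tax": (5, 5)}
print(related_terms("sea", vectors, threshold=1.5))
print(related_terms("sea", vectors, threshold=2.0))
```

A larger threshold admits more terms into the cluster, and a smaller one admits fewer.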

Regarding dependent claim 7, all of the particulars of claim 1 have been addressed above.  Stanton does not disclose:
generating a word embedding for each term of the plurality of terms; and generating a vector mapping comprising one or more terms based on the word embeddings corresponding to the one or more term.
However, Marvit at paragraphs [0075] and [0083] – [0086] teaches clustering affinity vectors of words.

Regarding dependent claim 8, all of the particulars of claims 1 and 5-6 have been addressed above.  Additionally, Stanton discloses:
further comprising: receiving an indication of a user selection to expand the topic cluster; adjusting, in response to the indication of a user selection to expand the topic cluster, the threshold distance from the key term in the n-dimensional vector space to include one or more additional terms associated with the key term; modifying the topic cluster to comprise the key term, the one or more related terms associated with the key term, and the one or more additional terms associated with the key term; and providing, for presentation on the client device and in response to the indication of the user selection to expand the topic cluster, an additional electronic text document from the plurality of multiple electronic text documents that includes at least one term from the one or more additional terms associated with the key term (Stanton at paragraph [0083] discloses the following:
While the most obvious way to match two metrics is a direct one-to-one comparison (i.e. all scenes rated as "3" match other scenes rated as a "3"), the system sensitivity is adjusted to allow a range of responses to be identified as a match depending on how important the element is to the specific user's preference. For example, when matching a metric of 3, the system will match other metrics with 3, plus or minus a range determined by either a system default setting, user input, or by the self-learning system described later in this patent. The ability to adjust the match sensitivity level by widening or narrowing the scope of what is considered by the system to be a positive match can be applied at any point that metrics are compared. This will be referred to herein as adjusting "sensitivity."

Examiner is of the position that Stanton above discloses allowing user input to adjust and widen [i.e., expand] the scope of what is considered a positive match and that this ability can be applied to any metric, and Examiner is of the position that it would have been obvious to one of ordinary skill in the art to apply such a disclosure to adjust the affinity vector and distance measure taught in Marvit and recited in the rejection of claim 7 above.  Examiner is of the position that if the affinity vector distance of Marvit were adjusted to allow terms separated by a greater distance to be clustered together, the result would be more documents clustered in the cluster.  Additionally, Jesensky at Column 3, Lines 5-6 teaches in part, “To generate these phrases, the described techniques may analyze one or more sources.”)
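
For illustration only, the "sensitivity" adjustment quoted from Stanton at paragraph [0083] can be sketched as follows. This is a minimal sketch, not Stanton's implementation; the function names `is_match` and `matching` and the sample metrics are hypothetical.

```python
def is_match(metric, candidate, sensitivity=0):
    """True when `candidate` is within plus or minus `sensitivity` of `metric`."""
    return abs(metric - candidate) <= sensitivity


def matching(metric, candidates, sensitivity=0):
    """Return the candidates treated as a positive match at this sensitivity."""
    return [c for c in candidates if is_match(metric, c, sensitivity)]


ratings = [1, 2, 3, 4, 5]
print(matching(3, ratings))                 # one-to-one comparison
print(matching(3, ratings, sensitivity=1))  # widened scope
```

Widening the range enlarges the set of positive matches and narrowing it shrinks the set.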

Regarding dependent claim 9, all of the particulars of claim 1 have been addressed above.  Additionally, Stanton discloses:
further comprising: receiving an indication of a user selection of a term to exclude from the one or more related terms associated with the key term; modifying the topic cluster by removing the term to exclude from the topic cluster; and providing, for presentation to the user and in response to the indication of the user selection of the term to exclude, one or more electronic text documents from the plurality of multiple electronic text documents that have at least one term from the modified topic cluster (Stanton at paragraph [0083] discloses the following:
While the most obvious way to match two metrics is a direct one-to-one comparison (i.e. all scenes rated as "3" match other scenes rated as a "3"), the system sensitivity is adjusted to allow a range of responses to be identified as a match depending on how important the element is to the specific user's preference. For example, when matching a metric of 3, the system will match other metrics with 3, plus or minus a range determined by either a system default setting, user input, or by the self-learning system described later in this patent. The ability to adjust the match sensitivity level by widening or narrowing the scope of what is considered by the system to be a positive match can be applied at any point that metrics are compared. This will be referred to herein as adjusting "sensitivity."

Examiner is of the position that Stanton above discloses allowing user input to adjust and narrow the scope of what is considered a positive match and that this ability can be applied to any metric, and Examiner is of the position that it would have been obvious to one of ordinary skill in the art to apply such a disclosure to adjust the affinity vector and distance measure taught in Marvit and recited in the rejection of claim 6 above.  Examiner is of the position that if the affinity vector distance of Marvit were adjusted to allow only terms separated by a smaller distance to be clustered together, the result would be that terms would be excluded from the original cluster and fewer documents would be clustered in the cluster.)


Regarding independent claim 11, claim 11 is rejected under the same rationale as claim 1.  Additionally, with respect to the hardware limitations of the system, at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that… (See Stanton at paragraphs [0074] – [0075]).

Regarding dependent claim 12, all of the particulars of claim 11 have been addressed above.  Additionally, Stanton as modified discloses:
display, within the graphical user interface, a portion of the electronic text document from the second plurality of multiple electronic text documents that includes at least one term corresponding to the first topic of interest, and emphasizing the at least one term in the portion of the electronic text document (Evans at Figure 9, provided below, teaches presenting a portion of text along with emphasized cluster terms on the right-hand side, e.g., horse, gait, etc.)

    [Figure: Evans, Figure 9 (media_image2.png)]

Regarding dependent claim 13, all of the particulars of claim 11 have been addressed above.  Additionally, Stanton discloses:
further comprising instructions that, when executed by the at least one processor, cause the system to: identify a second key term from the plurality of terms based on the second key term having a second highest SIP value; determine, from the plurality of terms, a second set of related terms associated with the second key term; and generate a second topic cluster comprising the second key term and the second set of related terms associated with the second key term. (Stanton at paragraph [0068] discloses in part, “Statistically Improbable Phrases is another manner in which the system 10 may identify a text's subject matters [i.e., Examiner notes subject matters plural] and attempt to predict the level of a user's interest in those subject matters.”)

Regarding independent claim 21, claim 21 is rejected under the same rationale as claim 1.

Regarding dependent claim 22, all of the particulars of claim 21 have been addressed above.  Additionally, claim 22 is rejected under the same rationale as claim 12.

Regarding dependent claim 23, all of the particulars of claim 21 have been addressed above.  Additionally, Stanton at paragraph [0092] discloses a user saving their settings and preferences.  Examiner is of the position that future text analyzed would be identified using the saved settings.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Stanton in view of Jesensky in view of Marvit in view of Evans in further view of Kostorizos et al. U.S. Pub. No. 2008/0263022 (hereinafter “Kostorizos”).
Regarding dependent claim 10, all of the particulars of claims 1 and 9 have been addressed above.  Additionally, Stanton as modified with Marvit and Evans does not disclose:
wherein receiving the indication of the user selection to exclude the term from the one or more related terms associated with the key term comprises an indication to split the term as a new key term, the method further comprising: determining, from the plurality of terms, one or more new related terms associated with the new key term; generating a new topic cluster comprising the new key term and the one or more new related terms associated with the new key term; and providing, to the client device associated with the user, at least one electronic text document from the plurality of multiple electronic text documents that corresponds to the new topic cluster.
However, Kostorizos at paragraph [0091] teaches, “The GUI should also allow the user or another mechanism to define the cluster diameter--this allows the user to split large, generalized concept clusters into component concept clusters without altering the search terms. The simplification of cluster display is also desirable. This provides the capability of suppressing nodes for the purpose of reducing clutter within the search results, and hence, allows the user to better investigate the structure of the cluster.”
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the topic clustering of a plurality of documents using term analysis and the user adjustment of matches and metrics as disclosed in Stanton as modified with the user-selected splitting of clusters as taught in Kostorizos to improve the accuracy of the topic clustering of the system.

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Stanton in view of Jesensky in view of Marvit in view of Evans in further view of Majkowska U.S. Pub. No. 2015/0161248 (hereinafter “Majkowska”).
Regarding dependent claim 14, all of the particulars of claims 11 and 13 have been addressed above.  Stanton does not disclose:
further comprising instructions that, when executed by the at least one processor, cause the system to: receive an indication of a user selection to merge the first topic cluster with the second topic cluster; merge, the second topic cluster with the first topic cluster to create a merged topic cluster; and provide the merged topic cluster for presentation to the user.

Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the topic clustering of a plurality of documents using related terms and user adjustments of matches and metrics as disclosed in Stanton as modified with the merging of topic clusters as taught in Majkowska to improve the accuracy of the topic clustering of the system.

Regarding dependent claim 15, all of the particulars of claims 11 and 13-14 have been addressed above.  Additionally, Stanton in the Abstract discloses in part, “The system may use data from one text to identify other texts that a user may like, and present information about the text to the user in various forms.”

Regarding dependent claim 16, all of the particulars of claims 11 and 13-15 have been addressed above.  Additionally, Stanton at paragraph [0112] discloses a system user interface for inputting data and presenting results, as does Evans in Figure 1.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Stanton in view of Jesensky in view of Marvit in view of Evans in view of Majkowska in further view of Nicholls et al. U.S. Pub. No. 2016/0070762 (hereinafter “Nicholls”).
Regarding dependent claim 17, all of the particulars of claims 11 and 13-16 have been addressed above.  While Stanton at paragraph [0083] discloses adjusting match sensitivity, Stanton does not explicitly disclose a user adding terms to a topic cluster.

Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the topic clustering of a plurality of documents using related terms and user adjusted match sensitivity as disclosed in Stanton as modified with the adding of terms to a topic cluster as taught in Nicholls to facilitate increasing user control and input over topic clustering.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent No. 9,372,592
Column 23, Lines 39 – 49 as it relates to the definition of a statistically improbable phrase.
U.S. Pub. No. 2010/0223273
Paragraph [0012] as it relates to grouping of documents based on a corresponding statistically improbable phrase.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY G GEMIGNANI whose telephone number is (571)272-1018. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/A.G.G./Examiner, Art Unit 2154
/SYED H HASAN/Primary Examiner, Art Unit 2154