DETAILED ACTION
	Receipt of Applicant’s Amendment, filed October 7, 2021, is acknowledged.
Claims 1, 6-8, 13-14, 19-20, 23-24, and 26 were amended.
Claims 2, 3, 5, 9, 11, 25, and 27 were cancelled.
Claim 28 was newly presented.
Claims 1, 4, 6-8, 10, 12-24, 26 and 28 are pending in this office action.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6-8, 10, 12, 19-24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang [10095686] in view of Najork [7139747] and Sravanapudi [2007/0174255].

With regard to claim 1 Zhang teaches A computing system comprising: 
a source container as the social engine (Zhang, Figure 1, 104, Figure 2, 208) in a distributed (Zhang, Column 6, line 15 “distributed components”) network (Zhang, Figure 2, 206) … configured to store (Zhang, Column 12, line 49 “memory 612”) a plurality of social media data records (Zhang, Column 5, lines 35-37 “data from users is communicated from the social stream 102 to the trending topic tool 106 via the social analysis tool 104, such as ADOBE SOCIAL”) collected from a plurality of social media networks as the more than one social networking services (Zhang, Column 5, lines 38-39 “The data comprises a sample of posts made by users to one or more social networking services”; Fig 2, 202A through 202N); 
a topic analysis server (Zhang, Figure 2, 210) …, 
download all of the configured web pages … identified by the topic analysis server (Zhang, Column 5, lines 42-46 “retrieve a portion of the data from the social data stream”) …, and 
extract … from the downloaded web pages and … (Zhang, Column 5, lines 40-50 “retrieve a portion of data from the social data stream”) to generate the plurality of social media data records (Zhang, Column 5, lines 35-37 “data from users is communicated from the social stream 102 to the trending topic tool 106 via the social analysis tool 104, such as ADOBE SOCIAL”) stored in the source container as stream data is sent to the social engine (Zhang, Figure 1, 104, Figure 2, 208; see where the social stream 102 (which is gathered from the social networks 202A-N) is sent to the social engine (104 and 208) to be processed by the trending topic tool (106 and 210)), 
wherein the topic analysis server (Zhang, Figure 2, 210) is further configured to use a semi-supervised learning process (Zhang, Column 5, lines 23-26 “A lightweight NLP method, also described below, is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content without requiring a user to pre-identify topics… The social analytics system further helps users understand the topics by labeling topic categories and grouping semantically-similar topics”) on the plurality of social media data records in the source container as the data samples sent from the social engine (Zhang, Figure 1, see “data sample” from 104 to 106), …; 
a frequency processor (Zhang, Figure 2, 212 and 216 which are used to identify topics Column 9, lines 5-10 and 30-35; which is used to group the semantically-similar topics Column 9, lines 65-67) configured to generate a term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”), document frequency vector as inverse document frequency (IDF) (Zhang, Column 3, lines 54-56 “an indication of how common or rare a particular term is among a collection of posts”), and a collection frequency vector as the Accumulated Term Frequency (Zhang, Column 3, lines 48-51) for each of the … social media data records and combine (Zhang, Column 9, lines 25-27 “The relevance score or ranking of a candidate topic t can then be defined as R(t)=(Ʃ|{dϵD:tϵd}|BTF(t,d))*IDF(t,D)”) the generated term-document frequency matrix as the BTF (Id) and the document frequency vector as the IDF (Id) into a single entity to generate a weighted data matrix as the relevance score or ranking (Id), wherein the frequency processor is further configured to generate a record similarity matrix based on the weighted data matrix (Zhang, Column 10, lines 15-19 “The clustering component may calculate the distance between two topics by determining their text similarity, namely entity similarity, or surrounding text similarity.  Additionally, the clustering component may apply similarity metrics (e.g. Cosine similarity metric)”), wherein the record similarity matrix is used by the topic analysis server (Zhang, Figure 2, 210 includes the clustering component 218) to cluster the social media data records into multiple groups of clustered social media data records (Zhang, Column 9, lines 65-67 “Clustering component groups semantically-similar topics”); 
… processor configured to derive implicit text representation of text semantics (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”) by performing … on each of the multiple groups of clustered social media data records using the weighted data matrix generated (Zhang, Column 3, lines 60-63 “The ‘relevance score’ is the numerical indication of the relevance of a particular topic.”) by the frequency processor as the ranking component (Zhang, Column 9, lines 5-7) to determine a topic of each of the multiple groups as the topic for the document which is then grouped by the clustering component (Zhang, Column 9 lines 20-30 “a candidate topic T”) of clustered social media data records (Zhang, Column 9, line 65-67 “Clustering component groups semantically-similar topics”); 
wherein the topic analysis server (Zhang, Figure 2, 210) is further configured to generate a hierarchical topic domain (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”) comprising a plurality of nodes as a tree structure (Id), wherein the plurality of nodes comprise a root node as the top level of the tree (Zhang, Column 10, lines 7-12 “a single cluster”) and a plurality of leaf nodes as the bottom level of the tree (Zhang, Column 10, lines 7-12 “the bottom level”), …; and 
a target container (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”) in the distributed (Zhang, Column 6, line 15 “distributed components”) network (Zhang, Figure 2, 206) …, wherein the hierarchical topic domain (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”) is stored in the target container (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”).  
Zhang does not explicitly teach a cache… a web sniffer engine configured to: dynamically sniff configured web uniform resource locators (URLs) corresponding to web pages to determine if they are in a current context, download all of the configured web pages and nested web pages identified by the topic analysis server as having the current context, … cache.
Najork teaches … a distributed network cache (Najork, Column 3, line 65 “an optional cache 174”) configured to store a plurality of social media data records (Najork, Column 8, lines 9-11 “the Cache C, in this embodiment, is one of the data structures 113 … for storing known addresses”) …
… comprising a web sniffer engine (Najork, Column 1, lines 36-37 “A web crawler is a program that automatically finds and downloads documents from host computers in an intranet or the world wide web”) configured to: 
dynamically sniff as automatically find and download documents (Id) configured web uniform resource locators (URLs) (Najork, Column 1, line 20 “a distinct address called its uniform resource locator (URL)”) corresponding to web pages as the document being processed (Id; Column 4, line 30 “downloads the document corresponding to the URL, and processes the document”) to determine if they are in a current context (Najork, Column 5, lines 5-6 “identifies the web page’s host computer”), 
download all of the configured web pages (Najork, Column 4, line 36 “candidates for downloading”) and nested web pages (Najork, Column 4, lines 35-39 “identifies URL’s in the downloaded document … that are candidates for downloading and processing.  Typically, these URL’s are found in hypertext links in the document being processed”) identified by the topic analysis server as having the current context as only the websites of the assigned host are downloaded by the current web crawler (Najork, Column 6, lines 5-10 “if the web crawler identifier is W it is assigned to the current web crawler… then the URL u is passed to the address filter procedure, which is described below (step 166)”), and 
extract text as extracting the words from the document to put in the index (Najork, Column 4, lines 31-33 “The processing may include indexing the words in the document so as to make the document accessible via a search engine”) from the downloaded web pages and nested web pages in the current context as the document being processed (Id) to generate the plurality of social media data records as the index record for the document (Id)… cache (Najork, Column 3, line 65 “an optional cache 174”)...  
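
For illustration only, the link-following behavior attributed to Najork above (identify URLs in the hypertext links of a downloaded document, then keep only those belonging to the host handled by the current crawler) can be sketched as follows. This is a minimal sketch under stated assumptions, not Najork's implementation: the names `LinkCollector` and `candidate_urls` are hypothetical, the "current context" test is simplified to a host-name comparison, and no network access is performed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def candidate_urls(page_url, html, assigned_host):
    """Return linked URLs whose host matches the host assigned to this crawler.

    Assumption: host equality stands in for the "current context" test.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [u for u in collector.links if urlparse(u).hostname == assigned_host]
```

In this sketch, links that resolve to a different host are simply dropped; a multi-crawler arrangement would instead hand them to whichever crawler is assigned that host.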

It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination using the cache storage taught by Najork as it yields the predictable results of providing a means of storing the data within a known storage device.
	Taken alone, Zhang does not explicitly teach wherein the number of social media data records in each of the plurality of leaf nodes is between 1,000 and 10,000.  Zhang does provide an example where there are at least 1000 samples taken (Zhang, Column 10, lines 55-56 “the user may select to retrieve 1000 samples of data from the social media stream”).  Najork teaches that there were approximately 800 million web pages in 1999 (Najork, Column 1, lines 50-55 “As of 1999 there were approximately 800 million web pages on the world wide web”).  One of ordinary skill in the art would recognize that too few records would not give the system sufficient data to make an accurate analysis, and that too many records would take the system too long to process.  The specific number is simply a matter of device optimization.  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination wherein the plurality of nodes comprise a root node and a plurality of leaf nodes, wherein the number of social media data records in each of the plurality of leaf nodes is between 1,000, as a reasonable minimum number of records to be able to perform a proper analysis (Zhang, Column 10, lines 55-56 “the user may select to retrieve 1000 samples of data from the social media stream”), and 10,000, as a reasonable maximum number of records to be able to process with a reasonable processing time based on the specific physical limitations of the device (Zhang, Column 10, lines 55-58).
	Zhang does not explicitly teach to standardize the plurality of social media records in a topic domain set… for each of the standardized social media records…latent semantic analysis (LSA) processor configured to derive implicit text representations of text semantics by performing LSA.  To be clear, Zhang uses a different form of analysis to derive the implicit text representations.  Sravanapudi teaches … to standardize the plurality of social media data records in a topic domain set (Sravanapudi, ¶75 “input phrases may be processed prior to matching against hook rules… misspelled words within the input phrase may be corrected.  Words of the input phrase may be replaced with their base or stem forms…”); … for each of the standardized social media data records (Id)…
a latent semantic analysis (LSA) (Sravanapudi, ¶43 “Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”) processor configured to derive implicit text representation of text semantics (Sravanapudi, ¶43 “Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”) by performing LSA (Sravanapudi, ¶43 “Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”)…
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the concept analysis performed by the proposed combination using the LSA techniques of Sravanapudi as it yields the predictable results of providing a known technique for identifying concepts in text.
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to normalize the terms as taught by Sravanapudi as it yields the predictable results of correcting spelling issues within the documents being analyzed.

With regard to claim 4 Zhang teaches A computer-implemented method comprising: 
…of a plurality of social media networks (Zhang, Column 5, lines 35-37 “data from users is communicated from the social stream 102 to the trending topic tool 106 via the social analysis tool 104, such as ADOBE SOCIAL”) …
saving the plurality of social media data records (Zhang, Column 5, lines 35-37 “data from users is communicated from the social stream 102 to the trending topic tool 106 via the social analysis tool 104, such as ADOBE SOCIAL”) in a source container as the social engine (Zhang, Figure 1, 104, Figure 2, 208); 
…
generating a term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”), a document frequency vector as inverse document frequency (IDF) (Zhang, Column 3, lines 54-56 “an indication of how common or rare a particular term is among a collection of posts”) and a collection frequency vector as the Accumulated Term Frequency (Zhang, Column 3, lines 48-51) for each of the … social media data records (Zhang, Column 9, lines 8-12 “a TF-IDF ranking measures how important a word is to a document in a collection of documents.  In this way, the TF-IDF score is the product of the TF and the IDF”); 
generating a weighted matrix (Zhang, Column 9, lines 25-27 “The relevance score or ranking of a candidate topic t can then be defined as R(t)=(Ʃ|{dϵD:tϵd}|BTF(t,d))*IDF(t,D)”) from the term-document frequency matrix as BTF (Id), document frequency vector as IDF (Id) and the collection frequency vector as the Accumulated Term Frequency (Zhang, Column 3, lines 48-51; Column 9, lines 19-20 “ATF is an accumulated term frequency value in a document set”); 
generating a record similarity matrix from the weighted matrix (Zhang, Column 10, lines 15-19 “The clustering component may calculate the distance between two topics by determining their text similarity, namely entity similarity, or surrounding text similarity.  Additionally, the clustering component may apply similarity metrics (e.g. Cosine similarity metric)”); 
clustering the standardized social media data records based on the record similarity matrix into multiple groups of clustered social media data records (Zhang, Column 9, line 65-67 “Clustering component groups semantically-similar topics”); 
deriving an implicit text representation of text semantics (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”) … each of the multiple groups of as the topic for the document which is then grouped by the clustering component (Zhang, Column 9 lines 20-30 “a candidate topic T”) clustered social media data records (Zhang, Column 9, line 65-67 “Clustering component groups semantically-similar topics”) to determine a topic of each of the multiple groups of clustered social media data records as the topic for the document which is then grouped by the clustering component (Zhang, Column 9 lines 20-30 “a candidate topic T”); 
generating a hierarchical topic domain (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”) comprising a plurality of nodes as a tree structure (Id), wherein the plurality of nodes comprise a root node as the top level of the tree (Zhang, Column 10, lines 7-12 “a single cluster”) and a plurality of leaf nodes as the bottom level of the tree (Zhang, Column 10, lines 7-12 “the bottom level”), …; and 
storing the hierarchical topic domain in a target container (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”) in a distributed (Zhang, Column 6, line 15 “distributed components”) network (Zhang, Figure 2, 206) ….
	Zhang does not explicitly teach sniffing web uniform resource locators (URLs) of a plurality of … media networks to determine if web pages associated with the URLs are within a current context by a web sniffer engine; analyzing linked URLs in the web pages to determine if linked web pages are within the current context; downloading all of the web pages and linked web pages that are within the current context; extracting text from all of the web pages and linked web pages that are within the current context to form a plurality of social media data records.  Najork teaches sniffing (Najork, Column 1, lines 36-37 “A web crawler is a program that automatically finds and downloads documents from host computers in an intranet or the world wide web”) web uniform resource locators (URLs) (Najork, Column 1, line 20 “a distinct address called its uniform resource locator (URL)”) of a plurality of … media networks to determine if web pages associated with the URLs as the document being processed (Id; Column 4, line 30 “downloads the document corresponding to the URL, and processes the document”) are within a current context as only the websites of the assigned host are downloaded by the current web crawler (Najork, Column 6, lines 5-10 “if the web crawler identifier is W it is assigned to the current web crawler… then the URL u is passed to the address filter procedure, which is described below (step 166)”) by a web sniffer engine (Najork, Column 1, lines 36-37 “A web crawler is a program that automatically finds and downloads documents from host computers in an intranet or the world wide web”); 
analyzing linked URLs in the web pages to determine if linked web pages (Najork, Column 4, lines 35-39 “identifies URL’s in the downloaded document .. that are candidates for downloading and processing.  Typically, these URL’s are found in hypertext links in the document being processed”) are within the current context as only the websites of the assigned host are downloaded by the current web crawler (Najork, Column 6, lines 5-10 “if the web crawler identifier is W it is assigned to the current web crawler… then the URL u is passed to the address filter procedure, which is described below (step 166)”); 
downloading all of the web pages and linked web pages that are within the current context as only the websites of the assigned host are downloaded by the current web crawler (Najork, Column 6, lines 5-10 “if the web crawler identifier is W it is assigned to the current web crawler… then the URL u is passed to the address filter procedure, which is described below (step 166)”); 
extracting text from as extracting the words from the document to put in the index (Najork, Column 4, line 31-33 “The processing may include indexing the words in the document so as to make the document accessible via a search engine”) all of the web pages and linked web pages that are within the current context as the document being processed (Id) to form a plurality of social media data records as the index record for the document (Id);
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the trending topic extraction from social media taught by Zhang using the crawling techniques taught by Najork as it yields the predictable results of providing efficient downloading of data.
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination using the cache storage taught by Najork as it yields the predictable results of providing a means of storing the data within a known storage device.
	Taken alone, Zhang does not explicitly teach wherein the number of social media data records in each of the plurality of leaf nodes is between 1,000 and 10,000.  Zhang does provide an example where there are at least 1000 samples taken (Zhang, Column 10, lines 55-56 “the user may select to retrieve 1000 samples of data from the social media stream”).  Najork teaches that there were approximately 800 million web pages in 1999 (Najork, Column 1, lines 50-55 “As of 1999 there were approximately 800 million web pages on the world wide web”).  One of ordinary skill in the art would recognize that too few records would not give the system sufficient data to make an accurate analysis, and that too many records would take the system too long to process.  The specific number is simply a matter of device optimization.  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination where each leaf node has at least 1000 samples as this is an example of a reasonable number of records indicated by Zhang as being useable.  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination wherein the number of social media data records in each of the plurality of leaf nodes is between 1,000, as a reasonable minimum number of records to be able to perform a proper analysis (Zhang, Column 10, lines 55-56 “the user may select to retrieve 1000 samples of data from the social media stream”), and 10,000, as a reasonable maximum number of records to be able to process with a reasonable processing time based on the specific physical limitations of the device (Zhang, Column 10, lines 55-58).
Zhang does not explicitly teach standardizing the plurality of social media records in the topic domain set… for each of the standardized social media records…deriving implicit text representations of text semantics based on latent semantic analysis (LSA).  To be clear Zhang uses a different form of analysis to derive the implicit text representations.  Sravanapudi teaches … standardizing the plurality of social media data records in the topic domain set (Sravanapudi, ¶75 “input phrases may be processed prior to matching against hook rules… misspelled words within the input phrase may be corrected.  Words of the input phrase may be replaced with their base or stem forms…”); … for each of the standardized social media data records (Id)…
deriving an implicit text representation of text semantics (Sravanapudi, ¶43 “Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”) based on latent semantic analysis (LSA) (Sravanapudi, ¶43 “Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”)…

It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to normalize the terms as taught by Sravanapudi as it yields the predictable results of correcting spelling issues within the documents being analyzed.

With regard to claim 6 the proposed combination further teaches wherein the generating the hierarchical topic domain (Zhang, Column 10, lines 6-7, “The clustering algorithm builds a dendogram (i.e., a tree data structure)”) is performed by a topic analysis server (Zhang, Figure 2, 210 which includes the clustering component 218; wherein the clustering component performs the clustering Column 8 line 65 through Column 10 line 7).
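
For illustration only, the bottom-up tree building relied on above (Zhang's dendrogram, in which clusters are merged until a single cluster remains) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Zhang's algorithm as disclosed: the function names are hypothetical, Jaccard distance over keyword sets stands in for Zhang's text-similarity measure, and single linkage is an assumed merge rule.

```python
def jaccard_distance(a, b):
    """Distance between two keyword sets (0.0 = identical, 1.0 = disjoint)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def build_dendrogram(topics):
    """Agglomerative clustering sketch.

    topics: dict mapping a topic name to its set of keywords.
    Returns a nested tuple whose leaves are topic names and whose
    root represents the single final cluster (None if topics is empty).
    """
    # Each cluster is (subtree, list of member keyword sets).
    clusters = [(name, [keywords]) for name, keywords in topics.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    jaccard_distance(a, b)
                    for a in clusters[i][1]
                    for b in clusters[j][1]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        # Replace the two closest clusters with their merged parent.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0] if clusters else None
```

The leaves of the returned tuple correspond to the bottom level of the tree and the outermost tuple to the single top-level cluster.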

With regard to claim 7 the proposed combination further teaches wherein the clustering the standardized social media data records into multiple groups based on the record similarity matrix as clustering (Zhang, Column 9, lines 65-67 “clustering component groups semantically-similar topics.  Because extracted topics are keyword based”) is performed by a frequency processor (Zhang, Figure 2, 212 and 216 which are used to identify topics Column 9, lines 5-10 and 30-35; which is used to group the semantically-similar topics Column 9, lines 65-67).

With regard to claim 8 the proposed combination further teaches wherein the deriving the implicit text representation of the text semantics as semantics (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”; Sravanapudi, ¶68 “the concept extractor multiplies the sparse vector by an LSA matrix”) based on the latent semantic analysis (LSA) is performed by a latent semantic analysis (LSA) processor as LSA (Id).

With regard to claim 10 the proposed combination further teaches wherein the standardizing the received social media data records (Sravanapudi, ¶75 “input phrases may be processed prior to matching against hook rules… misspelled words within the input phrase may be corrected.  Words of the input phrase may be replaced with their base or stem forms…”) comprises at least one of converting text to lowercase, eliminating irregular spacing, removing stop words, correcting misspellings as misspelled words may be corrected (Id) and replacing words with corresponding root words as words of the input phrase may be replaced with their base or stem forms (Id).
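
For illustration only, the standardizing operations recited in claim 10 (lowercasing, eliminating irregular spacing, removing stop words, correcting misspellings, and replacing words with root words) can be sketched as follows. This is a minimal sketch under stated assumptions: the stop-word list, the correction table, and the suffix-stripping rule are hypothetical toy stand-ins and are not taken from Sravanapudi or from the application.

```python
import re

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and"}
CORRECTIONS = {"teh": "the", "recieve": "receive"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def standardize(text):
    """Return the standardized tokens of one record."""
    # Lowercase and collapse irregular spacing into single spaces.
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    tokens = [t for t in normalized.split(" ") if t]
    # Correct known misspellings before stop-word removal.
    tokens = [CORRECTIONS.get(t, t) for t in tokens]
    # Drop stop words, then reduce the remaining words to a base form.
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

Misspellings are corrected before stop words are removed in this sketch so that a misspelled stop word is also dropped.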

With regard to claim 12 the proposed combination further teaches transforming the term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”) using term frequency as BTF is a function of the term and the document (Zhang, Column 9, line 27 “BTF(t, d)… BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d”) the accumulated term frequency (Zhang, Column 3, lines 48-51 “(ATF) refers to an indication of the total number of times a term occurs in a sample comprising a number of posts”) and inversed document frequency (Zhang, Column 9, lines 22-23 “The IDF is defined by: IDF(t,D)=log N/|{dϵD:tϵd}|”) (TF-IDF) as ATF-IDF (Zhang, Column 9, lines 18-19 “the ranking component 214 does not use a TF-IDF ranking algorithm and instead uses ATF-IDF to perform topic ranking”). 
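
For illustration only, the quantities quoted from Zhang above can be combined as follows: BTF(t, d) is 1 when topic t occurs in document d and 0 otherwise, IDF(t, D) is the logarithm of the number of documents N divided by the number of documents containing t, and the relevance score is the sum of BTF over the documents multiplied by IDF. This is a minimal sketch under stated assumptions: the function name `relevance_scores` is hypothetical, documents are represented as token lists, and the natural logarithm is assumed because the quoted passage does not state a base.

```python
import math


def relevance_scores(docs):
    """Return {topic: R(t)} for a list of documents given as token lists."""
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    vocabulary = set().union(*doc_sets)
    scores = {}
    for topic in vocabulary:
        # Sum of BTF(t, d) over all documents: counts documents containing t.
        btf_sum = sum(1 for d in doc_sets if topic in d)
        # IDF(t, D) = log(N / number of documents containing t).
        idf = math.log(n_docs / btf_sum)
        scores[topic] = btf_sum * idf
    return scores
```

Under these assumptions a term that occurs in every document receives a score of zero, because its IDF is log(1).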

With regard to claim 19 the proposed combination further teaches wherein generating the hierarchical topic domain uses web uniform resource locators (URLs) to control the generating (Sravanapudi, ¶42 “in addition to accessing all the text located at the URL identified in the RSS feed, the text extractor may extract other text included in the feed, such as a headline or other text describing the item located at the URL”).

With regard to claim 20 the proposed combination further teaches wherein the term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”) comprises average term distribution vectors as the number of times the term appears to a number of posts which is calculated by using BTF (Zhang, Column 9, lines 27-28 “BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d”).
 
With regard to claim 21 the proposed combination further teaches wherein the each of the multiple groups are determined by calculating a similarity index (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”) between each standardized social media data record as the keywords extracted from the record (Zhang, Column 9, lines 65-67 “because extracted topics are keyword based, the topics may be clustered into groups”) and a record in the average each term distribution vectors (Zhang, Column 9, lines 19-29 “ATF is an accumulated term frequency value in a document set”).

With regard to claim 22 the proposed combination further teaches wherein the single entity is implemented for frequency calculations (Zhang, Column 9, lines 65-67 “Clustering component groups semantically-similar topics”) which is calculated based on the relevance score (Zhang, Column 9, lines 25-29).

With regard to claim 23 the proposed combination further teaches wherein the topic analysis server is further configured to build-up the hierarchical topic domain (Zhang, Column 10, lines 5-7, “a hierarchical clustering algorithm to group similar keywords/topics.  In this regard, the clustering algorithm builds a dendogram (i.e., a tree data structure)”) by a dynamic (Zhang, Abstract, “dynamically-changing content”) and iterative process as the process is done for each (Zhang, Column 7, line 61; Column 8, line 30; Column 11, lines 45-47 “The process continues until all the topics are merged into a single cluster”) that continuously collects new documents (Zhang, Figure 1, see the loop between 104 and 106) continuously and updates the hierarchical topic domain (Zhang, Column 10, lines 5-7, “a hierarchical clustering algorithm to group similar keywords/topics.  In this regard, the clustering algorithm builds a dendogram (i.e., a tree data structure)”) for detection of topic domains using similarity analysis and subsequent accuracy analysis (Zhang, Figure 2, 212 and 216 which are used to identify topics Column 9, lines 5-10 and 30-35; which is used to group the semantically-similar topics Column 9, lines 65-67).

With regard to claim 24 the proposed combination further teaches wherein the dynamic and iterative process comprises receiving input URLs (Najork, Column 1, line 20 “a distinct address called its uniform resource locator (URL)”), dynamically sniff the input URLs as the document being processed (Id; Column 4, line 30 “downloads the document corresponding to the URL, and processes the document”), download all of the configured web pages (Najork, Column 4, line 30 “The thread then downloads the document corresponding to the URL (step 151), and processes the document”) and nested web pages (Najork, Column 4, lines 35-39 “identifies URL’s in the downloaded document that are candidates for downloading and processing.  Typically, these URL’s are found in hypertext links in the document being processed”) identified by the topic analysis server as having the current context (Najork, Column 5, lines 5-6 “identifies the web page’s host computer”), and abstracts the hierarchical topic domain (Zhang, Column 10, lines 5-7, “a hierarchical clustering algorithm to group similar keywords/topics.  In this regard, the clustering algorithm builds a dendogram (i.e., a tree data structure)”) and repeating the process until all pages within the current context (Najork, Column 5, lines 5-6 “identifies the web page’s host computer”) are exhausted or a configured nest level is reached as the process is done for each (Zhang, Column 7, line 61; Column 8, line 30; Column 11, lines 45-47 “The process continues until all the topics are merged into a single cluster”).  

With regard to claim 26 the proposed combination further teaches wherein the node comprises: 
a current node text representation; 
a current node name; 
a current node identifier; 
a last update time; 
an array of most frequent used non words in current nodes; 
a total number of documents in the node as a bin count (Zhang, Column 3, lines 33-36); 
links to branches or categories in the node as a link from a parent node in the tree structure to a child node in the tree structure (Zhang, Column 10, line 7 “a tree data structure”); 
links to individual documents; 
a total word count for all documents in the node;
a term by document count in the node  (Zhang, Column 3, lines 37-43 “An “expected sample count” is the expected number of times a term is expected in a particular bin and is based on the number of posts in that bin”); and 
the term document frequency matrix as text similarity, or cosine similarity metric (Zhang, Column 10, lines 15-19).  
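
For illustration only, the node elements recited in claim 26 can be pictured as a single record type. Every field name, the sparse dictionary layout of the term-document counts, and the whitespace tokenization are assumptions made for this sketch and do not come from the claims or from Zhang.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class TopicNode:
    """Hypothetical node record; every field name here is an assumption."""
    node_id: str                      # current node identifier
    name: str                         # current node name
    text: str                         # current node text representation
    last_update: datetime             # last update time
    frequent_terms: List[str] = field(default_factory=list)
    children: List["TopicNode"] = field(default_factory=list)  # links to branches
    document_links: List[str] = field(default_factory=list)    # links to documents
    # Sparse term-document frequency matrix: term -> {document link -> count}.
    term_doc_counts: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def add_document(self, link: str, body: str) -> None:
        """Register a document and update the per-term counts."""
        self.document_links.append(link)
        for word in body.lower().split():
            per_doc = self.term_doc_counts.setdefault(word, {})
            per_doc[link] = per_doc.get(link, 0) + 1
        self.last_update = datetime.now(timezone.utc)

    @property
    def document_count(self) -> int:
        """Total number of documents in the node."""
        return len(self.document_links)

    @property
    def total_word_count(self) -> int:
        """Total word count over all documents in the node."""
        return sum(sum(per_doc.values()) for per_doc in self.term_doc_counts.values())

    def document_frequency(self, term: str) -> int:
        """Number of documents in the node that contain the term."""
        return len(self.term_doc_counts.get(term, {}))


node = TopicNode("n1", "sports", "sports topics", datetime(2021, 1, 1, tzinfo=timezone.utc))
node.add_document("doc-1", "goal goal match")
node.add_document("doc-2", "match report")
```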

Claims 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Sravanapudi, Najork and Carver [8180760].

With regard to claim 13 the proposed combination further teaches calculating the record similarity matrix (Zhang, Column 10, lines 15-19 “The clustering component may calculate the distance between two topics by determining their text similarity, namely entity similarity, or surrounding text similarity.  Additionally, the clustering component may apply similarity metrics (e.g. Cosine similarity metric)”) using as the BTF is used to calculate the ranking (Zhang, Column 9, lines 25-27 “The relevance score or ranking of a candidate topic t can then be defined as R(t)=(Ʃ|{dϵD:tϵd}|BTF(t,d))*IDF(t,D)”) which is used to identify the topics (Column 9, lines 6-8 “After the candidate topics are extracted or identified by the identification component 212, they are ranked by ranking component 214”) which is used to perform the clustering (Zhang, Column 9, line 67 “topics may be clustered into groups”) the transformed term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”) wherein similarity values for records inside a single group of the multiple groups are … than the values for records outside the single group (Zhang, Column 10, line 19 “cosine similarity metric”).
Zhang does not explicitly teach wherein similarity values for records inside a single group of the multiple groups are higher than the values for records outside the single group.
Carver teaches wherein similarity values for records inside a single group of the multiple groups are higher than the values for records outside the single group (Carver, Column 9, lines 418-20 “The intermediate clusters 228 have a property such that every keyword is closely related to one other keyword in the intermediate cluster 228 so that the pair of keywords have a similarity above a threshold (a semantic distance below a threshold)”).
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the clustering component of the proposed combination using the HAC engine taught by Carver as it provides a means of accurately clustering similar keywords using a known clustering algorithm. 
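
For illustration only, the relationship discussed for claim 13 (a Boolean term-document representation, a cosine similarity metric, and higher similarity values inside a group than outside it) can be sketched as follows. The function names, the sample records, and the whitespace tokenization are assumptions made for this example; they are not taken from Zhang or Carver.

```python
import math


def boolean_term_vectors(docs):
    """One 0/1 vector per record: 1 if the term occurs in the record, else 0."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        words = set(d.lower().split())
        rows.append([1 if term in words else 0 for term in vocab])
    return rows


def cosine(u, v) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(docs):
    """Record-by-record similarity matrix built from the Boolean vectors."""
    vectors = boolean_term_vectors(docs)
    return [[cosine(u, v) for v in vectors] for u in vectors]


records = ["cup final tonight", "cup final score", "market rally today"]
S = similarity_matrix(records)
```

Here the first two records share two of their three terms, so their similarity (2/3) exceeds their similarity to the third record (0), which is the inside-versus-outside relationship the claim recites.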

With regard to claim 14 the proposed combination further teaches clustering the social media data records (Zhang, Column 9, lines 65-67 “clustering component 218 groups semantically-similar topics”) by ranking a popularity index (Zhang, Column 10, lines 15-20 “The clustering component 218 may calculate the distance between the two topics by determining their text similarity… the clustering component may apply similarity metrics (e.g., cosine similarity metric)”) of each social media data record as wherein the popularity index as the distance (Id) is defined as a number of elements in each column of the record similarity matrix as the similarity metric (Zhang, Column 10, lines 15-19 “The clustering component may calculate the distance between two topics by determining their text similarity, namely entity similarity, or surrounding text similarity.  Additionally, the clustering component may apply similarity metrics (e.g. Cosine similarity metric)”)...
Zhang does not explicitly teach wherein the popularity index is defined as a number of elements in each column of the record similarity matrix that exceed predefined criteria, and wherein a most popular record is selected as a first cluster representation and all records that exceed the criteria are eliminated from further selection.
Carver teaches wherein the popularity index is defined as a number of elements in each column of the record similarity matrix as the semantic distance (Carver, Column 9, lines 418-20 “The intermediate clusters 228 have a property such that every keyword is closely related to one other keyword in the intermediate cluster 228 so that the pair of keywords have a similarity above a threshold (a semantic distance below a threshold)”) that exceed predefined criteria as the similarity threshold (Id), and wherein a most popular record is selected as a first cluster representation as the keyword pair being placed in a cluster (Carver, Column 8, lines 50-55 “The first keyword pair is removed from the list 226 after the keywords of the pair are placed in a cluster 228”) and all records that exceed the criteria are eliminated from further selection as removing the keyword pair from the list (Id).  
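
For illustration only, the claim 14 language (a popularity index counted per column of the record similarity matrix, selection of the most popular record as a cluster representation, and elimination of the records that exceed the criteria) can be sketched as follows. The function name, the sample matrix, the threshold value, and the choice to repeat the selection on the remaining records are assumptions made for this example; they are not taken from the claims, Zhang, or Carver.

```python
def popularity_clusters(sim, threshold):
    """Greedy clustering driven by a per-column popularity index.

    sim is a square similarity matrix (list of lists); threshold is the
    assumed 'predefined criteria'. Returns (representative, members) pairs.
    """
    remaining = set(range(len(sim)))
    clusters = []
    while remaining:
        def popularity(j):
            # Number of still-available entries in column j above the threshold.
            return sum(1 for i in remaining if sim[i][j] > threshold)

        # Most popular record; ties go to the lowest index.
        rep = max(sorted(remaining), key=popularity)
        members = sorted(
            i for i in remaining if i == rep or sim[i][rep] > threshold
        )
        clusters.append((rep, members))
        remaining -= set(members)   # eliminated from further selection
    return clusters


sim = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.2, 0.0],
    [0.7, 0.2, 1.0, 0.1],
    [0.1, 0.0, 0.1, 1.0],
]
result = popularity_clusters(sim, 0.5)
```

In this sample, column 0 has three entries above 0.5, so record 0 becomes the first cluster representation and records 0, 1, and 2 are removed together; record 3 is left to form its own cluster.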


With regard to claim 15 the proposed combination further teaches wherein the term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”) is used to introduce a single value decomposition technique for topic analysis (Sravanapudi ¶63 “creates an LSA matrix using singular value decomposition (SVD)”).

With regard to claim 16 the proposed combination further teaches using POS tag information (Sravanapudi, ¶43 “text may be tagged with a part of speech”) to identify nouns in the term-document frequency matrix (Sravanapudi, ¶43 “the parts of speech may be used to identify the noun phrases included in the text”).

With regard to claim 17 the proposed combination further teaches wherein a POS tag module is used to define the POS tag information (Sravanapudi, ¶43 “text may be tagged with a part of speech”).

wherein the POS tag information (Sravanapudi, ¶43 “the concept extractor extracts concepts from the text… text may be tagged with a part of speech”) is further used to retrieve most common web pages (Sravanapudi, ¶44 “the concept extractor also may weigh the concepts… depending on a frequency with which the concept appears in the text”) and topic word order as the specific phrase used for names of people, places, entities, and companies have a specific order for the words (Sravanapudi, ¶43 “a list of proper nouns may be used to recognize proper nouns from text.  The proper nouns may include names of people, places, entities, companies, and products”).

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Sravanapudi, Najork and Wolfram [Singular Value Decomposition].
With regard to claim 28, the proposed combination further teaches using a single value decomposition (Sravanapudi ¶63 “creates an LSA matrix using singular value decomposition (SVD)”) to detect topic words (Sravanapudi, ¶43 “The concept extractor 125 extracts concepts from text… Lexical Semantic Analysis (LSA) may be used to identify the concepts included in the extracted text”) in the social media data records (Zhang, Column 5, lines 35-37 “data from users is communicated from the social stream 102 to the trending topic tool 106 via the social analysis tool 104, such as ADOBE SOCIAL”), such that the term-document frequency matrix as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”) is represented by:…
all elements being zero (Zhang, Column 9, lines 27-29 “Otherwise, the BTF is set to 0”) except its top p diagonal elements (Zhang, Column 9, lines 27-29 “BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d”), p being a rank as the BTF is used to calculate the rank of the topic (Wolfram, lines 25-26) of the term-document frequency matrix T as the BTF (Zhang, Column 9, lines 27-28 “Here, BTF is a Boolean frequency value that is set to 1 if topic t occurs in document d.  Otherwise, the BTF is set to 0”).
Zhang does not explicitly teach that the term-document frequency matrix is represented by:  
    T = U Ʃ V^T
wherein T is the term-document frequency matrix having n by m values; wherein after the single value decomposition, U is an n by n orthogonal matrix represented by U^T U = 1, and V is an m by m orthogonal matrix V^T V = 1; and wherein Ʃ is a diagonal matrix with all elements being zero except its top p diagonal elements, p being a rank of the term-document frequency matrix T.
Wolfram teaches 
 
    T = U Ʃ V^T
(Wolfram, Formula 1, A = UDV^T, where “A is an m x n real matrix”).
wherein T is the term-document frequency matrix having n by m values (Wolfram, Above Formula 1, “A is an m x n real matrix with m > n”); 
wherein after the single value decomposition (Wolfram, above Formula 1 “A can be written using a so-called singular value decomposition”), U is an n by n (Wolfram, between Formula 1 and 2, “Mathematica defines U as an m x m”) orthogonal matrix (Wolfram, between Formula 1 and 2, “In both systems, U and V have orthogonal columns”) represented by U^T U = 1 (Wolfram, Formula 2), and V is an m by m orthogonal matrix (Wolfram, between Formula 1 and 2, “In both systems, U and V have orthogonal columns”) V^T V = 1 (Wolfram, Formula 3); and  
wherein Ʃ as D (Wolfram, Formula 1) is a diagonal matrix (Wolfram, between Formula 3 and 4 “D has entries only along the diagonal”) …except its top p diagonal elements (Wolfram, between Formula 3 and 4 “D has entries only along the diagonal”), …
The proposed combination creates an LSA matrix using singular value decomposition (Sravanapudi, ¶63) as described above.  The proposed combination of art does not detail how this calculation is performed.  It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the SVD using the mathematical definitions taught by Wolfram.  One of ordinary skill in the art would recognize the mathematical definitions taught by Wolfram as a means of calculating the SVD, and would expect such calculations to yield the predictable results of generating the needed LSA matrix using SVD.
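
For illustration only, the decomposition discussed above can be written out with explicit dimensions, followed by a small worked instance. Reading the “1” on the right-hand side of the orthogonality conditions as an identity matrix, the ordering of the singular values, and the 2 by 2 example are assumptions made for this sketch; the example matrix does not come from the claims or from the cited references.

```latex
\[
T = U \Sigma V^{T}, \qquad
T \in \mathbb{R}^{n \times m},\;
U \in \mathbb{R}^{n \times n},\;
\Sigma \in \mathbb{R}^{n \times m},\;
V \in \mathbb{R}^{m \times m}
\]
\[
U^{T} U = I_{n}, \qquad V^{T} V = I_{m}, \qquad
\Sigma = \operatorname{diag}(\sigma_{1}, \ldots, \sigma_{p}, 0, \ldots, 0),
\quad \sigma_{1} \ge \cdots \ge \sigma_{p} > 0, \quad p = \operatorname{rank}(T)
\]
% Worked instance (assumed example): a Boolean 2 x 2 matrix of rank p = 1.
\[
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{U}
\underbrace{\begin{pmatrix} \sqrt{2} & 0 \\ 0 & 0 \end{pmatrix}}_{\Sigma}
\underbrace{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{V^{T}}
\]
```

In the worked instance only the first diagonal element of Ʃ is nonzero, which matches the statement that all elements are zero except the top p diagonal elements, with p equal to the rank of T.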

Response to Arguments
Applicant's arguments filed July 6, 2021 have been fully considered.  

With regard to claims 1 and 4, applicant argues that Zhang teaches away from using TF-IDF (Page 8-11 of remarks).  
These arguments are moot in view of the new ground of rejection presented.

in a current context… downloading the web pages having the current context, and extracting text from web pages in the current context (Page 11-12 of remarks).  Specifically applicant argues that Najork limits the web sniffer to manage the web content for a particular URL, and is not limited to a current context.
In response to applicant's argument, applicant is reminded that although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).  Neither the claims nor the specification define the scope of what constitutes a “current context”.  One of ordinary skill may reasonably identify all the web pages for a particular URL as being a current context.  The web pages crawled and downloaded by the web sniffer taught by Najork are all related to a single context, specifically the URL that is used as the seed for the crawl.  It is suggested that applicant amend the claims to more clearly define the scope of the “current context”.
Based on the above reasoning the applied prior art reads on the claim language.

With regard to claims 20 and 21, applicant argues that Zhang does not teach both the average term distribution vectors and the collection frequency vector (Page 12 of arguments).
These arguments are moot in view of the new ground of rejection presented.

Conclusion                           
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691. The examiner can normally be reached Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached on 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AMANDA L WILLIS/           Primary Examiner, Art Unit 2156