DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-7, 10-11 and 13-19 are pending, of which claims 1 and 14 are in independent form.  Claims 1-7, 10-11 and 13-19 are rejected under 35 U.S.C. 103.

Response to Claim Amendments and Arguments
The claim amendments and arguments entered as part of the Request for Continued Examination filed on 16 October 2020, and previously submitted in an After Final Action filed on 15 September 2020, have been fully considered as they apply to the 35 U.S.C. 103 rejections of the claims.
On page 8 of the remarks, Applicant’s representative appears to argue that the Gaglani reference does not disclose the newly amended claim limitations recited in the independent claims, specifically, …displaying the multidimensional graphic visualization on a display device.  Examiner has applied a new reference to address the claim as amended, as detailed in the rejection below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 10 and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gaglani et al. U.S. Pub. No. 2015/0325133 (hereinafter “Gaglani”) in view of Cross, III et al. U.S. Pub. No. 2016/0328386 (hereinafter “Cross”) in view of Crockett et al. U.S. Pub. No. 2012/0310863 (hereinafter “Crockett”) in further view of Gallivan U.S. Pub. No. 2016/0085849 (hereinafter “Gallivan”).
Regarding independent claim 1, Gaglani discloses:
obtaining the set of data comprising a stream of terms; analyzing the stream of terms using a modified data streaming generator to find a set of frequent terms… (Gaglani in the Abstract discloses, “…receiving a user data item representative of educational interests of the user. The computer extracts a plurality of words from the user data item, classifies the user data item into a related knowledge domain, and determines a frequency score for each of the plurality of words.”  Additionally, Gaglani at paragraph [0037] discloses a Pre-processing component comprising a tokenization system, tokenizing input data including words which Examiner is of the position discloses a data streaming generator.)

forming a dictionary of terms based on the set of frequent terms; identifying one or more of the set of frequent terms in the set of data; and analyzing the one or more of the set of frequent terms in the set of data to determine a context and/or structure based on neighboring terms in the set of data (Gaglani at paragraph [0037] discloses in part, “In some embodiments, the Pre-Processing Component 120A also includes a tokenization system which tokenizes the words in the input material. In this context, "tokenization" refers to the grouping of works that constitute a single term. For example, multiple sclerosis is normally seen by a computer as two words, but the machine learning system may be used to identify it as a single entity [i.e., identifying and analyzing]…In some embodiments, the tokenization system employed is Wikipedia Miner. Wikipedia Miner performs tokenization by analyzing the link structure of Wikipedia and uses this model to assign a probability that a certain word or set of words is a specific term given the surrounding textual context… In some embodiments, a corpus of terms may be created offline by analyzing a large set of documents to form a token lookup dictionary. This dictionary may then be used to perform the tokenization.”)
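
For illustration only, the lookup-dictionary tokenization described in the quoted passage can be sketched as follows. This is a hypothetical Python sketch and is not code from the Gaglani reference: the function name tokenize_with_dictionary, the max_len parameter, and the greedy longest-match strategy are all assumptions made for the example, and it does not model the probabilistic, context-based approach attributed to Wikipedia Miner.

```python
def tokenize_with_dictionary(text, term_dictionary, max_len=3):
    """Group adjacent words into a single term when the group appears
    in a lookup dictionary (greedy longest match; an assumed strategy)."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then fall back to shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted as its own token.
            if n == 1 or candidate in term_dictionary:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Hypothetical dictionary built offline from a corpus of documents.
term_dictionary = {"multiple sclerosis"}
tokens = tokenize_with_dictionary(
    "Multiple sclerosis affects the nervous system", term_dictionary
)
print(tokens)
```

Under these assumptions, "multiple sclerosis" is returned as one term rather than two separate words, which mirrors the example given in the quoted paragraph.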

generating a multidimensional graphic visualization tool for visualizing the stream of terms based on the analyzing (Gaglani at paragraph [0038] discloses in part, “…, a map of terms to knowledge domains is used to create a histogram of the available or known knowledge domains.”  Examiner is of the position that a histogram discloses a multidimensional graphic visualization tool.  Additionally, Gaglani at paragraph [0041] discloses in part, “For example, in some embodiments, a score is determined for each word in the document and each word in the question. Then, the two texts are compared as if they were vectors in a high dimensional space (i.e., the cosine similarity is like finding the angle between two high dimensional vectors using the dot product).”)

While Gaglani does disclose analyzing the contextual relationship between terms, Gaglani does not specifically disclose:
wherein the analyzing comprises parsing the set of data to identify syntactic relations, semantic relations, or both between terms...
However, Cross in the Abstract teaches the following:
A computer program that uses structured information, such as syntactic and semantic information, as context for representing words and/or phrases as vectors, by performing the following steps: (i) receiving a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text; and (ii) generating a first vector representation for the first set of natural language text utilizing the metadata and its corresponding contextual information.

Both the Gaglani reference and the Cross reference, in the portions cited by the Examiner, are in the field of endeavor of natural language processing.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the analyzing of the contextual relationship between terms disclosed in Gaglani with the analyzing of syntactic and semantic information between terms taught in Cross to facilitate in the identification of terms and contextual information (See Cross at paragraph [0012]).

While Gaglani at paragraphs [0037] – [0039] discloses a stream of terms being parsed in relationship to frequency and surrounding textual context (i.e. position) and keyword identification (i.e., importance), Gaglani does not disclose testing an entropy of a term, more specifically, Gaglani does not disclose:
further comprising testing an entropy or mutual information of each term in the stream of terms that was parsed in relationship to a frequency, a position, and importance.
However, Crockett in the Abstract and paragraph [0056] teaches classifying an unknown gene variant into a knowledge domain, using in part, frequency analysis, positional analysis and a Shannon entropy measurement.
Both the Gaglani reference and the Crockett reference, in the sections cited by the Examiner, are in the field of endeavor of analyzing terms using surrounding terms, frequency and position.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the classification of data items into knowledge domains utilizing in part frequency analysis and contextual analysis as disclosed in Gaglani with the classifying of unknown data items (i.e., unknown gene variants) into a knowledge domain utilizing in part, frequency analysis, positional analysis and a Shannon entropy measurement as taught in Crockett to facilitate in the correct classification of data items.

While Gaglani discloses generating a multidimensional graphic visualization tool, Gaglani does not disclose displaying the multidimensional graphic visualization on a display device.
However, Gallivan at paragraph [0013] teaches modeling terms in multiple dimensions and the use of histograms, and Gallivan at paragraph [0043] teaches displaying the visualization.
Both the Gaglani reference and the Gallivan reference, in the sections cited by the Examiner, are in the field of endeavor of projecting terms into multidimensional space.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the generating of a multidimensional visualization of terms disclosed in Gaglani with the displaying of the multidimensional graphic visualization taught in Gallivan to facilitate in communicating the data visualization to a user.

Regarding dependent claim 2, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
wherein a term from the stream of terms comprising a single word or a sequential set of words (Gaglani at paragraph [0037] discloses in part, “For example, multiple sclerosis is normally seen by a computer as two words, but the machine learning system may be used to identify it as a single entity.”)

Regarding dependent claim 3, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
wherein the set of data comprises data from one or more data sources (Gaglani at paragraph [0031] discloses in part, “Users provide personal information related to their needs and interests to a software platform referred to herein as a "Knowledge Diffusion Platform." This personal information may include, without limitation, information related to an individual's academic work (e.g., calendars, course documents, lecture audio/video, etc.), web history (e.g., pages or articles read), and/or purchase information (e.g., books or academic journal articles purchased through web-based services).”)

Regarding dependent claim 4, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
further comprises converting the stream of terms into a plurality of input vectors by the modified data streaming generator (Gaglani at paragraph [0041] discloses in part, “For example, in some embodiments, a score is determined for each word in the document and each word in the question. Then, the two texts are compared as if they were vectors in a high dimensional space (i.e., the cosine similarity is like finding the angle between two high dimensional vectors using the dot product).”)
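
For illustration only, the comparison of two texts as vectors using cosine similarity, as described in the quoted passage, can be sketched as follows. This is a hypothetical Python sketch and is not code from the Gaglani reference: the names term_vector and cosine_similarity are invented for the example, and raw term counts are assumed as the per-word score because the quoted passage does not specify the scoring formula.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse vector of per-word scores.
    Raw counts are an assumed stand-in for the per-word score."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction, so treat it as dissimilar.
    return dot / (norm_a * norm_b)


document = term_vector("the heart pumps blood through the arteries")
question = term_vector("which organ pumps blood")
score = cosine_similarity(document, question)
print(round(score, 3))
```

A threshold applied to the resulting score, such as the low threshold mentioned in paragraph [0041], would then select candidate matches; the threshold value itself is not given in the quoted text and is left out of the sketch.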

Regarding dependent claim 5, all of the particulars of claims 1 and 4 have been addressed above.  Additionally, Gaglani discloses:
wherein subsequent to the converting the stream of terms, the method further comprises converting the plurality of input vectors into a corresponding plurality of sketch feature vectors by the data streaming generator, wherein each of the plurality of sketch feature vectors has a number of output dimensions that is less than a number of dimensions of a corresponding one of the input vectors (Gaglani at paragraph [0041] discloses in part, “Then, the two texts are compared as if they were vectors in a high dimensional space (i.e., the cosine similarity is like finding the angle between two high dimensional vectors using the dot product). A low threshold may be used to select possible question matches. This primarily weeds out completely irrelevant content or content that otherwise would be weighted as important relative to the knowledge domain simply because of a high frequency of occurrence (e.g., mistakenly pulling all the heart questions in cardio just because the term "heart" was mentioned).”  The Examiner is of the position that Gaglani at paragraph [0041] applying a low threshold for weeding out irrelevant content results in vector or dimensionality reduction.)

Regarding dependent claim 7, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
further comprising generating a multidimensional histogram to show interrelationships between terms of the stream of terms (Gaglani at paragraph [0038] discloses in part, “Various types of classification techniques may be used to map input materials to knowledge domains. For example, in some embodiments, a histogram-based classification is used. When an input document arrives, a map of terms to knowledge domains is used to create a histogram of the available or known knowledge domains. The highest histogram bin is then selected as the knowledge domain for the incoming document.”  Examiner is of the position that histograms are necessarily at least two dimensions.)
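
For illustration only, the histogram-based classification described in the quoted passage can be sketched as follows. This is a hypothetical Python sketch and is not code from the Gaglani reference: the function name classify_by_histogram and the sample term-to-domain map are assumptions made for the example.

```python
from collections import Counter


def classify_by_histogram(terms, term_to_domain):
    """Build a histogram of knowledge domains from a term-to-domain map
    and return the domain with the highest bin, plus the histogram."""
    histogram = Counter(
        term_to_domain[t] for t in terms if t in term_to_domain
    )
    if not histogram:
        return None, histogram  # No known terms, so no domain is selected.
    domain, _count = histogram.most_common(1)[0]
    return domain, histogram


# Hypothetical map of terms to knowledge domains.
term_to_domain = {
    "heart": "cardiology",
    "artery": "cardiology",
    "neuron": "neurology",
}
domain, histogram = classify_by_histogram(
    ["heart", "artery", "neuron", "the"], term_to_domain
)
print(domain, dict(histogram))
```

In this sketch each histogram bin pairs a knowledge domain with a count, so the histogram carries two dimensions of information per bin.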

Regarding dependent claim 10, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
wherein the set of data comprises a scientific or financial corpus of data (Gaglani at paragraph [0038] discloses in part, “The term "knowledge domain" as used herein refers to a particular field of study or subject area. For example, in the context of medical studies, examples of knowledge domain may include anatomy, genetics, cell physiology, immunology, microbiology, hematology, neurology, cardiology, etc.”)

Regarding dependent claim 13, all of the particulars of claim 1 have been addressed above.  Additionally, Gaglani discloses:
further comprising storing the one or more of the set of frequent terms in a data store for further analysis (Gaglani at paragraph [0044] discloses in part, “At step 210, the input data is pre-processed to transform it into a format suitable for storage and used by the Knowledge Diffusion Platform, as discussed above with reference to the Pre-Processing Component 120A.”)

Regarding independent claim 14, claim 14 is rejected under the same rationale as claim 1.  Additionally, with respect to the hardware limitations of the claim, one or more processors; and a memory system comprising one or more non-transitory computer-readable media…(See Gaglani at paragraphs [0068] – [0069]).

Regarding dependent claim 15, all of the particulars of claim 14 have been addressed above.  Additionally, claim 15 is rejected under the same rationale as claim 2.

Regarding dependent claim 16, all of the particulars of claim 14 have been addressed above.  Additionally, claim 16 is rejected under the same rationale as claim 3.

Regarding dependent claim 17, all of the particulars of claim 14 have been addressed above.  Additionally, claim 17 is rejected under the same rationale as claim 4.

Regarding dependent claim 18, all of the particulars of claims 14 and 17 have been addressed above.  Additionally, claim 18 is rejected under the same rationale as claim 5.


Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gaglani in view of Cross in view of Crockett in view of Gallivan in further view of Turner et al. U.S. Pub. No. 2013/0253910 (hereinafter “Turner”).
Regarding dependent claim 6, all of the particulars of claim 1 have been addressed above.  While Gaglani discloses measuring a frequency distribution of terms, Gaglani does not disclose:
further comprising determining Shannon entropy to evaluate a distribution of the terms across the set of data.
However, Turner at paragraph [0151] teaches the following:
Next, a measure of this frequency distribution is calculated to determine how "equally" or "unequally" the given word is associated with a range of other words in the language. FIG. 12. illustrates a hypothetical example of the terms "chair" and "thing" whose collocation distributions indicate strong collocations with few terms (low ambiguity) vs. weak collocations with many terms (high ambiguity). The inequality of the term's co-occurrence or collocation distribution can be measured by a variety of indicators such as Shannon's Entropy, the distribution's scaling exponent, or various measures of inequality.
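
For illustration only, measuring the inequality of a collocation distribution with Shannon entropy can be sketched as follows. This is a hypothetical Python sketch and is not code from the Turner reference: the function name shannon_entropy and the collocation counts are invented for the example and are not taken from FIG. 12. The sketch uses the standard definition H = -sum(p * log2(p)) over the observed probabilities.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy, in bits, of a frequency distribution given as counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Invented collocation counts: "chair" co-occurs strongly with few terms,
# while "thing" co-occurs weakly with many terms.
chair = Counter({"sit": 18, "table": 2})
thing = Counter({"good": 5, "other": 5, "same": 5, "whole": 5})

chair_entropy = shannon_entropy(chair)
thing_entropy = shannon_entropy(thing)
print(round(chair_entropy, 3), round(thing_entropy, 3))
```

Under these assumed counts the concentrated distribution yields the lower entropy and the evenly spread distribution yields the higher entropy, which corresponds to the low-ambiguity and high-ambiguity cases described in the quoted paragraph.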

Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the measuring of a frequency distribution of terms disclosed in Gaglani with the measuring of a frequency distribution of terms using indicators such as Shannon’s Entropy taught in Turner to facilitate in the identification of terms and contextual information.

Regarding dependent claim 19, all of the particulars of claim 14 have been addressed above.  Additionally, claim 19 is rejected under the same rationale as claim 6.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Gaglani in view of Cross in view of Crockett in view of Gallivan in further view of Sevenster et al. U.S. Pub. No. 2016/0148374 (hereinafter “Sevenster”).
Regarding dependent claim 11, all of the particulars of claim 1 have been addressed above.  While Gaglani does disclose analyzing the contextual relationship between terms, Gaglani does not disclose:
wherein the analyzing the stream of terms, further comprises extracting negations and determining a structure surrounding the negation.
However, Sevenster at paragraph [0016] teaches in part, “The natural language processing engine further uses a negation detection application or other application that interconnects extracted findings to increase a granularity of the information extraction.”
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the analyzing of the contextual relationship between terms disclosed in Gaglani with the natural language analysis detecting negation taught in Sevenster to facilitate in the identification of terms and contextual information.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
2016/0147878
Abstract and Paragraphs [0003] and [0005] as it relates to utilizing a Shannon Entropy calculation as part of a semantic search engine implementing natural language processing.
2010/0332287
Paragraphs [0022] and [0049] as it relates to using an entropy classifier in natural language processing tasks, and the creation of a category specific word database using a Shannon entropy measure.  
9,135,242
Column 1, Lines 42-47 as it relates to natural language processing in analyzing large text corpora and Column 9, Lines 32-40 as it relates to the use of a Shannon Entropy measure in calculating uncertainty in a document.
2008/0288537
Paragraphs [0015] and [0017] as it relates to syntactic and semantic analysis of terms, neighboring terms and entropy.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY G GEMIGNANI whose telephone number is (571)272-1018.  The examiner can normally be reached on M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain T Alam, can be reached on 571-272-3978.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/A.G.G./Examiner, Art Unit 2154
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154