DETAILED ACTION
Claims 1-20 are presented for examination. Claims 1, 6, 9, 10, 15, and 18-20 stand currently amended.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Finality of Office Action
The following is a brief summary description of new ground(s) of rejection (if any) and the reason why those new ground(s) are made necessary by this amendment:
US patent 8,380,642 B2 Stundner, et al. [herein “Stundner”] is added as a reference as necessitated by the added claim language of experts labeling clusters.
Response to Arguments
Applicant's remarks filed 11 August 2022 have been fully considered and Examiner’s response is as follows:
Applicant remarks page 7 argues:
Sidahmed et al. describe the use of a word cloud as a visualization method to highlight prominent concepts buried in the reports and their progression over time. See page 7 of Sidahmed et al. In contrast, amended claim 1 involves "displaying at least one set of text from the one or more clusters as a word cloud for each of the one or more clusters" and "labeling each of the one or more clusters by an expert user based on the displaying of the word cloud for each of the one or more clusters." In this manner, the word cloud for a cluster is displayed and the expert user can employ the word cloud as a tool to aid in properly labeling the cluster. These features are not taught or suggested in Sidahmed et al. 
This argument has been fully considered and is persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made over Sidahmed, M., et al. “Augmenting Operations Monitoring by Mining Unstructured Drilling Reports” Society of Petroleum Engineers, SPE-173429-MS (2015) [herein “Sidahmed”] in view of US patent 8,380,642 B2 Stundner, et al. [herein “Stundner”].
In particular, Stundner column 16 lines 28-31 teach “semi-supervised learning.”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Sidahmed, M., et al. “Augmenting Operations Monitoring by Mining Unstructured Drilling Reports” Society of Petroleum Engineers, SPE-173429-MS (2015) [herein “Sidahmed”] in view of US patent 8,380,642 B2 Stundner, et al. [herein “Stundner”].
Claim 1 recites “1. A method, comprising: generating a structured data object from a plurality of data files in a data repository.” Sidahmed page 4 last paragraph discloses:
We established a process for processing digital reports and extracting textual content. The next step to prepare the data, transformed the text to build a matrix representation in the form of document-term matrix, For scalability purposes. we designed the process to be able to ingest different report formats with various sizes. We found that storing concept level extractions from the report corpus in vector format to have performance efficiency and accelerate subsequent operations and analysis.
The matrix/vector format representation is a generated structured data object. The digital reports and text are data files.
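By way of illustration only, the mapping from report text to a document-term matrix can be sketched as follows. This is a minimal sketch under stated assumptions: the function name build_document_term_matrix, the whitespace tokenization, and the sample report strings are hypothetical and are not taken from Sidahmed.

```python
from collections import Counter


def build_document_term_matrix(reports):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in report i."""
    # Assumption: lowercase whitespace tokenization stands in for the
    # unspecified text extraction step in the reference.
    tokenized = [report.lower().split() for report in reports]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical report snippets, for illustration only.
reports = [
    "stuck pipe while drilling",
    "mud losses while drilling",
    "stuck pipe freed",
]
vocabulary, matrix = build_document_term_matrix(reports)
```

Each row of the matrix corresponds to one report (data file) and each column to one term, which is the sense in which the matrix is read as a structured data object.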
Claim 1 further recites “preprocessing the structured data object to identify one or more features from the structured data object.” Sidahmed page 5 last two paragraphs disclose:
We adopted term frequency-inverted document frequency (TF-IDF) weighting schema with normalized word frequency to determine the relative importance of each word in reports. … Subsequent task involve finding units of text by constructing n-gram concepts. … It is also used as a preprocessing step for subsequent text analysis
The term weighting and relative importance is a feature of the structured data object. TF-IDF and n-gram preprocessing are preprocessing of the structured data object to identify corresponding term features.
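For illustration, TF-IDF weighting over n-gram terms can be sketched as below. The sketch assumes the common textbook form (term frequency normalized by document length, multiplied by the natural log of total documents over documents containing the term). Sidahmed does not state its exact formula, so the formula, the function names ngrams and tf_idf, and the sample documents are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Join each run of n consecutive tokens into one n-gram term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents, n=1):
    """Return one {term: weight} dict per document for n-gram terms."""
    term_lists = [ngrams(doc.lower().split(), n) for doc in documents]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    total_docs = len(documents)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        length = len(terms) or 1  # avoid division by zero for short documents
        weights.append({
            term: (count / length) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical documents, for illustration only.
docs = [
    "stuck pipe while drilling",
    "mud losses while drilling",
    "stuck pipe freed",
]
unigram_weights = tf_idf(docs, n=1)
bigram_weights = tf_idf(docs, n=2)
```

Under this form, a term that appears in every document receives a weight of zero, which is one way the weighting expresses relative importance.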
Claim 1 further recites “executing an unsupervised machine-learning technique to identify one or more clusters of data files in the plurality of data files from the one or more features.” Sidahmed page 11 second paragraph discloses “some of the unsupervised learning algorithms can be adapted to address the issue of identifying relatively similar reports.” Sidahmed page 11 third paragraph discloses “We employed multiple competing clustering algorithms.” Sidahmed page 5 last line discloses “reports clustering.” Unsupervised learning algorithms are respective unsupervised machine-learning techniques. Clustering to identify similar reports is identifying clusters of data files based on the features.
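As an illustration of clustering feature vectors without labels, a plain k-means routine is sketched below. Sidahmed refers to "multiple competing clustering algorithms" without naming k-means in the quoted passages, so the choice of k-means, the deterministic seeding from the first k points, and the sample vectors are assumptions made for this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points seed the centers (deterministic)."""
    centers = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers


# Hypothetical two-feature vectors for six reports.
vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centers = k_means(vectors, k=2)
```

No labels are supplied to the routine; the grouping of reports follows only from the feature values.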
Claim 1 further recites “displaying at least one set of text from the one or more clusters as a word cloud for each of the one or more clusters.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a display of the text of the clusters.
Sidahmed page 6 figure 3 shows three word clouds. Sidahmed page 7 discloses “We used word cloud as a visualization mechanism for highlighting most prominent concepts buried in the reports and their progression over time. Figure 3 shows trending of key concepts over multiple time periods.” Using word clouds is displaying word clouds to a user for corresponding clusters.
Claim 1 further recites “and labeling each of the one or more clusters by an expert user based on the displaying of the word cloud for each of the one or more clusters.” Sidahmed page 11 fifth paragraph discloses “Extracted terms for this group of reports address common topic and support abstraction of trending patterns from the collection.” The group/collection corresponds to a cluster. The extracted terms and topics are respective labels for the respective cluster.
Sidahmed does not explicitly disclose further labeling by an expert after the word clouds are displayed; however, in analogous art of decision support tools, Stundner column 16 lines 28-31 teach “semi-supervised learning, in that the map corrector portion 418 ‘adds’ some human supervision to an originally unsupervised method, and active learning, in that supervisory feedback from the user (or expert) may be involved.” The added feedback from an expert is labeling by an expert user.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed and Stundner. One having ordinary skill in the art would have found motivation to use semi-supervised learning in the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of adding some human supervision to an originally unsupervised method. See Stundner column 16 lines 28-31.
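The combination discussed above, in which an unsupervised clustering result is shown to a user who then supplies labels, can be sketched as follows. This is an illustrative sketch only. The function names, the ask_expert callback, and the sample data are hypothetical and appear in neither Sidahmed nor Stundner; a dictionary lookup stands in for the human expert.

```python
from collections import Counter


def top_terms_per_cluster(documents, assignments, top_n=2):
    """Collect the most frequent terms of each cluster (word cloud input)."""
    counts = {}
    for doc, cluster in zip(documents, assignments):
        counts.setdefault(cluster, Counter()).update(doc.lower().split())
    return {
        cluster: [term for term, _ in counter.most_common(top_n)]
        for cluster, counter in counts.items()
    }


def label_clusters(top_terms, ask_expert):
    """Show each cluster's terms to the expert and record the returned label."""
    return {cluster: ask_expert(cluster, terms) for cluster, terms in top_terms.items()}


# Hypothetical documents and cluster assignments.
documents = ["stuck pipe stuck", "stuck pipe freed", "mud losses mud", "mud losses cured"]
assignments = [0, 0, 1, 1]

# Stand-in for a human expert: canned answers keyed by cluster id.
canned_answers = {0: "stuck pipe events", 1: "lost circulation"}
top_terms = top_terms_per_cluster(documents, assignments)
labels = label_clusters(top_terms, lambda cluster, terms: canned_answers[cluster])
```

The clustering step remains unsupervised; human input enters only at the labeling step, after the terms for each cluster are displayed.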
Claim 2 further recites “2. The method of claim 1, further comprising visualizing a representation of a multi-dimensional space including representations of the data files based on a similarity of each of the data files.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a presentation of the text of the clusters based on a similarity of the data files.
Claim 3 further recites “3. The method of claim 1, further comprising: generating one or more oilfield analytics based on the plurality of data files in at least one of the one or more clusters; and executing one or more oilfield operations based on the one or more oilfield analytics.” Sidahmed page 2 first paragraph discloses “extending existing monitoring and diagnostic tools supporting decision-maker planning and intervention.” Decision maker planning involves analytics. Interventions are executed operations. Sidahmed page 2 sixth paragraph discloses “unstructured content mining for extracting useful patterns and information in support of decision making.” Extracted patterns and information are analytics based on the data. Sidahmed page 3 second to last paragraph discloses “to inform decisions at the well site and in office during mid post well construction operations.” A well site is part of an oilfield. Post well construction operations are oilfield operations based on the corresponding analytics.
Claim 6 further recites “6. The method of claim 1, further comprising: generating a matrix based at least in part on the one or more features, wherein the one or more features comprise a number of features, and wherein the matrix comprises one or more frequency values representing a frequency of at least two words in each of the plurality of files.” Sidahmed page 4 last paragraph discloses:
We established a process for processing digital reports and extracting textual content. The next step to prepare the data, transformed the text to build a matrix representation in the form of document-term matrix, For scalability purposes. we designed the process to be able to ingest different report formats with various sizes. We found that storing concept level extractions from the report corpus in vector format to have performance efficiency and accelerate subsequent operations and analysis.
The matrix/vector format representation is a generated structured data object. The digital reports and text are data files. The matrix is a generated matrix comprising frequency values of respective words.
Sidahmed page 5 last two paragraphs disclose:
We adopted term frequency-inverted document frequency (TF-IDF) weighting schema with normalized word frequency to determine the relative importance of each word in reports. … Subsequent task involve finding units of text by constructing n-gram concepts. … It is also used as a preprocessing step for subsequent text analysis
The term weighting and relative importance is a feature of the structured data object. TF-IDF and n-gram preprocessing are preprocessing of the structured data object.
Sidahmed page 9 last paragraph discloses “We extended the single-word (1-gram) term extraction from the drilling reports to construct two-word (2-gram) and three-word (3-gram) phrases.” Two and three gram phrases are at least two words.
Claim 6 further recites “determining a distance between the at least two words, wherein the distance is between multi-dimensional planes of each cluster created by the number of features, wherein the multi-dimensional planes each have a number of dimensions corresponding to the number of features; and identifying the one or more clusters using the distance between the at least two words.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a presentation of the text of the clusters. The scatter plot indicates a distance between features. The plane defined by the first two principal component factors is a multi-dimensional plane having a number of dimensions corresponding to those respective features. Clustering on this basis is identifying the corresponding one or more clusters using the distance.
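For illustration, one way a distance between two words can be computed from a document-term matrix is sketched below. The sketch assumes a Euclidean distance between the column vectors of the two terms. Neither the claim language nor the quoted Sidahmed passage specifies this metric, and the function names and sample matrix are hypothetical.

```python
import math


def term_vector(matrix, vocabulary, term):
    """Column of the document-term matrix for one term (one value per report)."""
    column = vocabulary.index(term)
    return [row[column] for row in matrix]


def term_distance(matrix, vocabulary, first, second):
    """Euclidean distance between the report-count vectors of two terms."""
    return math.dist(
        term_vector(matrix, vocabulary, first),
        term_vector(matrix, vocabulary, second),
    )


# Hypothetical three-report matrix over five terms.
vocabulary = ["drilling", "losses", "mud", "pipe", "stuck"]
matrix = [
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
near = term_distance(matrix, vocabulary, "pipe", "stuck")
far = term_distance(matrix, vocabulary, "mud", "stuck")
```

Terms that occur in the same reports have a small distance and tend to fall in the same cluster; terms that never co-occur are farther apart.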
Claim 7 further recites “7. The method of claim 6, further comprising identifying a boundary for each of the one or more clusters, wherein the boundary represents a distance from a centroid value that separates a first cluster from a second cluster.” Sidahmed page 6 last paragraph discloses “we restricted each cluster center to be represented by one of the actual points, instead of using the mean point.” The cluster center is a centroid value of a cluster.
Sidahmed page 9 figure 7 right side shows five ellipses identifying five corresponding clusters in the scatter plot diagram. The shown ellipses separate and form a boundary between different clusters and are defined by respective distances from the cluster center.
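The two points relied upon above, a cluster center restricted to an actual data point and a boundary defined by distance from that center, can be sketched as follows. This is an illustrative sketch only: a circular boundary (maximum member distance) stands in for the ellipses shown in the figure, and the function names and sample points are hypothetical.

```python
import math


def medoid(points):
    """Member point with the smallest total distance to all other members."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def boundary_radius(points, center):
    """Distance from the center to its farthest member point."""
    return max(math.dist(center, p) for p in points)


# Hypothetical two-dimensional clusters (e.g., two principal components).
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
cluster_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0), (5.5, 5.5)]

center_a, center_b = medoid(cluster_a), medoid(cluster_b)
radius_a = boundary_radius(cluster_a, center_a)
radius_b = boundary_radius(cluster_b, center_b)

# The boundaries separate the clusters when the centers are farther apart
# than the sum of the two radii.
separated = math.dist(center_a, center_b) > radius_a + radius_b
```

Because the center is chosen from the member points rather than computed as a mean, it is always one of the actual reports.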
Claim 8 further recites “8. The method of claim 7, wherein the boundary comprises a boundary in the number of dimensions.” Sidahmed page 9 figure 7 right side shows five ellipses identifying five corresponding clusters in the scatter plot diagram. The ellipses are two dimensional boundaries corresponding to the two dimensions of the scatter plot.
Claim 9 further recites “9. The method of claim 1, wherein the word cloud comprises an image depicting terms organized according to a size based on a frequency of the terms in each of the one or more clusters.” Sidahmed page 6 figure 3 shows three word clouds. Sidahmed page 7 discloses “We used word cloud as a visualization mechanism for highlighting most prominent concepts buried in the reports and their progression over time. Figure 3 shows trending of key concepts over multiple time periods.” Using word clouds is presenting word clouds to a user for corresponding clusters. Sidahmed page 6 figure 3 shows the size of the words based on a frequency of the term in the corresponding reports.
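As an illustration of sizing terms by frequency, the following sketch maps term counts within one cluster to font sizes by linear scaling. The linear scaling, the point-size range, the function name word_cloud_sizes, and the sample text are assumptions; Sidahmed does not describe how its word clouds were rendered.

```python
from collections import Counter


def word_cloud_sizes(cluster_text, min_pt=10, max_pt=40):
    """Map each term in a cluster's text to a font size scaled by frequency."""
    counts = Counter(cluster_text.lower().split())
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # all terms equally frequent: avoid dividing by zero
    return {
        term: min_pt + (count - low) * (max_pt - min_pt) / span
        for term, count in counts.items()
    }


# Hypothetical text pooled from the reports of one cluster.
sizes = word_cloud_sizes("stuck pipe stuck pipe stuck freed")
```

The most frequent term receives the largest size and the least frequent term the smallest, so relative size in the image reflects relative frequency in the cluster.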
Dependent Claims 4 and 5
Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Sidahmed and Stundner as applied to claim 1 above, and further in view of US patent 10,364,662 B1 Basu, et al. [herein “Basu”].
Claim 4 further recites “4. The method of claim 1, wherein the plurality of data files comprise a combination of one or more structured data files and one or more unstructured data files.” Sidahmed title discloses “Augmenting Operations Monitoring by Mining Unstructured Drilling Reports.” Unstructured reports are unstructured data.
Sidahmed page 2 fourth paragraph discloses “the majority of these approaches focused on structured data in the form of: sensors measurement […] and gauges reading.”
Sidahmed does not explicitly disclose accessing structured data; however, in analogous art of analyzing oilfield data, Basu column 11 lines 14-15 teaches “the preprocessing subsystem 4102 can access structured data sources 4112 and unstructured data sources 4114.” Structured and unstructured data sources are structured and unstructured data files.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed, Stundner, and Basu. One having ordinary skill in the art would have found motivation to use both structured and unstructured data in the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of accessing data useful for other systems including modeling and analytics systems. See Basu column 11 lines 10-13 and column 12 lines 8-10.
Claim 5 further recites “5. The method of claim 1, wherein the structured data object comprises metadata for each of the data files, wherein the metadata comprises at least one of a file name, a file type, a file title, a hyperlink to the file, or a path of the file.” From the above list of alternatives the Examiner is selecting “a file type.”
Sidahmed does not explicitly disclose accessing structured data; however, in analogous art of analyzing oilfield data, Basu column 11 lines 14-15 teaches “the preprocessing subsystem 4102 can access structured data sources 4112 and unstructured data sources 4114.” Structured and unstructured data sources are structured and unstructured data files.
Basu column 16 lines 30-39 shows two example data sources “\\data_folder\data_file.txt” and “\\data_folder\interview.mp3.” A person of ordinary skill in the art would recognize these data source examples are paths of the file. Furthermore, the file extensions “.txt” and “.mp3” are indicative of a file type. Basu column 12 lines 1-4 teach “the configuration script 4110 can identify a data source such as an unstructured data source 4114, the nature the data source (e.g., narrative text, audio, image, or video).” The nature of the data source as either text, audio, image, or video is a file type.
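For illustration, deriving a file name and file type from a path string of the kind quoted above can be sketched as follows. The function name file_metadata and the returned field names are hypothetical. Plain string splitting is used because the example paths begin with a double backslash.

```python
def file_metadata(path_text):
    """Derive simple metadata (path, file name, file type) from a path string."""
    # Treat both slash styles as separators and keep the last path component.
    file_name = path_text.replace("/", "\\").rsplit("\\", 1)[-1]
    stem, dot, extension = file_name.rpartition(".")
    # A file type is reported only when there is a non-empty name before the dot.
    file_type = extension.lower() if dot and stem else ""
    return {"path": path_text, "file_name": file_name, "file_type": file_type}


# The two example data sources quoted from Basu.
text_source = file_metadata(r"\\data_folder\data_file.txt")
audio_source = file_metadata(r"\\data_folder\interview.mp3")
```

The extension alone distinguishes the text source from the audio source, which is the sense in which the extension is indicative of a file type.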
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed, Stundner, and Basu. One having ordinary skill in the art would have found motivation to use both structured and unstructured data in the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of accessing data useful for other systems including modeling and analytics systems. See Basu column 11 lines 10-13 and column 12 lines 8-10.
Claims 10-20
Claims 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sidahmed in view of Stundner and Basu.
Claim 10 recites “10. A computing system, comprising: one or more processors; and a memory system including one or more non-transitory, computer-readable media storing instructions that, when executed by at least one of the one or more processors cause the computing system to perform operations.” Sidahmed does not explicitly disclose processors, memory, or computer-readable media; however, in analogous art of analyzing oilfield data, Basu column 17 lines 57-59 teaches “computing devices 2210 (e.g. computer systems, personal data assistants, kiosks, dedicated terminals, etc.).” A computer system has a processor and memory.
Basu column 19 lines 40-42 teach “A computer program product, includes a computer usable medium having a computer readable program code non-transitorily embodied therein.” A computer medium with non-transitorily stored code is a non-transitory computer-readable medium storing instructions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed, Stundner, and Basu. One having ordinary skill in the art would have found motivation to use a computer system to implement the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of accessing data useful for other systems including modeling and analytics systems. See Basu column 11 lines 10-13 and column 12 lines 8-10.
Claim 10 further recites “the operations comprising: generating a structured data object from a plurality of data files in a data repository.” Sidahmed page 4 last paragraph discloses:
We established a process for processing digital reports and extracting textual content. The next step to prepare the data, transformed the text to build a matrix representation in the form of document-term matrix, For scalability purposes. we designed the process to be able to ingest different report formats with various sizes. We found that storing concept level extractions from the report corpus in vector format to have performance efficiency and accelerate subsequent operations and analysis.
The matrix/vector format representation is a generated structured data object. The digital reports and text are data files.
Claim 10 further recites “preprocessing the structured data object to identify one or more features from the structured data object.” Sidahmed page 5 last two paragraphs disclose:
We adopted term frequency-inverted document frequency (TF-IDF) weighting schema with normalized word frequency to determine the relative importance of each word in reports. … Subsequent task involve finding units of text by constructing n-gram concepts. … It is also used as a preprocessing step for subsequent text analysis
The term weighting and relative importance is a feature of the structured data object. TF-IDF and n-gram preprocessing are preprocessing of the structured data object to identify corresponding term features.
Claim 10 further recites “executing an unsupervised machine-learning technique to identify one or more clusters of data files in the plurality of data files from the one or more features.” Sidahmed page 11 second paragraph discloses “some of the unsupervised learning algorithms can be adapted to address the issue of identifying relatively similar reports.” Sidahmed page 11 third paragraph discloses “We employed multiple competing clustering algorithms.” Sidahmed page 5 last line discloses “reports clustering.” Unsupervised learning algorithms are respective unsupervised machine-learning techniques. Clustering to identify similar reports is identifying clusters of data files based on the features.
Claim 10 further recites “displaying at least one set of text from the one or more clusters as a word cloud for each of the one or more clusters.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a display of the text of the clusters.
Sidahmed page 6 figure 3 shows three word clouds. Sidahmed page 7 discloses “We used word cloud as a visualization mechanism for highlighting most prominent concepts buried in the reports and their progression over time. Figure 3 shows trending of key concepts over multiple time periods.” Using word clouds is displaying word clouds to a user for corresponding clusters.
Claim 10 further recites “and labeling each of the one or more clusters by an expert user based on the displaying of the word cloud for each of the one or more clusters.” Sidahmed page 11 fifth paragraph discloses “Extracted terms for this group of reports address common topic and support abstraction of trending patterns from the collection.” The group/collection corresponds to a cluster. The extracted terms and topics are respective labels for the respective cluster.
Sidahmed does not explicitly disclose further labeling by an expert after the word clouds are displayed; however, in analogous art of decision support tools, Stundner column 16 lines 28-31 teach “semi-supervised learning, in that the map corrector portion 418 ‘adds’ some human supervision to an originally unsupervised method, and active learning, in that supervisory feedback from the user (or expert) may be involved.” The added feedback from an expert is labeling by an expert user.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed and Stundner. One having ordinary skill in the art would have found motivation to use semi-supervised learning in the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of adding some human supervision to an originally unsupervised method. See Stundner column 16 lines 28-31.
Dependent claims 11-18 are substantially similar to claims 2-9 above and are rejected for the same reasons.
Claim 19 recites “19. A non-transitory, computer-readable medium storing instructions that, when executed by at least one processor of a computing system, cause the computing system to perform operations.” Sidahmed does not explicitly disclose processors, memory, or computer-readable media; however, in analogous art of analyzing oilfield data, Basu column 17 lines 57-59 teaches “computing devices 2210 (e.g. computer systems, personal data assistants, kiosks, dedicated terminals, etc.).” A computer system has a processor and memory.
Basu column 19 lines 40-42 teach “A computer program product, includes a computer usable medium having a computer readable program code non-transitorily embodied therein.” A computer medium with non-transitorily stored code is a non-transitory computer-readable medium storing instructions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed, Stundner, and Basu. One having ordinary skill in the art would have found motivation to use a computer system to implement the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of accessing data useful for other systems including modeling and analytics systems. See Basu column 11 lines 10-13 and column 12 lines 8-10.
Claim 19 further recites “the operations comprising: generating a structured data object from a plurality of data files in a data repository.” Sidahmed page 4 last paragraph discloses:
We established a process for processing digital reports and extracting textual content. The next step to prepare the data, transformed the text to build a matrix representation in the form of document-term matrix, For scalability purposes. we designed the process to be able to ingest different report formats with various sizes. We found that storing concept level extractions from the report corpus in vector format to have performance efficiency and accelerate subsequent operations and analysis.
The matrix/vector format representation is a generated structured data object. The digital reports and text are data files.
Claim 19 further recites “preprocessing the structured data object to identify one or more features from the structured data object.” Sidahmed page 5 last two paragraphs disclose:
We adopted term frequency-inverted document frequency (TF-IDF) weighting schema with normalized word frequency to determine the relative importance of each word in reports. … Subsequent task involve finding units of text by constructing n-gram concepts. … It is also used as a preprocessing step for subsequent text analysis
The term weighting and relative importance is a feature of the structured data object. TF-IDF and n-gram preprocessing are preprocessing of the structured data object to identify corresponding term features.
Claim 19 further recites “executing an unsupervised machine-learning technique to identify one or more clusters of data files in the plurality of data files from the one or more features.” Sidahmed page 11 second paragraph discloses “some of the unsupervised learning algorithms can be adapted to address the issue of identifying relatively similar reports.” Sidahmed page 11 third paragraph discloses “We employed multiple competing clustering algorithms.” Sidahmed page 5 last line discloses “reports clustering.” Unsupervised learning algorithms are respective unsupervised machine-learning techniques. Clustering to identify similar reports is identifying clusters of data files based on the features.
Claim 19 further recites “displaying at least one set of text from the one or more clusters as a word cloud for each of the one or more clusters.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a display of the text of the clusters.
Sidahmed page 6 figure 3 shows three word clouds. Sidahmed page 7 discloses “We used word cloud as a visualization mechanism for highlighting most prominent concepts buried in the reports and their progression over time. Figure 3 shows trending of key concepts over multiple time periods.” Using word clouds is displaying word clouds to a user for corresponding clusters.
Claim 19 further recites “and labeling each of the one or more clusters by an expert user based on the displaying of the word cloud for each of the one or more clusters.” Sidahmed page 11 fifth paragraph discloses “Extracted terms for this group of reports address common topic and support abstraction of trending patterns from the collection.” The group/collection corresponds to a cluster. The extracted terms and topics are respective labels for the respective cluster.
Sidahmed does not explicitly disclose further labeling by an expert after the word clouds are displayed; however, in analogous art of decision support tools, Stundner column 16 lines 28-31 teach “semi-supervised learning, in that the map corrector portion 418 ‘adds’ some human supervision to an originally unsupervised method, and active learning, in that supervisory feedback from the user (or expert) may be involved.” The added feedback from an expert is labeling by an expert user.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Sidahmed and Stundner. One having ordinary skill in the art would have found motivation to use semi-supervised learning in the Sidahmed system of augmenting operations monitoring by mining drilling reports for the advantageous purpose of adding some human supervision to an originally unsupervised method. See Stundner column 16 lines 28-31.
Claim 20 further recites “20. The medium of claim 19, wherein the operations further comprise: generating a matrix based at least in part on the one or more features, wherein the one or more features comprise a number of features, and wherein the matrix comprises one or more frequency values representing a frequency of at least two words in each of the plurality of files.” Sidahmed page 4 last paragraph discloses:
We established a process for processing digital reports and extracting textual content. The next step to prepare the data, transformed the text to build a matrix representation in the form of document-term matrix, For scalability purposes. we designed the process to be able to ingest different report formats with various sizes. We found that storing concept level extractions from the report corpus in vector format to have performance efficiency and accelerate subsequent operations and analysis.
The matrix/vector format representation is a generated structured data object. The digital reports and text are data files. The matrix is a generated matrix comprising frequency values of respective words.
Sidahmed page 5 last two paragraphs disclose:
We adopted term frequency-inverted document frequency (TF-IDF) weighting schema with normalized word frequency to determine the relative importance of each word in reports. … Subsequent task involve finding units of text by constructing n-gram concepts. … It is also used as a preprocessing step for subsequent text analysis
The term weighting and relative importance is a feature of the structured data object. TF-IDF and n-gram preprocessing are preprocessing of the structured data object.
Sidahmed page 9 last paragraph discloses “We extended the single-word (1-gram) term extraction from the drilling reports to construct two-word (2-gram) and three-word (3-gram) phrases.” Two and three gram phrases are at least two words.
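For illustration only, the TF-IDF weighting over multi-word phrases discussed above can be sketched as follows. Sidahmed does not state the exact formula used, so the sketch assumes one common variant (normalized term frequency multiplied by the logarithm of the inverse document frequency); the function names and report snippets are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-word phrases (n-grams) found in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents, n=1):
    """Return one {phrase: weight} dict per document (assumed TF-IDF variant)."""
    term_lists = [ngrams(doc.lower().split(), n) for doc in documents]
    # Document frequency: number of documents containing each phrase.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(len(documents) / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical report snippets, invented for illustration only.
reports = ["stuck pipe while drilling", "lost circulation while drilling"]
weights = tf_idf(reports, n=2)  # two-word (2-gram) phrases
```

Under this variant, a phrase that occurs in every file receives a weight of zero, while a phrase confined to one file receives a positive weight.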
Claim 20 further recites “determining a distance between the at least two words, wherein the distance is between multi-dimensional planes of each cluster created by the number of features, wherein the multi-dimensional planes each have a number of dimensions corresponding to the number of features; and identifying the one or more clusters using the distance between the at least two words.” Sidahmed page 11 fourth paragraph discloses:
The clustering exercise significantly reduces dimensionality of the data and offer enhanced resolution for examining similarities among seemingly disparate collection of documents. Figure 8 presents five clusters of relevant topics identified by domain experts as relevant considerations. We specified segmentation of the sample data into five clusters and plotted scatter of clusters on two dimensions represented by the first two principal component factors.
The scatter plot of the clusters is a presentation of the text of the clusters. The scatter plot indicates a distance between features. The plane corresponding to the first two principal component factors is a multi-dimensional plane having a number of dimensions corresponding to those respective features. Clustering on this basis identifies the corresponding one or more clusters using the distance.
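For illustration only, identifying clusters using a distance in a feature space can be sketched as follows. The sketch assumes Euclidean distance and pre-selected cluster centers, and it omits the principal component projection described in Sidahmed; the function name `assign_clusters` and all coordinates are hypothetical.

```python
import math


def assign_clusters(vectors, centers):
    """Label each feature vector with the index of its nearest center."""
    labels = []
    for vector in vectors:
        # Each vector has one coordinate per feature (dimension).
        distances = [math.dist(vector, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical two-feature vectors and centers, invented for illustration.
vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_clusters(vectors, centers)
```

The number of coordinates in each vector corresponds to the number of features, and cluster membership follows from the computed distances.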
Claim 20 further recites “and identifying a boundary for each of the one or more clusters, wherein the boundary represents a distance from a centroid value that separates a first cluster from a second cluster.” Sidahmed page 6 last paragraph discloses “we restricted each cluster center to be represented by one of the actual points, instead of using the mean point.” The cluster center is a centroid value of a cluster.
Sidahmed page 9 figure 7 right side shows five ellipses identifying five corresponding clusters in the scatter plot diagram. The shown ellipses separate and form a boundary between different clusters and are defined by respective distances from the cluster center.
Claim 20 further recites “and wherein the boundary comprises a boundary in the number of dimensions.” Sidahmed page 9 figure 7 right side shows five ellipses identifying five corresponding clusters in the scatter plot diagram. The ellipses are two dimensional boundaries corresponding to the two dimensions of the scatter plot.
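For illustration only, a cluster center restricted to an actual data point, and a boundary defined by a distance from that center, can be sketched as follows. The sketch assumes Euclidean distance and simplifies the boundary to a circle, whereas the cited figure shows ellipses; the function names and points are hypothetical.

```python
import math


def medoid(points):
    """Return the actual point with the smallest total distance to the others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def boundary_radius(points, center):
    """Return the largest distance from the center to any cluster member."""
    return max(math.dist(center, p) for p in points)


# Hypothetical two-dimensional cluster, invented for illustration only.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
center = medoid(cluster)
radius = boundary_radius(cluster, center)
```

Here the center is one of the cluster's own points rather than the mean point, and the radius is a two-dimensional boundary enclosing every member of the cluster.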
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jay B Hann whose telephone number is (571)272-3330. The examiner can normally be reached M-F 10am-7pm EDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rehana Perveen, can be reached at (571)272-3676. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Jay Hann/
Primary Examiner, Art Unit 2148
13 September 2022