Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/18/2022 has been entered.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 5, 7-11, 13, 15-16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 2018/0357262) in view of Thomas et al. (US 2011/0087668).


Regarding Claim 1, He discloses a method of analyzing multidimensional data, comprising: 
obtaining an original set of data comprising a multidimensional set of objects having a sequential order and multiple original dimensions ([0028], 114, 116, 112, and 118, and [0029], “tables 112,118, where each database table is implemented as a two-dimensional table having one or more columns and/or one or more rows. The datastores 104,110 may be implemented as a relational database, a hierarchical database, one or more flat files, or other logical construct for structuring data. The computer-readable storage devices 106-108 may store one or more files 114-116 that represent data in a spreadsheet form (e.g., table-like), where the spreadsheet includes one or more columns and/or one or more rows. The table processing server 122 is configured to retrieve one or more values represented by the intersection of the one or more columns and/or one or more rows for each of the database tables 112,118 and for each of the files 114-116. The values extracted from the various database tables 112,118 and spreadsheet files 114-116 are stored as extracted values 120. In one embodiment, the values extracted from the database 112,118 and the spreadsheet files 114-116 may be stored in a logical arrangement 
selecting a topic-based summarization scheme to summarize the original set of data, the selecting a topic-based summarization scheme comprising selecting a plurality of topics, each of the topics representing a set of original dimensions ([0021], “concept tree,” “concepts,” and [0036], “The data 210 may further include data representing a candidate concept tree 230, one or more selected clusters 232, one or more identified concept(s) 234,” [0049]-[0050], [0053], “selecting the concept of "cities" as a selected candidate cluster should not affect selecting the concept of "countries" as these concepts are generally are not redundant,” He); and
applying the selected topic-based summarization scheme to the original set of data to transform the original set of data into a new set of data comprising a summarized sequence of objects having fewer dimensions than the original set of data, while preserving, within a defined measure, the sequential order of the original set of data ([0043], “the candidate cluster module 218 collapses all node pairs whose edge scores are higher than a selected similarity threshold,” [0094], “a tree reduction with a height of three gains a major performance boost and is able to recover the original tree and capture most of the ground truth concepts. This performance indicator confirms a suspicion that having a few levels of hierarchy in the output concepts helps,  
However, He does not expressly disclose an edit-distance similarity.  Thomas discloses an edit similarity measure ([0015], Thomas).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify He's system of clusters/concepts by incorporating the edit-distance similarity measure, as disclosed by Thomas, in order to increase the likelihood that any two documents or data items within the same cluster are in fact highly similar to each other ([0060], Thomas). See: KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385, 1396 (U.S. 2007); MPEP § 2143.
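For illustration only, an edit similarity measure of the general kind referred to above can be sketched as follows. The sketch assumes the Levenshtein distance normalized by the length of the longer string; neither reference is quoted in this action as specifying that formula, and the function names edit_distance and edit_similarity are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when
    every position must change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Under this assumed normalization, items placed in the same cluster by a high similarity cutoff differ in only a small fraction of their characters.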


Regarding Claim 3, He/Thomas discloses a method according to claim 1, wherein the selecting a plurality of topics includes:
identifying a plurality of attributes, each of the attributes having a number of the original dimensions ([0038], “if a first value (e.g., "France.02.SVCSRLCT") co-occurs frequently with a second value (e.g., "Germany.06.MNFEU") in the same columns,” He); and 
selecting the one of the attributes with the largest number of the original dimensions ([0038], “if a first value (e.g., "France.02.SVCSRLCT") co-occurs frequently with a second value (e.g., "Germany.06.MNFEU") in the same columns,” He). 


Regarding Claim 5, He/Thomas discloses a method according to claim 1, wherein the applying the topic based summarization scheme includes using the topic based summarization 

Regarding Claim 7, He/Thomas discloses a method according to claim 1, wherein the producing a many-to-one mapping from the original dimensions to the topics includes treating each of the original dimensions as a singleton cluster, and successively merging pairs of the clusters ([0048], “the candidate cluster module 218 merges entities with a high similarity to form first-level nodes (e.g., initial candidate clusters 228) corresponding to narrow concepts. The candidate cluster module 218 then iteratively merges the initial set of candidate clusters 228 to form candidate clusters corresponding to "super-concepts" (e.g., clusters that represent one or more concepts) and so forth, resulting in the candidate concept tree 230. By observing the candidate concept tree 230, it is evident that the candidate cluster module 218 (e.g., through Algorithm 1) gradually merges the individual ATUs according to geological locations, into ATUs belonging to the same continent (e.g., because these ATUs occur much more often together in same table columns), and finally into ATUs in the world,” He). 
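For illustration only, the singleton-cluster initialization and successive pairwise merging recited in claim 7 can be sketched as follows. The sketch assumes complete linkage and a caller-supplied distance function; it is not He's Algorithm 1, and the names build_hierarchy, dims, and dist are hypothetical.

```python
from itertools import combinations


def build_hierarchy(dims, dist):
    """Return the list of clustering levels, from singleton clusters
    (level 0) up to a single cluster containing every dimension."""
    # Each original dimension starts as its own cluster.
    clusters = [frozenset([d]) for d in dims]
    levels = [list(clusters)]

    def linkage(c1, c2):
        # Complete linkage (an assumption): the farthest pair decides.
        return max(dist(x, y) for x in c1 for y in c2)

    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels
```

For example, build_hierarchy([1, 2, 10, 11], lambda x, y: abs(x - y)) first merges 1 with 2, then 10 with 11, and finally joins the two pairs into one root cluster.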

Regarding Claim 8, He/Thomas discloses a method according to claim 7, wherein the successively merging pairs of the clusters includes successively merging the pairs of the clusters until all clusters have been merged into a single cluster that contains all the original dimensions to create a hierarchy including a multitude of leaf nodes and a root node, each of the leaf nodes representing one of the original dimensions, and the root node representing said single cluster, and the hierarchy further including a plurality of levels between the leaf nodes and the root node, 

Regarding Claim 9, He/Thomas discloses a method according to claim 8, wherein the performing dimensionality reduction on the original set of data further include cutting the hierarchy at one of the levels to obtain a selected number of the clusters ([0094], “a tree reduction with a height of three gains a major performance boost and is able to recover the original tree and capture most of the ground truth concepts. This performance indicator confirms a suspicion that having a few levels of hierarchy in the output concepts helps, but an over-complicated tree with too many levels of hierarchy does not meaningfully contribute to the performance,” [0123], “By treating the concept tree 236 as a CSSHC, the table corpus processing server 122 generates a concept tree 236 having few (or no) redundant clusters (e.g., concept nodes), which results in faster traversals of the concept tree 236 and reduced storage requirements to store the concept tree 236. As the concept tree 236 may include nodes having more than one million distinct values, compressing the concept tree 236 to reduce redundant concept nodes is technically beneficial to improving the performance of the table corpus 

Regarding Claim 10, He/Thomas discloses a method according to claim 9, wherein the cutting the hierarchy at one of the levels to obtain a selected number of the clusters includes finding a minimum similarity threshold so that a distance between any two dimensions in the same cluster is no more than said similarity threshold and no more than the selected number of clusters are formed at said one of the levels ([0094], “a tree reduction with a height of three gains a major performance boost and is able to recover the original tree and capture most of the ground truth concepts. This performance indicator confirms a suspicion that having a few levels of hierarchy in the output concepts helps, but an over-complicated tree with too many levels of hierarchy does not meaningfully contribute to the performance,” [0123], “By treating the concept tree 236 as a CSSHC, the table corpus processing server 122 generates a concept tree 236 having few (or no) redundant clusters (e.g., concept nodes), which results in faster traversals of the concept tree 236 and reduced storage requirements to store the concept tree 236. As the concept tree 236 may include nodes having more than one million distinct values, compressing the concept tree 236 to reduce redundant concept nodes is technically beneficial to improving the performance of the table corpus processing server 122 or any other computing device that may host the concept tree 236 for access by other computing devices,” He).
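For illustration only, cutting the hierarchy to obtain no more than a selected number of clusters, as recited in claims 9 and 10, can be sketched as follows. The sketch assumes greedy complete-linkage merging, under which the largest merge distance used bounds the distance between any two dimensions in the same cluster. The names cut_for_k, dims, dist, and k are hypothetical, and the threshold is the smallest one within this greedy hierarchy only, not over all possible clusterings.

```python
from itertools import combinations


def cut_for_k(dims, dist, k):
    """Merge clusters until at most k remain; return the threshold
    (largest merge distance used) and the resulting clusters."""
    clusters = [frozenset([d]) for d in dims]
    threshold = 0.0

    def farthest_pair(c1, c2):
        # Complete linkage (an assumption): largest cross-cluster distance.
        return max(dist(x, y) for x in c1 for y in c2)

    while len(clusters) > max(k, 1):
        a, b = min(combinations(clusters, 2),
                   key=lambda p: farthest_pair(*p))
        # Track the largest distance tolerated inside any cluster so far.
        threshold = max(threshold, farthest_pair(a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return threshold, clusters
```

With dimensions positioned at 1, 2, 10, and 11 and absolute difference as the distance, a request for two clusters stops at a threshold of 1, while a request for one cluster requires a threshold of 10.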

Regarding Claims 11 and 16, He discloses a system for analyzing multidimensional data, comprising: 
one or more processors ([0030], He); and 

said one or more processors configured for: 
obtaining an original set of data comprising a multidimensional set of objects having a sequential order and multiple original dimensions ([0028], 114, 116, 112, and 118, and [0029], “tables 112,118, where each database table is implemented as a two-dimensional table having one or more columns and/or one or more rows. The datastores 104,110 may be implemented as a relational database, a hierarchical database, one or more flat files, or other logical construct for structuring data. The computer-readable storage devices 106-108 may store one or more files 114-116 that represent data in a spreadsheet form (e.g., table-like), where the spreadsheet includes one or more columns and/or one or more rows. The table processing server 122 is configured to retrieve one or more values represented by the intersection of the one or more columns and/or one or more rows for each of the database tables 112,118 and for each of the files 114-116. The values extracted from the various database tables 112,118 and spreadsheet files 114-116 are stored as extracted values 120. In one embodiment, the values extracted from the database 112,118 and the spreadsheet files 114-116 may be stored in a logical arrangement (e.g., an array or other logical construct) so as to preserve the column structure of the database table and/or spreadsheet file from which the extracted values 120 were obtained. In this manner, when one or more values are extracted from a given column of a database table and/or spreadsheet file, the values of the given column maintain their associations. Additionally, and/or alternatively, the ordering in which the values appear in the given column may also be preserved,” wherein the order in which the values appear is an example of sequential order as claimed, He), each object having a corresponding set of attributes ([0028], 114, 116, 112, and 118, and [0029], wherein the values correspond to the attributes claimed, He); 

applying the selected topic-based summarization scheme to the original set of data to transform the original set of data into a new set of data comprising a summarized sequence of objects having fewer dimensions than the original set of data, while preserving, within a defined measure, the sequential order of the original set of data ([0043], “the candidate cluster module 218 collapses all node pairs whose edge scores are higher than a selected similarity threshold,” [0094], “a tree reduction with a height of three gains a major performance boost and is able to recover the original tree and capture most of the ground truth concepts. This performance indicator confirms a suspicion that having a few levels of hierarchy in the output concepts helps, but an over-complicated tree with too many levels of hierarchy does not meaningfully contribute to the performance,” [0123], “By treating the concept tree 236 as a CSSHC, the table corpus processing server 122 generates a concept tree 236 having few (or no) redundant clusters (e.g., concept nodes), which results in faster traversals of the concept tree 236 and reduced storage requirements to store the concept tree 236. As the concept tree 236 may include nodes having more than one million distinct values, compressing the concept tree 236 to reduce redundant concept nodes is technically beneficial to improving the performance of the table corpus processing server 122 or any other computing device that may host the concept tree 236 for  
However, He does not expressly disclose an edit-distance similarity.  Thomas discloses an edit similarity measure ([0015], Thomas).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify He's system of clusters/concepts by incorporating the edit-distance similarity measure, as disclosed by Thomas, in order to increase the likelihood that any two documents or data items within the same cluster are in fact highly similar to each other ([0060], Thomas). See: KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385, 1396 (U.S. 2007); MPEP § 2143.

Regarding Claims 13 and 18, He/Thomas discloses a system, wherein the selecting a plurality of topics includes: 
identifying a plurality of attributes, each of the attributes having a number of the original dimensions ([0038], “if a first value (e.g., "France.02.SVCSRLCT") co-occurs frequently with a second value (e.g., "Germany.06.MNFEU") in the same columns,” He); and 
selecting the one of the attributes with the largest number of the original dimensions ([0038], “if a first value (e.g., "France.02.SVCSRLCT") co-occurs frequently with a second value (e.g., "Germany.06.MNFEU") in the same columns,” He). 


Regarding Claims 15 and 20, He/Thomas discloses a system, wherein the applying the topic based summarization scheme includes using the topic based summarization scheme to form the new set of data ([0048], “the candidate cluster module 218 merges entities with a high similarity to form first-level nodes (e.g., initial candidate clusters 228) corresponding to narrow concepts,” He).

Response to Arguments

Applicant argues that the applied art fails to teach: “obtaining an original set of data comprising a multidimensional set of objects having a sequential order and multiple original dimensions.”
The Examiner respectfully disagrees.  The applied art does disclose the claimed limitation: obtaining an original set of data comprising a multidimensional set of objects having a sequential order and multiple original dimensions ([0028], 114, 116, 112, and 118, and [0029], “tables 112,118, where each database table is implemented as a two-dimensional table having one or more columns and/or one or more rows. The datastores 104,110 may be implemented as a relational database, a hierarchical database, one or more flat files, or other logical construct for structuring data. The computer-readable storage devices 106-108 may store one or more files 114-116 that represent data in a spreadsheet form (e.g., table-like), where the spreadsheet includes one or more columns and/or one or more rows. The table processing server 122 is configured to retrieve one or more values represented by the intersection of the one or more columns and/or one or more rows for each of the database tables 112,118 and for each of the  

In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “value ordering”) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

In response to applicant's argument that “He et al. actually is solving a different problem,” it has been held that a prior art reference must either be in the field of applicant’s endeavor or, if not, then be reasonably pertinent to the particular problem with which the applicant was concerned, in order to be relied upon as a basis for rejection of the claimed invention.  See In re Oetiker, 977 F.2d 1443, 24 USPQ2d 1443 (Fed. Cir. 1992).  In this case, He is in applicant’s field of endeavor of summarizing databases (See: abstract, [0090], [0093], He).

Conclusion


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner, can be reached at (571) 270-1760.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/GIOVANNA B COLAN/Primary Examiner, Art Unit 2165
January 26, 2022