DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/20/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 11 is objected to because of the following informalities:  in lines 5-6, “in which an object feature vector belonging to the cluster has changed before and after the assignment in the assignment means is present” is unclear. For examination purposes, this limitation is construed to mean in which a cluster has changed before and after an assignment period. The meaning of “in the assignment means is present” is not clear. Furthermore, in lines 7-18, “the first array element in the array element among array elements” is unclear. For examination purposes, this limitation is construed to mean the first array element of array elements in a transposed file.  Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 9-28 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
	Regarding claim 9, the claim recites: A computer-implemented method for clustering sparse data, the method comprising: storing a transposed file, the transposed file including array elements, each of the array elements storing an element pair, the element pair including: a cluster identifier for identifying a cluster, and a first non-zero feature quantity of a mean feature vector representing a mean of object feature vectors associated with the cluster; determining a distance between the object feature vector in sparse representation and the mean feature vector of the cluster, the sparse representation indicating the object feature vector in a pair, the feature pair including: a second non-zero feature quantity of the object feature vector, and a feature identifier of the second non-zero feature quantity; assigning, based on the determined distance, the object feature vector to one of a plurality of clusters.
The limitations of claim 9 are analyzed as follows:
determining a distance between the object feature vector in sparse representation and the mean feature vector of the cluster, the sparse representation indicating the object feature vector in a pair, the feature pair including: a second non-zero feature quantity of the object feature vector, and a feature identifier of the second non-zero feature quantity; and
assigning, based on the determined distance, the object feature vector to one of a plurality of clusters: these limitations are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind. Nothing in the claim elements precludes the steps from practically being performed in the mind. The context of this claim encompasses a person mentally evaluating a distance and mentally assigning a feature vector to a cluster based on that evaluation. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.
	Under Step 2A Prong Two and Step 2B, this judicial exception is not integrated into a practical application, and the claim does not contain additional elements that amount to significantly more than the judicial exception. In particular, the claim recites the following additional element: A computer-implemented method for clustering sparse data, the method comprising: storing a transposed file, the transposed file including array elements, each of the array elements storing an element pair, the element pair including: a cluster identifier for identifying a cluster, and a first non-zero feature quantity of a mean feature vector representing a mean of object feature vectors associated with the cluster: this limitation, under Step 2A Prong Two, is recited at a high level of generality (i.e., as a general means of storing array elements) and would qualify under MPEP 2106.05(g) as “adding insignificant extra-solution activity to the judicial exception,” as it can be described as mere data gathering. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
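Furthermore, this limitation amounts to adding insignificant extra-solution activity that is well-understood, routine, and conventional, previously known to the industry, and specified at a high level of generality. The Versata Dev. Group, Inc. v. SAP Am., Inc. court decision cited in MPEP 2106.05(d)(II) indicates that simply “storing and retrieving information in memory,” such as storing a transposed file, is a well-understood, routine, conventional activity, as supported by Berkheimer evidence. Accordingly, this additional element does not amount to significantly more than the judicial exception.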

Thus, claim 9 is not patent eligible. Claims 16 and 23 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.
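For illustration only, the data structures recited in claim 9 can be sketched in Python as follows, under the examiner's broadest reasonable interpretation. The sketch is not asserted to reflect applicant's implementation or any cited reference; the function names `to_sparse` and `build_transposed_file`, the use of lists of tuples, and the dense input format of the mean feature vectors are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type aliases used only for this illustration.
SparseVector = List[Tuple[int, float]]          # (feature identifier, non-zero feature quantity)
TransposedFile = List[List[Tuple[int, float]]]  # feature id -> [(cluster identifier, mean value), ...]


def to_sparse(dense: Sequence[float]) -> SparseVector:
    """Keep only the non-zero entries of a dense vector, each paired with its index."""
    return [(fid, value) for fid, value in enumerate(dense) if value != 0.0]


def build_transposed_file(means: Dict[int, Sequence[float]], dim: int) -> TransposedFile:
    """Invert cluster -> mean vector into one array per feature identifier.

    Each array element is an element pair (cluster identifier, non-zero mean value).
    """
    tfile: TransposedFile = [[] for _ in range(dim)]
    for cluster_id in sorted(means):
        for fid, value in enumerate(means[cluster_id]):
            if value != 0.0:
                tfile[fid].append((cluster_id, value))
    return tfile


if __name__ == "__main__":
    means = {0: [0.5, 0.0, 0.5], 1: [0.0, 1.0, 0.0]}
    print(to_sparse([0.0, 2.0, 0.0, 1.0]))      # [(1, 2.0), (3, 1.0)]
    print(build_transposed_file(means, dim=3))  # [[(0, 0.5)], [(1, 1.0)], [(0, 0.5)]]
```

In this sketch the transposed file is simply an inverted index: the array at index f holds one element pair for each cluster whose mean feature vector has a non-zero quantity at feature f.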

Regarding claim 10, the claim recites: The computer-implemented method of claim 9, wherein the determined distance is associated with an inner product based on a product of the second non-zero feature quantity in the object feature vector in sparse representation, and the first non-zero feature quantity in the array element of an index based on the feature identifier, the feature identifier associated with the second non-zero feature quantity among array elements in the transposed file: This limitation is considered a mathematical concept, wherein a distance is determined based on a mathematical relationship (in this case, a dot product) between a feature vector and a mean feature vector. Thus, this limitation is directed to an abstract idea and does not further recite additional elements that integrate this abstract idea into a practical application nor amount to significantly more than the judicial exception.
Thus, claim 10 is not patent eligible. Claims 17 and 24 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.
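The inner-product computation recited in claim 10 may be illustrated by the following hypothetical Python sketch. Each non-zero feature quantity of the object is multiplied by the mean feature quantities stored in the transposed-file array whose index is the object's feature identifier, and the products are accumulated per cluster. The names, the literal transposed file, and the use of the largest inner product as the assignment criterion (as in spherical k-means) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = List[Tuple[int, float]]          # (feature identifier, non-zero value)
TransposedFile = List[List[Tuple[int, float]]]  # feature id -> [(cluster id, mean value), ...]


def inner_products(obj: SparseVector, tfile: TransposedFile) -> Dict[int, float]:
    """Sum x_f * mu_{c,f} over the object's non-zero features f only."""
    sims: Dict[int, float] = defaultdict(float)
    for fid, x in obj:
        # The feature identifier is used directly as the index into the transposed file.
        for cluster_id, mean_value in tfile[fid]:
            sims[cluster_id] += x * mean_value
    return dict(sims)


def assign(obj: SparseVector, tfile: TransposedFile, clusters: List[int]) -> int:
    """Pick the cluster with the largest inner product; ties go to the lowest id."""
    sims = inner_products(obj, tfile)
    # A cluster sharing no non-zero feature with the object has inner product 0.0.
    return max(clusters, key=lambda c: (sims.get(c, 0.0), -c))


if __name__ == "__main__":
    tfile = [[(0, 0.5)], [(1, 1.0)], [(0, 0.5)]]
    obj = [(0, 1.0), (2, 3.0)]
    print(inner_products(obj, tfile))  # {0: 2.0}
    print(assign(obj, tfile, [0, 1]))  # 0
```

Only features that are non-zero in the object are visited. For unit-length vectors, ||x - μ||² = 2 - 2x·μ, so selecting the largest inner product is the same as selecting the smallest Euclidean distance.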

	Regarding claim 11, the claim recites the following abstract idea: and wherein the determined distance is associated with an inner product based on a product of: the second non-zero feature quantity included in the object feature vector in sparse representation and a first non-zero feature quantity stored in any one of an array element of an index based on a feature identifier corresponding to the second non-zero feature quantity and the first array element in the array element among array elements in the transposed file according to whether the cluster to which the object feature vector in sparse representation belongs is an invariant cluster: This limitation is considered a mathematical concept, wherein a distance is determined based on a mathematical relationship (in this case, a dot product) between a feature vector and a mean feature vector. Thus, this limitation is directed to an abstract idea.
	Furthermore, the claim recites the following additional limitations, analyzed under Step 2A Prong Two and Step 2B: the computer-implemented method of claim 10, wherein a first array element is present before a second array element, wherein the first array element stores a pair including a first non-zero element of a mean feature vector corresponding to a changed cluster indicating a cluster in which an object feature vector belonging to the cluster has changed before and after the assignment in the assignment means is present, wherein the second array element stores a pair including a first non-zero element of a mean feature vector corresponding to an invariant cluster indicating a cluster in which an object feature vector belonging to the cluster has not changed before and after the assignment in the assignment means: this limitation, under Step 2A Prong Two, is recited at a high level of generality (i.e., as a general means of storing array elements) and would qualify under MPEP 2106.05(g) as “adding insignificant extra-solution activity to the judicial exception,” as it can be described as mere data gathering.
	Thus, claim 11 is not patent eligible. Claims 18 and 25 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.

Regarding claim 12, the claim recites the following abstract idea: and determining, for generating an inner product associated with the determined distance, a product of the second non-zero feature quantity in the object feature vector in sparse representation, and a first non-zero feature quantity stored in an array element of an index based on a feature identifier corresponding to the second non-zero feature quantity among array elements in the transposed file or the variable transposed file according to whether the cluster to which the object feature vector in sparse representation belongs is an invariant cluster: This limitation is considered a mathematical concept, wherein a distance is determined based on a mathematical relationship between a feature vector and a feature quantity of a mean feature vector. Thus, this limitation is directed to an abstract idea.
	Furthermore, the claim recites the following additional limitations, analyzed under Step 2A Prong Two and Step 2B: The computer-implemented method of claim 9, the method further comprising: storing a variable transposed file obtained by removing the feature pair including a first non-zero element of a mean feature vector corresponding to an invariant cluster indicating a cluster in which an object feature vector belonging to the cluster has not changed before and after the assignment from the transposed file: this limitation, under Step 2A Prong Two, is recited at a high level of generality (i.e., as a general means of storing a transposed file) and would qualify under MPEP 2106.05(g) as “adding insignificant extra-solution activity to the judicial exception,” as it can be described as mere data gathering. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Furthermore, this limitation amounts to adding insignificant extra-solution activity that is well-understood, routine, and conventional, previously known to the industry, and specified at a high level of generality. The Versata Dev. Group, Inc. v. SAP Am., Inc. court decision cited in MPEP 2106.05(d)(II) indicates that simply “storing and retrieving information in memory,” such as storing a transposed file, is a well-understood, routine, conventional activity, as supported by Berkheimer evidence. Accordingly, this additional element does not amount to significantly more than the judicial exception.

	Thus, claim 12 is not patent eligible. Claims 19 and 26 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.
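The variable transposed file of claim 12 can be sketched, for illustration only, as a copy of the transposed file from which every element pair belonging to an invariant cluster has been removed. The selection rule in the hypothetical helper `select_file` (objects in an invariant cluster are compared only against the variable transposed file) is an assumption about how the claimed "according to whether the cluster ... is an invariant cluster" condition might be applied, and is not taken from the application.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SparseVector = List[Tuple[int, float]]          # (feature identifier, non-zero value)
TransposedFile = List[List[Tuple[int, float]]]  # feature id -> [(cluster id, mean value), ...]


def build_variable_transposed_file(tfile: TransposedFile, invariant: Set[int]) -> TransposedFile:
    """Copy the transposed file, dropping every pair whose cluster is invariant."""
    return [[(c, m) for c, m in row if c not in invariant] for row in tfile]


def inner_products(obj: SparseVector, tfile: TransposedFile) -> Dict[int, float]:
    """Accumulate products of the object's non-zero values with the stored mean values."""
    sims: Dict[int, float] = defaultdict(float)
    for fid, x in obj:
        for cluster_id, mean_value in tfile[fid]:
            sims[cluster_id] += x * mean_value
    return dict(sims)


def select_file(obj_cluster: int, invariant: Set[int],
                tfile: TransposedFile, vfile: TransposedFile) -> TransposedFile:
    """Assumed rule: an object whose own cluster is invariant is compared only
    against the variable transposed file (i.e., against changed clusters)."""
    return vfile if obj_cluster in invariant else tfile


if __name__ == "__main__":
    tfile = [[(0, 0.5), (1, 0.25)], [(1, 1.0)], [(0, 0.5)]]
    invariant = {0}
    vfile = build_variable_transposed_file(tfile, invariant)
    print(vfile)  # [[(1, 0.25)], [(1, 1.0)], []]
    obj = [(0, 1.0), (1, 1.0)]  # currently assigned to cluster 0
    print(inner_products(obj, select_file(0, invariant, tfile, vfile)))  # {1: 1.25}
```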
	Regarding claim 13, the claim recites: The computer-implemented method of claim 9, wherein the determined distance is associated with an inner product based on a product of the second non-zero feature quantity included in the object feature vector in sparse representation, and a first non-zero feature quantity associated with a cluster identifier of a changed cluster indicating a cluster in which an object feature vector belonging to the cluster has changed before and after the assignment: This limitation is considered a mathematical concept, wherein a distance is determined based on a mathematical relationship (in this case, an inner product) between a feature vector and a feature quantity of a mean feature vector. Thus, this limitation is directed to an abstract idea. Furthermore, this limitation is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim encompasses a person mentally determining a distance based on a mental evaluation of a feature vector and a mean feature vector. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Furthermore, the claim does not recite any additional elements that integrate the abstract idea into a practical application or amount to significantly more than the judicial exception.
	Thus, claim 13 is not patent eligible. Claims 20 and 27 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.

	Regarding claim 14, the claim recites the following abstract idea: initializing a mean feature vector in a complete representation: This limitation is considered a mathematical concept, wherein a mean vector is determined based on a mathematical relationship (in this case, an average or mean) of a set of feature vectors. Thus, this limitation is directed to an abstract idea. Furthermore, this limitation is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim encompasses a person mentally initializing a mean feature vector based on a mental evaluation of a set of feature vectors. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.
	Furthermore, the claim recites the following additional limitations, analyzed under Step 2A Prong Two and Step 2B: The computer-implemented method of claim 9, the method further comprising: receiving a plurality of object feature vectors in sparse representation; receiving a number of clusters; and generating, based on the number of clusters, the plurality of clusters: these limitations, under Step 2A Prong Two, are recited at a high level of generality (i.e., as a general means of receiving feature vectors and a number of clusters and storing clusters) and would qualify under MPEP 2106.05(g) as “adding insignificant extra-solution activity to the judicial exception,” as they can be described as mere data gathering.
	Thus, claim 14 is not patent eligible.
Claims 21 and 28 are similarly rejected. Although these claims additionally recite one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.
	Regarding claim 15, the claim recites: the computer-implemented method of claim 9, wherein an inner product for determining the distance is based on a mean transposed file with a boundary index, and where the mean transposed file with the boundary index includes: a first part including the mean feature vectors not belonging to an invariant cluster, a second part including the mean feature vectors belonging to the invariant cluster, and the boundary index associated with an index for the last mean feature vector in the first part: This limitation is considered a mathematical concept, wherein a distance is determined based on a mathematical relationship among a mean vector stored in a transposed file, a boundary index, and a feature vector. Thus, this limitation is directed to an abstract idea.
	Thus, claim 15 is not patent eligible. Claim 22 is similarly rejected. Although this claim additionally recites one or more processors and non-transitory computer-readable storage media, these are generic computer components that amount to no more than mere instructions to apply the exception; they neither integrate the abstract idea into a practical application nor amount to significantly more than the judicial exception.
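The mean transposed file with a boundary index recited in claim 15 can likewise be sketched in Python, for illustration only. The sketch assumes that the boundary index is kept per feature identifier and records the index of the last element pair in the first (changed-cluster) part, with -1 when that part is empty; these choices, and all names, are assumptions rather than features drawn from the application.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SparseVector = List[Tuple[int, float]]          # (feature identifier, non-zero value)
TransposedFile = List[List[Tuple[int, float]]]  # feature id -> [(cluster id, mean value), ...]


def arrange_with_boundaries(tfile: TransposedFile,
                            invariant: Set[int]) -> Tuple[TransposedFile, List[int]]:
    """Per feature id, put changed-cluster pairs (first part) before invariant-cluster
    pairs (second part); the boundary is the index of the last first-part pair (-1 if none)."""
    arranged: TransposedFile = []
    boundaries: List[int] = []
    for row in tfile:
        first = [(c, m) for c, m in row if c not in invariant]
        second = [(c, m) for c, m in row if c in invariant]
        arranged.append(first + second)
        boundaries.append(len(first) - 1)
    return arranged, boundaries


def inner_products(obj: SparseVector, arranged: TransposedFile,
                   boundaries: List[int], changed_only: bool) -> Dict[int, float]:
    """Accumulate products, stopping at the boundary when only changed clusters are needed."""
    sims: Dict[int, float] = defaultdict(float)
    for fid, x in obj:
        row = arranged[fid]
        stop = boundaries[fid] + 1 if changed_only else len(row)
        for cluster_id, mean_value in row[:stop]:
            sims[cluster_id] += x * mean_value
    return dict(sims)


if __name__ == "__main__":
    tfile = [[(0, 0.5), (1, 0.25)], [(2, 1.0), (1, 0.5)]]
    arranged, boundaries = arrange_with_boundaries(tfile, invariant={0, 2})
    obj = [(0, 2.0), (1, 4.0)]
    print(inner_products(obj, arranged, boundaries, changed_only=True))  # {1: 2.5}
```

With the changed-cluster pairs placed first, a scan that needs only changed clusters can stop at the boundary instead of testing every pair.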
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9-10, 12-14, 16-17, 19-21, 23-24, and 26-28 are rejected under 35 U.S.C. 103 as being unpatentable over Peng et al. ("Refining Spherical K-Means for Clustering Documents," The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006, pp. 4146-4150, doi: 10.1109/IJCNN.2006.246962), hereinafter Peng, in view of Chakerian (U.S. Patent Publication No. 20160004764), hereinafter Chakerian.

Regarding claim 9, Peng discloses A computer-implemented method for clustering sparse data, the method comprising: . . . determining a distance between the object feature vector in sparse representation and the mean feature vector of the cluster (on page 4146 column 1 paragraph 2 – column 2 paragraph 1, Peng describes a k-means algorithm for processing sparse data. Peng specifically describes calculating a cluster center. On page 4147 column 1 paragraph 1, Peng denotes cluster centers as a list of clusters and their respective centers: “Let C1, C2, ··· , Ck be k clusters and c1, c2, ··· , ck be k cluster centers, respectively.” Peng describes that a cluster center “can be calculated by solving a constrained optimization problem whose optimal solution can be obtained by simply normalizing the geometric center of all the data points in the cluster.” A normalized geometric center of all points in a cluster is considered mathematically equivalent to a mean feature vector. In equation (3) on page 4147 column 1 and in paragraph 1 on page 4147 column 2, Peng details how this process maximizes “cosine similarity between each document and the cluster center that it is assigned to” and “reassigns each document to the closest cluster in terms of cosine similarity.” It is considered well known in the art that, for vectors normalized to unit length as in Peng's spherical k-means, cosine similarity is equivalent to an inner product, which is described by applicant in claim 10 as a distance measure for “sparse vectors.” This equivalence is described on page 149 paragraph 4 of the following article: “the inner product xᵀy is often known as the ‘cosine similarity.’ Since cosine similarity is easy to interpret and simple to compute for sparse vectors, it is widely used in text mining and information retrieval” (Inderjit S. Dhillon and Dharmendra S. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, 42, pp. 143-175, 2001).
Furthermore, each document is considered equivalent to an object feature vector in sparse representation, as described by Peng on page 4146 column 1 paragraph 2: “Typically, the documents are casted into a vector space and the feature of documents are the words or terms that they contain. The basic idea is that we represent a document by a vector whose entries are the frequencies of content-related words or terms in the document . . . The resulted vectors are usually highly dimensional and sparse”).
the sparse representation indicating the object feature vector in a pair, the feature pair including: a second non-zero feature quantity of the object feature vector, and a feature identifier of the second non-zero feature quantity; (as described in the above claim analysis, each document is considered equivalent to an object feature vector in sparse representation, as described by Peng on page 4146 column 1 paragraph 2: “Typically, the documents are casted into a vector space and the feature of documents are the words or terms that they contain. The basic idea is that we represent a document by a vector whose entries are the frequencies of content-related words or terms in the document . . . The resulted vectors are usually highly dimensional and sparse.” On page 4146 column 2 paragraph 4, Peng specifies that the documents are denoted by “A = {a1, a2, ··· , an}.” This is considered a list of feature vectors, with each variable “a1 … an” representing a feature quantity and each index “1 … n” representing a feature identifier tied to that feature quantity; the variable and its index together are equivalent to a feature pair).
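The reliance above on the equivalence between cosine similarity and the inner product holds for vectors scaled to unit length, which is the setting of Peng's spherical k-means, where the cluster center is the normalized geometric center. The following Python sketch illustrates both points; the helper names are hypothetical and the code is not taken from Peng or from Dhillon and Modha.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit L2 norm (v is assumed to be non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def spherical_center(points: List[Sequence[float]]) -> List[float]:
    """Normalized geometric center (mean) of a cluster's points."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    return normalize(mean)


if __name__ == "__main__":
    a, b = [3.0, 4.0, 0.0], [1.0, 0.0, 1.0]
    # For unit-length vectors, cosine similarity and the inner product coincide.
    print(math.isclose(cosine(a, b), dot(normalize(a), normalize(b))))  # True
    print(spherical_center([[1.0, 0.0], [0.0, 1.0]]))
```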
assigning, based on the determined distance, the object feature vector to one of a plurality of clusters (on page 4147 column 2 paragraph 1, Peng describes steps of the k-means algorithm. Step 2 is designated to “reassign each document to the closest cluster in terms of the cosine similarity and get the new partition.” This is considered equivalent to assigning a feature vector, equivalent to a document described by Peng, to a cluster based on a distance, described by Peng as a cosine similarity).
However, Peng does not disclose storing a transposed file, the transposed file including array elements, each of the array elements storing an element pair, the element pair including: a cluster identifier for identifying a cluster and a first non-zero feature quantity of a mean feature vector representing a mean of object feature vectors associated with the cluster. Peng discloses cluster centers paired with the clusters they represent, denoted as a list (equivalent to an array), but does not specify that these elements are stored in an array in a transposed file.
However, in the same field of endeavor, Chakerian describes storing a transposed file, the transposed file including array elements, each of the array elements storing an element pair (in paragraph 0119 and FIG. 13 of applicant’s specification, applicant defines a transposed file as a table consisting of several “pairs,” which applicant further defines as “tuples.” Therefore, an element pair is interpreted as not restricted to two items; it may pair one item with a list of several items mapped to that item, and a transposed file is interpreted as a table of tuples. Based on this interpretation, in paragraph 0034 Chakerian details a “database” that can be stored in “non-volatile memory (e.g. in storage device),” which is equivalent to the storing of a file. This database stores, “for each cluster,” “information identifying the documents contained in the cluster (e.g., title, publication date, URL, etc.),” as well as “a centroid vector of the vectors of all the document within the cluster.” This identifying information is considered equivalent to a first element, the cluster identifier, and the centroid vector is considered equivalent to a second element (in a tuple of elements) corresponding to a mean vector as detailed by the applicant. As these elements are stored for each cluster, the database is understood to contain an array of these elements).
the element pair including: a cluster identifier for identifying a cluster (as stated above, in paragraph 0034 Chakerian details a “database” that can be stored in “non-volatile memory (e.g. in storage device),” which is equivalent to the storing of a file. This database stores, “for each cluster,” “information identifying the documents contained in the cluster (e.g., title, publication date, URL, etc.).” This identifying information is considered equivalent to one element, a cluster identifier).
and a first non-zero feature quantity of a mean feature vector representing a mean of object feature vectors associated with the cluster (as stated above, in paragraph 0034 Chakerian details a “database” that can be stored in “non-volatile memory (e.g. in storage device),” which is equivalent to the storing of a file. This database stores, “for each cluster,” “a centroid vector of the vectors of all the document within the cluster.” The centroid vector is considered equivalent to a second element (in a tuple of elements) corresponding to a mean vector as detailed by the applicant. As these elements are stored for each cluster, the database is understood to contain an array of these elements).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Peng (regarding a k-means algorithm for processing sparse vectors) and Chakerian (directed to a system for storing pairs of cluster identifiers and mean vectors in an array in a transposed file) and arrived at a system for clustering sparse data implemented with an array of cluster identifiers. A person of ordinary skill in the database field would have been motivated to make such a combination to configure a system for “automatically characterizing, grouping, and visually presenting the data in a concise and informative way” (Chakerian paragraph 0001).
Claims 16 and 23 are similarly rejected. Refer to claim 9 for analysis.

Regarding claim 10, as stated above, Peng in view of Chakerian describes the computer-implemented method of claim 9. Peng further details wherein the determined distance is associated with an inner product based on a product of the second non-zero feature quantity in the object feature vector in sparse representation (in equation (3) on page 4147 column 1 and in paragraph 1 on page 4147 column 2, Peng details how this process maximizes “cosine similarity between each document and the cluster center that it is assigned to” and “reassigns each document to the closest cluster in terms of cosine similarity.” It is considered well known in the art that, for vectors normalized to unit length, cosine similarity is equivalent to an inner product, which is described by applicant in claim 10 as a distance measure for sparse vectors. Peng's document vectors are sparse: “the documents are casted into a vector space and the feature of documents are the words or terms that they contain . . . The resulted vectors are usually highly dimensional and sparse.” The second non-zero feature quantity, based on the description in claim 9, is interpreted to be an object feature vector associated with a cluster).
and the first non-zero feature quantity in the array element of an index based on the feature identifier (the first non-zero feature quantity, based on the description in claim 9, is interpreted to be a mean feature vector. On page 4147 column 1 paragraph 1, Peng denotes cluster centers as a list of clusters and their respective centers: “Let C1, C2, ··· , Ck be k clusters and c1, c2, ··· , ck be k cluster centers, respectively.” Peng describes that a cluster center “can be calculated by solving a constrained optimization problem whose optimal solution can be obtained by simply normalizing the geometric center of all the data points in the cluster.” A normalized geometric center of all points in a cluster is considered mathematically equivalent to a mean feature vector. On page 4146 column 2 paragraph 4, Peng specifies that the documents are denoted by “A = {a1, a2, ··· , an}.” This is considered a list of feature vectors, with each variable “a1 … an” representing a feature quantity and each index “1 … n” representing a feature identifier tied to that feature quantity. In equation (3) on page 4147 column 1, the cosine similarity is calculated between c indexed at j, representing a cluster center in an array element at index j, and a document a indexed at i, with i representing a feature identifier of the document. Thus, this inner product is based on the feature identifier that identifies which document (or feature vector) is used to calculate the cosine similarity with a cluster center).
However, Peng does not explicitly disclose the feature identifier associated with the second non-zero feature quantity among array elements in the transposed file.
However, in the same field of endeavor, Chakerian discloses the feature identifier associated with the second non-zero feature quantity among array elements in the transposed file (in paragraph 0034, Chakerian details how a cluster is stored in a database in association with “information identifying the documents contained in the cluster.” This information is considered equivalent to feature identifiers, as it identifies each document and thus each feature vector for that document. Chakerian specifies that the database also stores “vectors for each of the documents.” These “vectors for each of the documents” are considered equivalent to the second non-zero feature quantity, as this second quantity is defined in claim 9 to be a quantity of an object feature vector. This claim is interpreted to mean that this object feature vector is included in the array elements defined in claim 9, although those array elements are defined as a pair in which the object feature vector is not included).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Peng (regarding a k-means algorithm for processing sparse vectors) and Chakerian (directed to a system for storing pairs of cluster identifiers and mean vectors in an array in a transposed file) and arrived at a system for clustering sparse data implemented with an array of cluster identifiers. A person of ordinary skill in the database field would have been motivated to make such a combination to configure a system for “automatically characterizing, grouping, and visually presenting the data in a concise and informative way” (Chakerian paragraph 0001).
Claims 17 and 24 are similarly rejected. Refer to claim 10 for analysis. 
	Regarding claim 12, as stated above, Peng in view of Chakerian discloses the computer-implemented method of claim 9. Chakerian further discloses The computer-implemented method of claim 9, the method further comprising: storing a variable transposed file obtained by removing the feature pair including a first non-zero element of a mean feature vector corresponding to an invariant cluster indicating a cluster in which an object feature vector belonging to the cluster has not changed before and after the assignment from the transposed file (in paragraph 0036, Chakerian specifies that the database of clusters, “when a cluster's weight falls below a predetermined minimal weight,” can “remove the cluster from the database.” This weight is determined by the number of documents added to the cluster, meaning that “clusters to which no or few documents are added for some period of time are likely to drop in weight below a minimal weight and become inactive over time.” No documents being added to a cluster is considered equivalent to an object feature vector in the cluster not changing before and after an assignment. Furthermore, removing a cluster from the database is considered equivalent to removing an element tied to that cluster, as removing a cluster will remove its associated documents).
and determining, for generating an inner product associated with the determined distance, a product of the second non-zero feature quantity in the object feature vector in sparse representation (in paragraph 0034, Chakerian specifies that the “database can also store, for each cluster, the individual vectors for each of the documents and/or the similarities (e.g., cosine similarities) between the individual vectors and the representative vector.” It is considered well known in the art that cosine similarity is equivalent to an inner product, which is described by applicant in claim 10 as a distance measure for “sparse vectors.” This equivalence is described on page 149 paragraph 4 in the following article: “the inner product xᵀy is often known as the ‘cosine similarity.’ Since cosine similarity is easy to interpret and simple to compute for sparse vectors, it is widely used in text mining and information retrieval” (INDERJIT S. DHILLON and DHARMENDRA S. MODHA, Concept Decompositions for large sparse text data using clustering, Machine Learning, 42, pp. 143-175, 2001). Furthermore, the “individual vector” for a document is considered equivalent to the second non-zero feature quantity in the object feature vector)
and a first non-zero feature quantity stored in an array element of an index based on a feature identifier corresponding to the second non-zero feature quantity among array elements in the transposed file or the variable transposed file according to whether the cluster to which the object feature vector in sparse representation belongs is an invariant cluster (based on independent claim 9, the first non-zero feature quantity is interpreted as a quantity of the mean vector for a cluster.
In paragraph 0034, Chakerian specifies that the “database can also store, for each cluster, the individual vectors for each of the documents and/or the similarities (e.g., cosine similarities) between the individual vectors and the representative vector.” This “representative vector” is defined as a “centroid vector,” equivalent to the mean vector detailed by the applicant, and thus includes a first non-zero quantity. This representative vector is associated with a cluster in the database that includes “information identifying the documents contained in the cluster,” equivalent to a feature identifier corresponding to a feature vector (the individual vector detailed by Chakerian). This cosine similarity depends on whether a cluster is invariant because, as detailed by Chakerian in paragraph 0036, a cluster to which “no or few documents are added for some period of time” can be removed from the database).
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Peng (with regard to a k-means algorithm for processing sparse vectors) and Chakerian (directed to a system for storing pairs of cluster identifiers and mean vectors in an array in a transposed file) and arrived at a system for clustering sparse data implemented with an array of cluster identifiers. A person of ordinary skill in the database field would be motivated to make such a combination to configure a system for “automatically characterizing, grouping, and visually presenting the data in a concise and informative way” (Chakerian paragraph 0001).
Claims 19 and 26 are similarly rejected. Refer to claim 12 for analysis.
	Regarding claim 13, as stated above, Peng in view of Chakerian disclose the computer-implemented method of claim 9. Peng further discloses wherein the determined distance is associated with an inner product based on a product of the second non-zero feature quantity included in the object feature vector in sparse representation (in equation (3) on page 4147 column 1 and in paragraph 1 on page 4147 column 2, Peng details how this process maximizes “cosine similarity between each document and the cluster center that it is assigned to” and “reassigns each document to the closest cluster in terms of cosine similarity.” It is considered well known in the art that cosine similarity is equivalent to an inner product, which is described by applicant in claim 10 as a distance measure for “sparse vectors,” as described on page 149 paragraph 4 in the cited article (INDERJIT S. DHILLON and DHARMENDRA S. MODHA, Concept Decompositions for large sparse text data using clustering, Machine Learning, 42, pp. 143-175, 2001). These documents are considered equivalent to a feature quantity in an object feature vector in sparse representation, as described by Peng on page 4146 column 1 paragraph 2: “Typically, the documents are casted into a vector space and the feature of documents are the words or terms that they contain . . . the resulted vectors are usually highly dimensional and sparse”)
and a first non-zero feature quantity associated with a cluster identifier of a changed cluster indicating a cluster in which an object feature vector belonging to the cluster has changed before and after the assignment (on page 4146 column 2 paragraph 4, Peng specifies that the documents are denoted by “A = {a1, a2, ··· , an}.” This is considered a list of feature vectors, with each variable “a1 … an” representing a feature quantity, and each index “1 … n” representing a feature identifier tied to that feature quantity.
In equation (3) on page 4147 column 1, the cosine similarity is calculated between c indexed at j, representing a cluster center in an array element at an index j, and a document a indexed at i, with i representing a feature identifier of the document. Thus, this inner product is based on the feature identifier identifying which document (or feature vector) to use to calculate the cosine similarity).
Claims 20 and 27 are similarly rejected. Refer to claim 13 for analysis.
	Regarding claim 14, as stated above, Peng in view of Chakerian disclose the computer-implemented method of claim 9. Peng further discloses the method further comprising: receiving a plurality of object feature vectors in sparse representation (in paragraph 0036 of applicant’s specification, applicant details that “the clustering processing unit 120 receives the set X of object feature vectors x1 in sparse representation and the number K of clusters as an input.” No separate user or operator is specified for this input, so this input is interpreted to be defined by the system. As described in the above claim analysis, each document is considered equivalent to an object feature vector in sparse representation, as described by Peng on page 4146 column 1 paragraph 2: “Typically, the documents are casted into a vector space and the feature of documents are the words or terms that they contain. . . . The resulted vectors are usually highly dimensional and sparse.” On page 4147 column 1 paragraph 2, Peng describes that the method’s purpose is to “partition n documents into k disjoint clusters such that similar documents are assigned to one cluster.” The number of “n documents” and “k disjoint clusters” is considered input into Peng’s method)
receiving a number of clusters (as described above, on page 4147 column 1 paragraph 2, Peng describes that the method’s purpose is to “partition n documents into k disjoint clusters such that similar documents are assigned to one cluster”)
initializing a mean feature vector in a complete representation (on page 4147 column 2 paragraph 1, Peng discloses that in the first step of the k-means method, “cluster centers,” equivalent to mean feature vectors, are computed)
and generating, based on the number of clusters, the plurality of clusters (on page 4147 column 2 paragraph 1, Peng discloses that in the second step of the k-means method, documents are assigned to the closest cluster, thus generating a plurality of clusters based on an input k value).
Claims 21 and 28 are similarly rejected. Refer to claim 14 for analysis.
	Claims 15 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Peng, in view of Chakerian and in further view of Iio (U.S. Patent Publication No. 20110158535), hereinafter Iio.

Regarding claim 15, as stated above, Peng in view of Chakerian teach the computer-implemented method of claim 9. Chakerian further teaches wherein an inner product for determining the distance is based on a mean transposed file with a boundary index (in paragraph 0119 and FIG. 13 of applicant’s specification, applicant defines a transposed file as a table consisting of several “pairs,” which applicant further defines as “tuples.” Therefore, an element pair is interpreted as not restricted to a pair of two items; it could instead pair one item with a list of several items mapped to that one item, and a transposed file is interpreted as a table of tuples. Based on this interpretation, in paragraph 0034 Chakerian details a “database” that can be stored in “non-volatile memory (e.g. in storage device),” equivalent to the storing of a file. This database stores, “for each cluster,” “information identifying the documents contained in the cluster (e.g., title, publication date, URL, etc.),” equivalent to an index for that cluster in the database, serving as an index to the different clusters and to information associated with “the individual vectors for each of the documents and/or the similarities (e.g., cosine similarities) between the individual vectors and the representative vector.” It is considered well known in the art that cosine similarity is equivalent to an inner product, which is described by applicant in claim 10 as a distance measure for “sparse vectors.” This equivalence is described on page 149 paragraph 4 in the following article: “the inner product xᵀy is often known as the ‘cosine similarity.’ Since cosine similarity is easy to interpret and simple to compute for sparse vectors, it is widely used in text mining and information retrieval” (INDERJIT S. DHILLON and DHARMENDRA S. MODHA, Concept Decompositions for large sparse text data using clustering, Machine Learning, 42, pp. 143-175, 2001).
This inner product is based on individual vectors and representative vectors in a “database,” equivalent to a transposed file storing information about a cluster)
and where the mean transposed file with the boundary index includes: a first part including the mean feature vectors not belonging to an invariant cluster (in paragraph 0036, Chakerian details that “in some embodiments, when a cluster's weight falls below a predetermined minimal weight, the electronic device can either keep that cluster in the database but mark it as inactive.” In paragraph 0034, Chakerian specifies that the “electronic device can also store, for each cluster, the cluster's weight, the cluster's score, and whether the cluster is active or inactive,” and that “the database can also store, for each cluster, the individual vectors for each of the documents.” The clusters stored in the database marked as active, along with the individual vectors stored in connection to these clusters, are considered equivalent to a part of the database containing vectors not in an invariant cluster)
a second part including the mean feature vectors belonging to the invariant cluster (in paragraph 0036, Chakerian details that “in some embodiments, when a cluster's weight falls below a predetermined minimal weight, the electronic device can either keep that cluster in the database but mark it as inactive.” In paragraph 0034, Chakerian specifies that the “electronic device can also store, for each cluster, the cluster's weight, the cluster's score, and whether the cluster is active or inactive,” and that “the database can also store, for each cluster, the individual vectors for each of the documents.” The clusters stored in the database marked as inactive, along with the individual vectors stored in connection to these clusters, are considered equivalent to a part of the database containing vectors in an invariant cluster).
However, Peng in view of Chakerian does not disclose and the boundary index associated with an index for the last mean feature vector in the first part. Chakerian teaches an index for associating mean vectors to clusters, equivalent to a boundary between cluster information in the database, but does not teach an ordered list of feature vectors with an index associated with a last vector in a partition as described by this limitation.
However, in the same field of endeavor, Iio discloses and the boundary index associated with an index for the last mean feature vector in the first part (in FIG. 3 and paragraph 0077, Iio discusses producing a lookup table for mapping “representative vectors” to a cluster that the representative vector is “classified into.” In the lookup table in FIG. 4, “representative vectors x1 and x2 are classified into cluster I, the representative vector x3 is classified into cluster II, and the representative vector x4 is classified into cluster III.” This lookup table “stores that cluster I corresponds to indices 1 and 2. The lookup table also stores that cluster II corresponds to index 3, and cluster III corresponds to index 4.” Index 2, the index of “representative vector x2,” is associated with a boundary, marking the last representative feature vector in a first part of the lookup table corresponding to cluster I. This disclosure covers the concept of a boundary index associated with a mean feature vector in a table, as the “representative vector” is equivalent to a mean feature vector, as described by Iio in paragraph 0055: “vector generation unit 114 generates representative vectors of the feature vectors included in each cluster in accordance with the classification result of the feature vector classification unit”).

Claim 22 is similarly rejected. Refer to claim 15 for analysis.

Allowable Subject Matter
Claims 11, 18, and 25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if amended to overcome the subject matter eligibility rejection under 35 U.S.C. 101.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yoon (U.S. Patent Publication No. 20180102938) teaches the storing of cluster identifiers in an array with feature vectors. Mitsuru (JP Patent Publication No. 2015138462) discloses calculating an inner product between a feature vector and a mean vector for a cluster.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS PATRICK FRESNEDA whose telephone number is (571)272-8452. The examiner can normally be reached Monday - Friday 7:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, USMAAN SAEED can be reached on 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NICHOLAS PATRICK FRESNEDA/Examiner, Art Unit 2169

/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2169