DETAILED ACTION

Remarks
1.    Pending claims for reconsideration are claims 22-41. Claims 22, 27, 29, 34, 36, and 40 have been amended.

Response to Arguments
2.    Applicant’s arguments filed on 7/1/2020 are moot in view of the new grounds of rejection.

Continued Examination Under 37 CFR 1.114
3.    A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 7/1/2020 has been entered.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


4.	Claims 22-41 are rejected under 35 U.S.C. 103 as being unpatentable over Patent No.: US 7,797,265 B2 to Brinker et al. (hereafter referenced as Brinker) in view of Patent No.: US 8,589,455 B2 to Jain et al. (hereafter referenced as Jain), in further view of Patent No.: US 6,289,354 B1 to Aggarwal et al. (hereafter referenced as Aggarwal).
Regarding claim 22, Brinker discloses “A method of identifying sensitive data (SD) from amongst data stored on one or more data repositories” (cluster centroids are retrieved from memory feature vector using sensitive hashing function [Col.2/lines 28-31]), “the method provided by a processor (pre-processor [Fig.1/item 112]) and memory block (memory cache block [Fig.1/item 108]) operatively coupled to the one or more data repositories (coupled to memory cluster module [Fig.1/item 118]) and comprising: processing the data from the one or more data repositories to calculate a plurality of float feature (FF) vectors associated with the data” (generate/calculate feature vector associated with data [Fig.2/item 210]), “each FF vector associated with a file of said data and comprising a plurality of floating point numbers calculated in accordance with the binary content of said file” (compare feature vector with centroid [Fig.3/item 310] which comprises binary content associated with a mathematical representation/feature vector [Col.6/lines 28-32]), “clustering the plurality of FF vectors into a plurality of clusters” (send feature vector to cluster hash [Fig.3/item 304]), “each cluster associated with a respective subset of the data being the data in respect of which the FF vectors in the cluster were calculated” (compare feature vectors with calculated centroids [Fig.3/item 310]); “generating, for each cluster, a vector representative of the cluster” (create new vector cluster [Fig.3/item 316]); “comparing the vectors representative of respective clusters to one or more FF vectors calculated for a respective one or more examples of SD” (compare feature vector to calculated centroid/vectors for the data [Col.6/lines 65- Col.7/line 4]); “and classifying one or more clusters as SD based on the result of said comparing, thereby identifying respective subsets of data as SD” (sensitive hashing function is applied/classified to data within the cluster subset hash [Col.7/lines 15-20]).
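For illustration only, and forming no part of the grounds of rejection, the clustering, comparing, and classifying limitations mapped above can be sketched as follows. The sketch assumes a generic threshold-based clustering, a mean vector as the "vector representative of the cluster", and Euclidean distance; the function names and the `radius` and `threshold` parameters are hypothetical and are not taken from the claims or from Brinker.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Mean vector, used here as the representative of a cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_vectors(vectors, radius):
    """Assign each vector to the first cluster whose centroid lies within
    `radius`; otherwise start a new cluster (hypothetical scheme)."""
    clusters = []
    for v in vectors:
        for members in clusters:
            if distance(v, centroid(members)) <= radius:
                members.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def classify_clusters(clusters, sd_examples, threshold):
    """Flag a cluster as SD when its representative vector lies within
    `threshold` of any vector calculated for a known SD example."""
    flags = []
    for members in clusters:
        rep = centroid(members)
        flags.append(any(distance(rep, ex) <= threshold for ex in sd_examples))
    return flags
```

Under these assumptions, every file whose FF vector belongs to a flagged cluster would be treated as SD, which corresponds to "identifying respective subsets of data as SD".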
Brinker does not explicitly disclose “wherein the processing comprises, for each file of said data: extracting bytes from a binary content of the file, wherein at least some of the bytes are representative of a graphical content present in the file, dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values characterizing data in the chunk, thereby giving rise to a plurality of pairs of values.” 
However, Jain in an analogous art discloses “wherein the processing comprises, for each file of said data: extracting bytes from a binary content of the file” (extract binary content of file [Fig.6]; also see the object is a BLOB binary large object chunk, Jain [Col.1/lines 62-65]), “wherein at least some of the bytes are representative of a graphical content present in the file” (file checksum graphical mapping of file, Jain [Fig.13]), “dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values characterizing data in the chunk, thereby giving rise to a plurality of pairs of values” (chunking process is employed to divide into a plurality of chunk/pair values, Jain [Col.2/lines 64-66]).
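For illustration only, the recited chunking limitation can be sketched as follows. The claim language quoted above does not say which two values characterize a chunk, so the sketch assumes the mean and the population standard deviation of the byte values; the function names and the `chunk_size` parameter are hypothetical and do not come from the claims or from Jain.

```python
import math


def chunk_bytes(data: bytes, chunk_size: int):
    """Divide the extracted bytes into consecutive chunks; the final chunk
    may be shorter than chunk_size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_pairs(data: bytes, chunk_size: int = 4):
    """Return one (mean, standard deviation) pair per chunk.
    The choice of these two statistics is an assumption for illustration."""
    pairs = []
    for chunk in chunk_bytes(data, chunk_size):
        mean = sum(chunk) / len(chunk)
        variance = sum((b - mean) ** 2 for b in chunk) / len(chunk)
        pairs.append((mean, math.sqrt(variance)))
    return pairs
```

For example, `chunk_pairs(bytes([0, 0, 0, 0, 0, 2, 0, 2]), 4)` yields two pairs, one per four-byte chunk, giving rise to a plurality of pairs of values in the sense recited.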
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data with Jain’s content aware data partitioning of data chunks in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks and both are from the same field of endeavor.
Neither Brinker nor Jain explicitly discloses “generating a plurality of nodes in a two dimensional space, wherein each node is associated with a pair of values of the plurality of pair of values, defining location of the node in the two dimensional space, generating objects characterizing a spatial relationship between nodes of the plurality of nodes, and generating the FF vector associated with the file based on the objects.”
However, Aggarwal in an analogous art teaches “generating a plurality of nodes in a two dimensional space” (nodes are created in high dimensional data spacing environment, Aggarwal [Col.7/lines 11-14]), “wherein each node is associated with a pair of values of the plurality of pair of values” (partition data values into clusters, Aggarwal [Fig.1/item 130]), “defining location of the node in the two dimensional space” (defining of nodes in dimensional space, Aggarwal [Fig.2]), “generating objects characterizing a spatial relationship between nodes of the plurality of nodes” (cluster C characterizes a relationship within a capsule restraint of nodes E1, E2, E3, Aggarwal [Col.10/lines 29-43]), “and generating the FF vector associated with the file based on the objects” (eigen vector is generated from a constructive proof based object, Aggarwal [Col.7/lines 30-35]).
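For illustration only, the node, object, and FF vector limitations can be sketched as follows. The quoted claim language does not define the "objects" or how the FF vector is derived from them, so the sketch assumes that each object is a line segment between consecutive nodes, described by its length and angle, and that the FF vector holds three summary statistics of those segments. All names are hypothetical and are not taken from the claims or from Aggarwal.

```python
import math


def nodes_from_pairs(pairs):
    """Treat each pair of values as the (x, y) location of a node."""
    return [(float(x), float(y)) for x, y in pairs]


def segments(nodes):
    """Assumed 'objects': one (length, angle) tuple per pair of
    consecutive nodes."""
    result = []
    for (x1, y1), (x2, y2) in zip(nodes, nodes[1:]):
        dx, dy = x2 - x1, y2 - y1
        result.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return result


def ff_vector(pairs):
    """Assumed FF vector: [mean segment length, maximum segment length,
    mean segment angle]; all zeros when fewer than two nodes exist."""
    segs = segments(nodes_from_pairs(pairs))
    if not segs:
        return [0.0, 0.0, 0.0]
    lengths = [length for length, _ in segs]
    angles = [angle for _, angle in segs]
    return [sum(lengths) / len(lengths), max(lengths), sum(angles) / len(angles)]
```

The result is a fixed-length list of floating point numbers per file, which is the form of input a clustering step would expect.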
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data and Jain’s content aware data partitioning of data chunks with Aggarwal’s system to generate a plurality of nodes in a high dimensional space in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks, Aggarwal discloses a plurality of data nodes within dimensional space and all are from the same field of endeavor.
Regarding claim 23 in view of claim 22, the references combined disclose “wherein said processing includes associating the FF vector calculated for a given file with context data associated with the given file, said context data comprising at least one of metadata and attributes associated with the file” (i.e., for each file, a sensitivity signature consisting of a tag indicating the cluster sensitivity level based on signature consisting of top modules of the file term vector, Reininger [Col.11/lines 59-64]; also see close time of creation or other similar file metadata matching exchangeable image file format within a data cluster for protecting sensitive data, Reininger [Col.3/lines 13-20]).
Regarding claim 24 in view of claim 22, the references combined disclose “further comprising updating the FF vector and context data associated with a given file upon predetermined criteria being met.” (filter clusters allow files to be updated Reininger [Col.7/lines 4-64]).
Regarding claim 25 in view of claim 24, the references combined disclose “wherein the predetermined criteria is met upon detecting a change in the given file's name, date and/or size” (pre-determined criteria is met when comparing feature vector of document to centroid Brinker[Col.8/lines 49-50]).
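For illustration only, the predetermined criteria recited in claim 25 can be sketched as a comparison of stored context data against the current state of a file. The `FileContext` record and the `needs_update` function are hypothetical names chosen for this sketch, and the modification time stands in for the recited "date" as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileContext:
    """Context data kept alongside a file's FF vector (assumed fields)."""
    name: str
    modified: float  # modification timestamp, assumed to represent "date"
    size: int        # size in bytes


def needs_update(previous: FileContext, current: FileContext) -> bool:
    """Return True when the file's name, date, and/or size has changed,
    signalling that the FF vector and context data should be recalculated."""
    return (
        previous.name != current.name
        or previous.modified != current.modified
        or previous.size != current.size
    )
```

A scan would call `needs_update` for each file and recompute the FF vector only for files where it returns True.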
Regarding claim 26 in view of claim 23, the references combined disclose “further comprising analyzing the context data associated with the FF vectors in one or more clusters using a machine learning algorithm” (compare feature vector cluster Brinker [Fig.3/item 310]), “and responsive to said analyzing, generating a behavior model for each subset of data associated a respective cluster of the one or more clusters”  (after comparison a threshold model is associated with cluster Brinker [Fig.3/item 312]).
Regarding claim 27 in view of claim 26, the references combined disclose “further comprising using the behavior model for a subset of data to detect outlier” (associate document with cluster after threshold model has been assessed, Brinker [Fig.3/item 314]).
Regarding claim 28 in view of claim 22, the references combined disclose “further comprising repeatedly scanning the one or more data repositories, wherein in each scan, the processor and memory block processes at least a subset of files stored on the one or more data repositories” (tagging sensitive data and building file clusters based on file similarity, Reininger [Col.3/lines 38-49]), “said processing including calculating FF vectors associated with the at least a subset of files and clustering the vectors into corresponding clusters” (files are updated and encrypted, Reininger [Col.7/lines 60-64]), “thereby updating the clusters associated with the data stored on the one or more data repositories” (files are updated and encrypted, Reininger [Col.7/lines 60-64]).
Regarding claim 29, Brinker discloses “a system capable of identifying sensitive data (SD) from amongst data stored on one or more data repositories” (cluster centroids are retrieved from memory feature vector using sensitive hashing function [Col.2/lines 28-31]), “the system comprising a processor and memory block operatively coupled to the one or more data repositories” (coupled to memory cluster module [Fig.1/item 118]), “wherein the processor and memory block is configured to: process the data from the one or more data repositories to calculate a plurality of float feature (FF) vectors associated with the data” (generate/calculate feature vector associated with data [Fig.2/item 210]), “each FF vector associated with a file of said data and comprising a plurality of floating point numbers calculated in accordance” (compare feature vector with centroid [Fig.3/item 310] which comprises binary content associated with a mathematical representation/feature vector [Col.6/lines 28-32]), “cluster the plurality of FF vectors into a plurality of clusters” (send feature vector to cluster hash [Fig.3/item 304]), “each cluster associated with a respective subset of the data being the data in respect of which the FF vectors in the cluster were calculated” (compare feature vectors with calculated centroids [Fig.3/item 310]); “generate, for each cluster, a vector representative of the cluster” (create new vector cluster [Fig.3/item 316]); “compare the vectors representative of respective clusters to one or more FF vectors calculated for a respective one or more examples of SD” (compare feature vector to calculated centroid/vectors for the data [Col.6/lines 65- Col.7/line 4]); “and classify one or more clusters as SD based on the result of said comparing, thereby identifying respective subsets of data as SD” (sensitive hashing function is applied/classified to data within the cluster subset hash [Col.7/lines 15-20]).
Brinker does not explicitly disclose “wherein process the data comprises, for each file of said data: extracting bytes from a binary content of the file, wherein at least some of the bytes are representative of a graphical content present in the file, dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values characterizing data in the chunk, thereby giving rise to a plurality of pairs of values.”
However, Jain in an analogous art discloses “wherein the processing comprises, for each file of said data: extracting bytes from a binary content of the file” (extract binary content of file [Fig.6]; also see the object is a BLOB binary large object chunk, Jain [Col.1/lines 62-65]), “wherein at least some of the bytes are representative of a graphical content present in the file” (file checksum graphical mapping of file, Jain [Fig.13]), “dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values characterizing data in the chunk, thereby giving rise to a plurality of pairs of values” (chunking process is employed to divide into a plurality of chunk/pair values, Jain [Col.2/lines 64-66]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data with Jain’s content aware data partitioning of data chunks in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks and both are from the same field of endeavor.
Neither Brinker nor Jain explicitly discloses “generating a plurality of nodes in a two dimensional space, wherein each node is associated with a pair of values of the plurality of pair of values, defining location of the node in the two dimensional space, generating objects characterizing a spatial relationship between nodes of the plurality of nodes, and generating the FF vector associated with the file based on the objects.”
However, Aggarwal in an analogous art teaches “generating a plurality of nodes in a two dimensional space” (nodes are created in high dimensional data spacing environment, Aggarwal [Col.7/lines 11-14]), “wherein each node is associated with a pair of values of the plurality of pair of values” (partition data values into clusters, Aggarwal [Fig.1/item 130]), “defining location of the node in the two dimensional space” (defining of nodes in dimensional space, Aggarwal [Fig.2]), “generating objects characterizing a spatial relationship between nodes of the plurality of nodes” (cluster C characterizes a relationship within a capsule restraint of nodes E1, E2, E3, Aggarwal [Col.10/lines 29-43]), “and generating the FF vector associated with the file based on the objects” (eigen vector is generated from a constructive proof based object, Aggarwal [Col.7/lines 30-35]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data and Jain’s content aware data partitioning of data chunks with Aggarwal’s system to generate a plurality of nodes in a high dimensional space in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks, Aggarwal discloses a plurality of data nodes within dimensional space and all are from the same field of endeavor.
Regarding claim 30 in view of claim 29, the references combined disclose “wherein said processing includes associating the FF vector calculated for a given file with context data associated with the given file, said context data comprising at least one of metadata and attributes associated with the file” (i.e., for each file, a sensitivity signature consisting of a tag indicating the cluster sensitivity level based on signature consisting of top modules of the file term vector, Reininger [Col.11/lines 59-64]; also see close time of creation or other similar file metadata matching exchangeable image file format within a data cluster for protecting sensitive data, Reininger [Col.3/lines 13-20]).
Regarding claim 31 in view of claim 30, the references combined disclose “wherein the processor and memory block is further configured to update the FF vector and context data associated with a given file upon predetermined criteria being met” (filter clusters allow files to be updated Reininger [Col.7/lines 4-64]).
Regarding claim 32 in view of claim 31, the references combined disclose “wherein the predetermined criteria is met upon detecting a change in the given file's name, date and/or size” (pre-determined criteria is met when comparing feature vector of document to centroid Brinker[Col.8/lines 49-50]).
Regarding claim 33 in view of claim 30, the references combined disclose “wherein the processor and memory block is further configured to analyze the context data associated with the FF vectors in one or more clusters using a machine learning algorithm”(compare feature vector cluster Brinker [Fig.3/item 310]), “and responsive to said analyzing, generate a behavior model for each subset of data associated a respective cluster of the one or more clusters” (after comparison a threshold model is associated with cluster Brinker [Fig.3/item 312]).
Regarding claim 34 in view of claim 33, the references combined disclose “wherein the processor and memory block is further configured to use the behavior” (associate document with cluster after threshold model has been assessed, Brinker [Fig.3/item 314]).
Regarding claim 35 in view of claim 29, the references combined disclose “wherein the processor and memory block is further configured to repeatedly scan the one or more data repositories, wherein in each scan, the processor and memory block is configured to process at least a subset of data files stored on the one or more data repositories” (tagging sensitive data and building file clusters based on file similarity, Reininger [Col.3/lines 38-49]), “said processing including calculating FF vectors associated with the at least a subset of files and clustering the vectors into corresponding clusters” (files are updated and encrypted, Reininger [Col.7/lines 60-64]), “thereby updating the clusters associated with the data stored on the one or more data repositories” (files are updated and encrypted, Reininger [Col.7/lines 60-64]).
Regarding claim 36, Brinker discloses “non-transitory storage medium comprising instructions that when executed by a processor (pre-processor [Fig.1/item 112]) and memory block, cause the processor and memory block (memory cache block [Fig.1/item 108]) to perform a method of identifying sensitive data (SD) from amongst data stored on one or more data repositories” (cluster centroids are retrieved from memory feature vector using sensitive hashing function [Col.2/lines 28-31]), “the method comprising: processing the data from the one or more data repositories to calculate a plurality of float feature (FF) vectors associated with the data” (generate/calculate feature vector associated with data [Fig.2/item 210]), (compare feature vector with centroid [Fig.3/item 310] which comprises binary content associated with a mathematical representation/feature vector [Col.6/lines 28-32]), “clustering the plurality of FF vectors into a plurality of clusters” (send feature vector to cluster hash [Fig.3/item 304]), “each cluster associated with a respective subset of the data being the data in respect of which the FF vectors in the cluster were calculated” (compare feature vectors with calculated centroids [Fig.3/item 310]); “generating, for each cluster, a vector representative of the cluster” (create new vector cluster [Fig.3/item 316]); “comparing the vectors representative of respective clusters to one or more FF vectors calculated for a respective one or more examples of SD” (compare feature vector to calculated centroid/vectors for the data [Col.6/lines 65- Col.7/line 4]); “and classifying one or more clusters as SD based on the result of said comparing, thereby identifying respective subsets of data as SD” (sensitive hashing function is applied/classified to data within the cluster subset hash [Col.7/lines 15-20]).
Brinker does not explicitly disclose “wherein the processing comprises, for each file of said data: extracting bytes from a binary content of the file, wherein at least some of the bytes are representative of a graphical content present in the file, dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values”
However, Jain in an analogous art discloses “wherein the processing comprises, for each file of said data: extracting bytes from a binary content of the file” (extract binary content of file [Fig.6]; also see the object is a BLOB binary large object chunk, Jain [Col.1/lines 62-65]), “wherein at least some of the bytes are representative of a graphical content present in the file” (file checksum graphical mapping of file, Jain [Fig.13]), “dividing the bytes into a plurality of chunks, generating, for each chunk, a pair of values characterizing data in the chunk, thereby giving rise to a plurality of pairs of values” (chunking process is employed to divide into a plurality of chunk/pair values, Jain [Col.2/lines 64-66]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data with Jain’s content aware data partitioning of data chunks in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks and both are from the same field of endeavor.
Neither Brinker nor Jain explicitly discloses “generating a plurality of nodes in a two dimensional space, wherein each node is associated with a pair of values of the plurality of pair of values, defining location of the node in the two dimensional space, generating objects characterizing a spatial relationship between nodes of the plurality of nodes, and generating the FF vector associated with the file based on the objects.”
However, Aggarwal in an analogous art teaches “generating a plurality of nodes in a two dimensional space” (nodes are created in high dimensional data spacing environment, Aggarwal [Col.7/lines 11-14]), “wherein each node is associated with a pair of values of the plurality of pair of values” (partition data values into clusters, Aggarwal [Fig.1/item 130]), “defining location of the node in the two dimensional space” (defining of nodes in dimensional space, Aggarwal [Fig.2]), “generating objects characterizing a spatial relationship between nodes of the plurality of nodes” (cluster C characterizes a relationship within a capsule restraint of nodes E1, E2, E3, Aggarwal [Col.10/lines 29-43]), “and generating the FF vector associated with the file based on the objects” (eigen vector is generated from a constructive proof based object, Aggarwal [Col.7/lines 30-35]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Brinker’s clustering process using a sensitive hash for data and Jain’s content aware data partitioning of data chunks with Aggarwal’s system to generate a plurality of nodes in a high dimensional space in order to provide additional security. One of ordinary skill in the art would have been motivated to combine because Brinker teaches a data clustering process of sensitive data hash techniques, Jain employs a partitioning strategy in data chunks, Aggarwal discloses a plurality of data nodes within dimensional space and all are from the same field of endeavor.
Regarding claim 37 in view of claim 36, the references combined disclose “wherein said processing includes associating the FF vector calculated for a given file with context data associated with the given file” (filter clusters allow files to be updated, Reininger [Col.7/lines 4-64]), “said context data comprising at least one of metadata and attributes associated with the file” (i.e., for each file, a sensitivity signature consisting of a tag indicating the cluster sensitivity level based on signature consisting of top modules of the file term vector, Reininger [Col.11/lines 59-64]; also see close time of creation or other similar file metadata matching exchangeable image file format within a data cluster for protecting sensitive data, Reininger [Col.3/lines 13-20]).
Regarding claim 38 in view of claim 37, the references combined disclose “further comprising instructions causing the processor and memory block to update the FF vector and context data associated with a given file upon predetermined criteria being met, wherein the predetermined criteria is met upon detecting a change in the given file's name, date and/or size” (pre-determined criteria is met when comparing feature vector of document to centroid Brinker [Col.8/lines 49-50]).
Regarding claim 39 in view of claim 37, the references combined disclose “further comprising instructions causing the processor and memory block to analyze the context data associated with the FF vectors in one or more clusters using a machine learning algorithm” (compare feature vector cluster Brinker [Fig.3/item 310]), “and, responsive to said analyzing, generating a behavior model for each subset of data associated a respective cluster of the one or more clusters” (after comparison a threshold model is associated with cluster Brinker [Fig.3/item 312]). 
Regarding claim 40 in view of claim 39, the references combined disclose “further comprising instructions causing the processor and memory block to use the behavior model for a subset of data to detect outlier behavior in the subset of data” (associate document with cluster after threshold model has been assessed Brinker [Fig.3/item 314]).
Regarding claim 41 in view of claim 36, the references combined disclose “further comprising instructions causing the processor and memory block to repeatedly scan the one or more data repositories” (tagging sensitive data and building file clusters based on file similarity, Reininger [Col.3/lines 38-49]), “wherein in each scan, to process at least a subset of data files stored on the one or more data repositories, said processing including calculating FF vectors associated with the at least a subset of files and clustering the vectors into corresponding clusters” (files are updated and encrypted, Reininger [Col.7/lines 60-64]), “thereby updating the clusters associated with the data stored on the one or more data repositories” (files are updated and encrypted, Reininger [Col.7/lines 60-64]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL D ANDERSON whose telephone number is (571)270-5159.  The examiner can normally be reached on Mon-Fri 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL D ANDERSON/Examiner, Art Unit 2432                                                                                                                                                                                                        

/MORSHED MEHEDI/Primary Examiner, Art Unit 2432