Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
1.	Claims 1-22 are active in this application.
Claim Objections
2.	Claims 16-17 are objected to because of the following informalities: a period (“.”) is needed at the end of each claim.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
3.	Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 1, the term "some" is a relative term which renders the claim indefinite. The term "some" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Claims 2-12 depend from a rejected claim and are therefore rejected on the same ground.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

4.	Claims 13-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
In particular, claims 13-22 recite a system comprising one or more filters, a clustering module, a sampling module, and a classification module, but lack the physical articles or objects necessary to constitute a machine or manufacture within the meaning of 35 U.S.C. 101. No express limitation of any hardware element corresponding to the claimed system is mentioned in the specification that would render the claims statutory. Therefore, the system of claims 13-22 can reasonably be interpreted as a "system" consisting simply of software code or programs. Thus, claims 13-22 are rejected under 35 U.S.C. 101 as directed to non-statutory subject matter.
Examiner's Note
In preparing responses, the Applicants are respectfully requested to fully consider the entirety of the references as potentially teaching all or part of the claimed invention.
It is noted that REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN. "The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain." In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).
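The Examiner has cited particular locations in the reference(s) as applied to the claims below for the convenience of the Applicants. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well.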
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


5.	Claims 1-10 and 13-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Thomas (US 7,945,600).
Regarding claim 1, Thomas discloses a method for data management of documents in one or more data repositories in a computer network or cloud infrastructure, the method comprising:
sampling the documents in the one or more data repositories (col. 4, line 59 to col. 5, line 5, Col. 6, lines 42-44);
formulating representative subsets of the sampled documents (col. 5, lines 53-54, col. 11, lines 35-67); 
generating sampled data sets of the sampled documents (Col. 7, lines 18-35, col. 10, lines 60-65 and col. 11, lines 35-67); and 
balancing the sampled data sets for further processing of the sampled documents, wherein the formulation of the representative subsets is performed for identification of some of the 

Regarding claim 2, Thomas discloses wherein sampling the documents in the one or more data repositories comprises filtering the documents in the one or more data repositories in accordance with one or more features of the documents (Col. 6, lines 45-57, Col. 8, lines 15-20 and Col. 8, line 58 to col. 9, line 3).

Regarding claim 3, Thomas discloses wherein the one or more features of the documents comprise folder, department or location in the one or more data repositories, document date of creation, document date of modification, document size, document depth, document language, document extension and/or number of personal identifying information (PII) in a document (Col. 6, lines 45-57, Col. 7, lines 36-51, and Col. 9, line 45 to col. 10, line 15).

Regarding claim 4, Thomas discloses wherein the filtering the documents in the one or more data repositories comprises filtering the documents in accordance with one or more user-selected features of the documents (Col. 6, lines 45-57, Col. 7, lines 36-51, and col. 9, lines 3-16).

Regarding claim 5, Thomas discloses wherein the filtering the documents in the one or more data repositories further comprises filtering the documents in accordance with distributors and weights (Col. 7, lines 18-51).



Regarding claim 7, Thomas discloses wherein the generating the sampled datasets of the sampled documents comprises smart sampling the sampled datasets by one of weighted clustering, combined distributors or hierarchical sampling prior to smart sampling the sampled datasets in either or both of the random sampling mode and the proportional sampling mode (Col. 5, lines 7-23, Col. 10, lines 48-65 and col. 11, lines 35-67).

Regarding claim 8, Thomas discloses wherein the formulating representative subsets of the sampled documents comprises weighted clustering of the sampled documents (col. 11, lines 35-67).

Regarding claim 9, Thomas discloses wherein the weighted clustering of the sampled documents comprises weighted k-means clustering of the sampled documents (col. 11, lines 35-67).

Regarding claim 10, Thomas discloses wherein the weighted clustering of the sampled documents comprises weighted clustering of the sampled documents in a grid search space (Figure 7 and corresponding text).


one or more filters for generating a sample of the documents in the one or more data repositories (col. 4, line 59 to col. 5, line 5, Col. 6, lines 42-44); 
a clustering module for formulating representative subsets of the sampled documents (col. 5, lines 53-54, col. 11, lines 35-67); 
a sampling module for generating sampled datasets of the sampled documents (Col. 7, lines 18-35, col. 10, lines 60-65 and col. 11, lines 35-67); and 
a classification module for classifying or categorizing documents in the sampled datasets of the sampled documents (Col. 10, lines 49-67 and col. 11, lines 35-67).

Regarding claim 14, Thomas discloses wherein the one or more filters comprise predicates and distributors (Col. 6, lines 45-57, Col. 7, lines 36-51, and Col. 9, line 45 to col. 10, line 15).

Regarding claim 15, Thomas discloses wherein the one or more filters define a grid search space (Figure 7 and corresponding text).

Regarding claim 16, Thomas discloses wherein the predicates comprise user-defined predicates (Col. 6, lines 45-57, Col. 7, lines 36-51, and col. 9, lines 3-16).



Regarding claim 18, Thomas discloses wherein the weighted clustering module is coupled to a k-means weighted module in order for the weighted clustering module to perform k-means weighted clustering of the sampled documents (col. 11, lines 35-67).

Regarding claim 19, Thomas discloses wherein the sampling module comprises a smart sampling module for generating the sampled datasets of the sampled documents in accordance with one or both of a random sampling mode and a proportional sampling mode (Col. 5, lines 7-23, Col. 10, lines 48-65 and col. 11, lines 35-67).

Regarding claim 20, Thomas discloses wherein the smart sampling module comprises one or more of a weighted clustering module, combined distributors, or a hierarchical sampling module (Col. 5, lines 7-23, Col. 10, lines 48-65 and col. 11, lines 35-67).

Regarding claim 21, Thomas discloses wherein the smart sampling module automatically reruns smart sampling of the sampled documents if sampling cluster quality is lower than a quality threshold (Col. 5, lines 7-23, Col. 10, lines 48-65 and col. 11, lines 35-67).

Regarding claim 22, Thomas discloses wherein the classification module determines confidentiality of the documents in the sampled datasets of the sampled documents (Col. 7, lines 11-50).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


6.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Thomas (US 7,945,600) in view of the publication titled "Constrained distance based clustering for time-series: a comparative and experimental study," published in May 2018 (hereinafter "Nicolas").
Regarding claim 11, Thomas discloses all the claimed subject matter as set forth above. However, Thomas is silent as to defining the grid search space in response to a computed silhouette score. On the other hand, Nicolas discloses defining the grid search space in response to a computed silhouette score (see Nicolas, page 1685, search space; pages 1688-1689, reference 4.4; and page 1691, silhouette score). It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to define the grid search space in response to a computed silhouette score, as suggested by Nicolas. The motivation would have been to enhance the search capability so as to better locate desired documents.

Allowable Subject Matter

Conclusion
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MERILYN P NGUYEN, whose telephone number is 571-272-4026. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alford Kindred, can be reached at (571) 272-4037. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197. 
/MERILYN P NGUYEN/Acting Patent Examiner of Art Unit 2153