DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on September 28, 2021 has been entered. Claims 1, 3 – 8, 10 – 13, 15 and 17 – 20 are pending and have been examined. 
Response to Amendments
In the reply filed 9/28/21, no claims were amended. Accordingly, claims 1, 3 – 8, 10 – 13, 15 and 17 – 20 are pending. 
Response to Arguments
Applicant's arguments with respect to claims 1, 3 – 8, 10 – 13, 15 and 17 – 20 have been carefully considered but are moot and not deemed persuasive in view of the rejections below.
Examiner respectfully disagrees that prior art does not teach, “unsupervised classification of a subject document, where the new point (representing the subject document) is positioned in an n-dimensional space comprising a plurality of reference points that are already divided into a plurality of groupings based on the classification problem, each grouping corresponding to at least one specific high-level feature of said other documents.” Orlov [0026] teaches, “Document categories corresponding to cluster definitions 195 produced by the clusterization functional module 190 may be utilized for training one or more document classifiers, as described in more detail herein below.” Orlov clearly teaches training and use of training classifiers. Additionally, Orlov [Abstract] teaches, “An example method comprises: producing, by a computer system, a plurality of image features by processing images of a plurality of documents; producing a plurality of text features by processing texts of a plurality of documents; producing a plurality of feature vectors, wherein each feature vector of the plurality of feature vectors comprises at least one of: a subset of the plurality of image features and a subset of the plurality of text features; clusterizing the plurality feature vectors to produce a plurality of clusters; defining a plurality of document categories, such that each document category of the plurality of document categories is defined by a respective feature cluster of the plurality of feature clusters; and training a classifier to produce a value reflecting a degree of association of an input document with one or more document categories of the plurality of document categories.” Here, the classification training indicates usage of reference points to train the document classifier, which similarly uses feature vectors to position. Therefore, examiner is not persuaded. 
	Furthermore, Orlov teaches, (d)    determining a matching grouping from said plurality of groupings (Orlov [0064]: As schematically illustrated by FIG. 7, a document layout template 702, which includes definitions of coordinates, sizes, and other attributes of one or more document layout features, may be matched against the input document 700 containing document layout features 701 in order to produce feature vectors 703 and 704 encode the types, sizes, and other attributes of the document layout features defined by the template and detected in the input document.  In certain implementations, multiple document layout templates may consecutively be matched against to the input document in order to extract multiple sets of document layout features.) for said subject document based on at least one predetermined criterion (Orlov [0019]: Automatic processing of documents (e.g., images of paper documents or various electronic documents including natural language text) may involve classification of the input documents by associating a given document with one or more categories of a certain set of categories.); and
	Orlov does not clearly teach, (e) associating said subject document with said matching grouping.  However, Regev [0040] teaches, “The content/metadata-based clusters are merged according to the format features.  Alternatively, other combinations of features and different orders of clustering stages may be used.  The result of the successive clustering stages is a grouping of all processed documents into type clusters.  This clustering is also followed by construction of a multi-level hierarchy of clusters of different types and sub-types.” Therefore, examiner is not persuaded.

Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3 – 8, 10 – 13, 15 and 17 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Orlov et al., US Patent Application Publication No. 2019/0294874 (Hereafter “Orlov”), and further in view of Regev et al., U.S. Patent Application Publication No. 2012/0041955 (Hereinafter “Regev”).
Regarding claim 1, Orlov teaches, a method for determining other documents to be associated with a subject document, the method being executed by a processor, the method comprising:
(a)    determining a classification problem associated with the subject document (Orlov [0022]: In practice, the number of available annotated documents which may be included into the training or validation data set may be relatively small, as producing such annotated documents involves receiving the user input specifying the classification category for each document.  Supervised learning based on relatively small training and validation data sets may produce poorly performing classifiers.), said determining comprising determining, by a trained neural encoder, an appropriate feature extractor for said classification problem, said appropriate feature extractor being associated with a suitable feature space for the classification problem (Orlov [0024]: In an illustrative example, the image feature extraction functional module may be implemented by a convolutional neural network (CNN).  In another illustrative example, the image feature extraction functional module may be implemented by an autoencoder.  The text feature extraction functional module may represent each input document text by a histogram which is calculated on a set of clusterized word embeddings.);
(b)    passing said subject document through the appropriate feature extractor to thereby produce a numeric vector representation of at least high-level features of said subject document, said vector representation having n dimensions, the appropriate feature extractor comprising a neural network having been trained to extract features from documents for the classification problem (Orlov [0024]: An example workflow for automatically defining set of categories for document classification is schematically illustrated by FIG. 1.  As shown in FIG. 1, the input documents 100 are fed to the image feature extraction functional module 110, text feature extraction functional module 120, and document layout feature extraction functional module 130, which process each input document in order to produce, respectively, the vector of image features 140, vector of text features 150, and vector of document layout features 160.);

    [Image: media_image1.png]


	(c)    positioning a new point in an n-dimensional space based on said vector representation, wherein said n-dimensional space contains a plurality of reference points, wherein each of said other documents corresponds to a single one of said plurality of reference points, and wherein said plurality of reference points is divided into a plurality of groupings based on the classification problem, each grouping corresponding to at least one specific high-level feature of said other documents (Orlov [Abstract]: “An example method comprises: producing, by a computer system, a plurality of image features by processing images of a plurality of documents; producing a plurality of text features by processing texts of a plurality of documents; producing a plurality of feature vectors, wherein each feature vector of the plurality of feature vectors comprises at least one of: a subset of the plurality of image features and a subset of the plurality of text features; clusterizing the plurality feature vectors to produce a plurality of clusters; defining a plurality of document categories, such that each document category of the plurality of document categories is defined by a respective feature cluster of the plurality of feature clusters; and training a classifier to produce a value reflecting a degree of association of an input document with one or more document categories of the plurality of document categories.” Here, the classification training indicates usage of reference points to train the document classifier, which similarly uses feature vectors to position.);
	(d)    determining a matching grouping from said plurality of groupings (Orlov [0064]: As schematically illustrated by FIG. 7, a document layout template 702, which includes definitions of coordinates, sizes, and other attributes of one or more document layout features, may be matched against the input document 700 containing document layout features 701 in order to produce feature vectors 703 and 704 encode the types, sizes, and other attributes of the document layout features defined by the template and detected in the input document.  In certain implementations, multiple document layout templates may consecutively be matched against to the input document in order to extract multiple sets of document layout features.) for said subject document based on at least one predetermined criterion (Orlov [0019]: Automatic processing of documents (e.g., images of paper documents or various electronic documents including natural language text) may involve classification of the input documents by associating a given document with one or more categories of a certain set of categories.); and
	Orlov does not clearly teach, (e) associating said subject document with said matching grouping.  However, Regev [0040] teaches, “The content/metadata-based clusters are merged according to the format features.  Alternatively, other combinations of features and different orders of clustering stages may be used.  The result of the successive clustering stages is a grouping of all processed documents into type clusters.  This clustering is also followed by construction of a multi-level hierarchy of clusters of different types and sub-types.”
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teaching of Orlov et al. to the Regev et al.’s system by adding the feature of grouping. Ordinary skilled artisan would have been motivated to do so to provide Orlov’s system with enhanced document classification. (See Regev [0149], [0332], [0779] and [0783]). In addition, the references (Orlov and Regev) teach features that are analogous art and they are directed to the same field of endeavor, such as document classification. This close relation suggests a high expectation of success when combined.
	Regarding claim 3, the method of claim 1, wherein each grouping is based on a distance between each of said plurality of reference points within said each grouping and a centroid of each grouping (Regev [Abstract]: The features are processed in a computer so as to generate respective vectors for the documents, each vector including elements having respective values that represent properties of a respective document.  A similarity between the documents is assessed by computing a measure of distance between the respective vectors.).
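For illustration only, the centroid-distance relationship recited in claim 3 can be sketched as follows. This is a minimal sketch assuming Euclidean distance and a hypothetical grouping of three reference points in a two-dimensional space; the function names and sample values are assumptions and are not taken from the application, Orlov, or Regev.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distances_to_centroid(points):
    """Euclidean distance from each reference point to its grouping's centroid."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]


# Hypothetical grouping of three reference points (assumed values).
grouping = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
group_centroid = centroid(grouping)
member_distances = distances_to_centroid(grouping)
```

Under these assumptions, a grouping is characterized by how far each of its reference points lies from the grouping's centroid; other distance measures could be substituted without changing the structure of the sketch.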
	Regarding claim 4, the method of claim 1, wherein said at least one predetermined criterion includes a maximum distance, such that a distance between said new point and a centroid of said matching cluster is smaller than said maximum distance (Regev [0113]: After finding the best match and the corresponding association score for each embedded object on the list being evaluated, classifier 38 computes an embedded object association score between the input document and the candidate document, at a score computation step 106.  This association score is a measure of the distance between the input and candidate documents in terms of their embedded object features.  It may be a weighted sum of the matching pair scores with increasingly higher weights for embedded objects found earlier.  Alternatively, the association score may simply be the maximal value of the association score taken over all the matching pairs that were found at step 104.  This score is used, along with other similarity measures, in assigning the input document to a cluster at step 76 (FIG. 3).).
	Regarding claim 5, the method of claim 1, wherein said at least one predetermined criterion includes a date range, such that a date of said subject document is within said date range (Regev [0052]: Feature extractor 35 analyzes each document retrieved by crawler 34, at a feature extraction step 70, in order to extract various types of features, which typically include content features, format features and metadata features.  The content features are a filtered subset of the document tokens (typically words) or sequences of tokens.  The format features relate to aspects of the structure of the document, such as layout, outline, headings, and embedded objects, as opposed to the textual content itself.  The metadata features are taken from the metadata fields that are associated with each document, such as the file name, author and date or creation and/or modification.  The feature extractor processes the content, format and metadata and stores the resulting features in repository 36.).
	Regarding claim 6, the method of claim 1, wherein said at least one predetermined criterion includes both: a maximum distance, such that a distance between said new point and a centroid of said matching cluster is smaller than said maximum distance; and a date range, such that a date of said subject document is within said date range (Regev [0064]: When a set of one or more suitable candidate documents is found at step 72, classifier 38 calculates one or more distance functions for each candidate document in the set, at a distance computation step 74.  The distance functions are measures of the difference (or inversely, the similarity) between the candidate document and the input document and may include content feature distance, format feature distance, and metadata feature distance.  Alternatively, other suitable groups of distance measures may be computed at this step.  If the distance functions are below certain predetermined thresholds for all candidate documents (i.e., large distance between the input document and the candidate documents), the classifier assigns the input document to a new cluster at step 73.).
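For illustration only, the two predetermined criteria recited in claims 4 through 6 (a maximum distance to the centroid of the matching grouping, and a date range for the subject document) can be sketched together as follows. This is a minimal sketch assuming Euclidean distance and nearest-centroid selection; the function name, grouping labels, threshold, and dates are hypothetical and are not taken from the application, Orlov, or Regev.

```python
import math
from datetime import date


def find_matching_grouping(new_point, doc_date, centroids, max_distance, date_range):
    """Return the label of the nearest centroid if both criteria hold, else None."""
    start, end = date_range
    # Date-range criterion: the subject document's date must fall in the range.
    if not (start <= doc_date <= end):
        return None
    # Nearest centroid by Euclidean distance (an assumed distance measure).
    label, dist = min(
        ((name, math.dist(new_point, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    # Maximum-distance criterion: the distance must be smaller than the maximum.
    return label if dist < max_distance else None


# Hypothetical centroids and criteria (assumed values).
centroids = {"invoices": [0.0, 0.0], "contracts": [10.0, 10.0]}
window = (date(2021, 1, 1), date(2021, 12, 31))

match = find_matching_grouping([1.0, 1.0], date(2021, 9, 28), centroids, 2.0, window)
no_match = find_matching_grouping([5.0, 5.0], date(2021, 9, 28), centroids, 2.0, window)
out_of_range = find_matching_grouping([1.0, 1.0], date(2020, 9, 28), centroids, 2.0, window)
```

In this sketch a subject document is associated with a grouping only when both criteria are satisfied; a point that is too far from every centroid, or a document dated outside the range, yields no matching grouping.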
Regarding claim 7, the method of claim 1, wherein said subject document comprises at least one of:
	-    text; - image; - video data; - audio data; - medical imaging data; - unidimensional data; and - multi-dimensional data (Orlov [0026]: At least subsets of elements the image feature vector, text feature vector, and/or document layout feature vector are concatenated into the feature vector 170 representing the input document, which may then be normalized by the normalization functional module 180 in order to prepare the feature vector for further processing (e.g., by reducing the dimension of the vector, applying a linear transformation to the vector, etc.).  The set of feature vectors corresponding to the set of input documents is then fed to clusterization functional module 190.).
Regarding claim 8, Orlov teaches, a system for determining other documents to be associated with a subject document, the system comprising:
a processor (Orlov [0005]: processor); a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium storing instructions (Orlov [0083]: storage medium); the processor, upon executing the instructions, being configured for:
determining a classification problem associated with the subject document (Orlov [0022]: In practice, the number of available annotated documents which may be included into the training or validation data set may be relatively small, as producing such annotated documents involves receiving the user input specifying the classification category for each document.  Supervised learning based on relatively small training and validation data sets may produce poorly performing classifiers.), said determining comprising determining, by a trained neural encoder, an appropriate feature extractor for said classification problem, said appropriate feature extractor being associated with a suitable feature space for the classification problem (Orlov [0024]: In an illustrative example, the image feature extraction functional module may be implemented by a convolutional neural network (CNN).  In another illustrative example, the image feature extraction functional module may be implemented by an autoencoder.  The text feature extraction functional module may represent each input document text by a histogram which is calculated on a set of clusterized word embeddings.);
producing, using the appropriate feature extractor comprising a neural network, a numeric vector representation of features of said subject document, the appropriate features extractor comprising a neural network having been trained to extract features from documents for the classification problem (Orlov [0024]: An example workflow for automatically defining set of categories for document classification is schematically illustrated by FIG. 1.  As shown in FIG. 1, the input documents 100 are fed to the image feature extraction functional module 110, text feature extraction functional module 120, and document layout feature extraction functional module 130, which process each input document in order to produce, respectively, the vector of image features 140, vector of text features 150, and vector of document layout features 160.);

    [Image: media_image1.png]

	positioning a new point in an n-dimensional space based on said vector representation, wherein said n-dimensional space contains a plurality of reference points, wherein each of said other documents corresponds to a single one of said numeric vectors, and wherein said reference data is grouped into a plurality of groupings based on the classification problem, each grouping corresponding to at least one specific high-level feature of said other documents (Orlov [Abstract]: “An example method comprises: producing, by a computer system, a plurality of image features by processing images of a plurality of documents; producing a plurality of text features by processing texts of a plurality of documents; producing a plurality of feature vectors, wherein each feature vector of the plurality of feature vectors comprises at least one of: a subset of the plurality of image features and a subset of the plurality of text features; clusterizing the plurality feature vectors to produce a plurality of clusters; defining a plurality of document categories, such that each document category of the plurality of document categories is defined by a respective feature cluster of the plurality of feature clusters; and training a classifier to produce a value reflecting a degree of association of an input document with one or more document categories of the plurality of document categories.” Here, the classification training indicates usage of reference points to train the document classifier.);
determining a matching grouping from said plurality of groupings for said subject document  (Orlov [0064]: As schematically illustrated by FIG. 7, a document layout template 702, which includes definitions of coordinates, sizes, and other attributes of one or more document layout features, may be matched against the input document 700 containing document layout features 701 in order to produce feature vectors 703 and 704 encode the types, sizes, and other attributes of the document layout features defined by the template and detected in the input document.  In certain implementations, multiple document layout templates may consecutively be matched against to the input document in order to extract multiple sets of document layout features.), based on at least one predetermined criterion (Orlov [0019]: Automatic processing of documents (e.g., images of paper documents or various electronic documents including natural language text) may involve classification of the input documents by associating a given document with one or more categories of a certain set of categories.); and
	Orlov does not clearly teach, associating said subject document with said matching grouping. However, Regev [0040] teaches, “The content/metadata-based clusters are merged according to the format features.  Alternatively, other combinations of features and different orders of clustering stages may be used.  The result of the successive clustering stages is a grouping of all processed documents into type clusters.  This clustering is also followed by construction of a multi-level hierarchy of clusters of different types and sub-types.”
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teaching of Orlov et al. to the Regev et al.’s system by adding the feature of grouping. Ordinary skilled artisan would have been motivated to do so to provide Orlov’s system with enhanced document classification. (See Regev [0149], [0332], [0779] and [0783]). In addition, the references (Orlov and Regev) teach features that are analogous art and they are directed to the same field of endeavor, such as document classification. This close relation suggests a high expectation of success when combined.
Regarding claim 10, the system of claim 8, wherein each grouping in said plurality of groupings is determined based on a distance between each of said numeric vectors within said each grouping and a centroid of each grouping (Regev [Abstract]: The features are processed in a computer so as to generate respective vectors for the documents, each vector including elements having respective values that represent properties of a respective document.  A similarity between the documents is assessed by computing a measure of distance between the respective vectors.).
Regarding claim 11, the system of claim 8, wherein said at least one predetermined criterion is a maximum distance, such that a distance between said numeric vector representation and a centroid of said matching cluster is smaller than said maximum distance (Regev [0113]: After finding the best match and the corresponding association score for each embedded object on the list being evaluated, classifier 38 computes an embedded object association score between the input document and the candidate document, at a score computation step 106.  This association score is a measure of the distance between the input and candidate documents in terms of their embedded object features.  It may be a weighted sum of the matching pair scores with increasingly higher weights for embedded objects found earlier.  Alternatively, the association score may simply be the maximal value of the association score taken over all the matching pairs that were found at step 104.  This score is used, along with other similarity measures, in assigning the input document to a cluster at step 76 (FIG. 3).).
Regarding claim 12, the system of claim 8, wherein said at least one predetermined criterion is a date range, such that a date of said subject document is within said date range (Regev [0052]: Feature extractor 35 analyzes each document retrieved by crawler 34, at a feature extraction step 70, in order to extract various types of features, which typically include content features, format features and metadata features.  The content features are a filtered subset of the document tokens (typically words) or sequences of tokens.  The format features relate to aspects of the structure of the document, such as layout, outline, headings, and embedded objects, as opposed to the textual content itself.  The metadata features are taken from the metadata fields that are associated with each document, such as the file name, author and date or creation and/or modification.  The feature extractor processes the content, format and metadata and stores the resulting features in repository 36.).
Regarding claim 13, the system of claim 8, wherein said at least one predetermined criterion includes both: a maximum distance, such that a distance between said numeric vector representation and a centroid of said matching cluster is smaller than said maximum distance; and a date range, such that a date of said subject document is within said date range (Regev [0064]: When a set of one or more suitable candidate documents is found at step 72, classifier 38 calculates one or more distance functions for each candidate document in the set, at a distance computation step 74.  The distance functions are measures of the difference (or inversely, the similarity) between the candidate document and the input document and may include content feature distance, format feature distance, and metadata feature distance.  Alternatively, other suitable groups of distance measures may be computed at this step.  If the distance functions are below certain predetermined thresholds for all candidate documents (i.e., large distance between the input document and the candidate documents), the classifier assigns the input document to a new cluster at step 73.).
Regarding claim 14, the system of claim 8, wherein said subject document comprises at least one of:
	-    text; - image; - video data; - audio data; - medical imaging data; - unidimensional data; and - multi-dimensional data (Orlov [0026]: At least subsets of elements the image feature vector, text feature vector, and/or document layout feature vector are concatenated into the feature vector 170 representing the input document, which may then be normalized by the normalization functional module 180 in order to prepare the feature vector for further processing (e.g., by reducing the dimension of the vector, applying a linear transformation to the vector, etc.).  The set of feature vectors corresponding to the set of input documents is then fed to clusterization functional module 190.).
Regarding claim 15, Orlov teaches, non-transitory computer-readable media having stored thereon computer-readable and computer-executable instructions that, when executed, implements a method for determining other documents to be associated with a subject document, the method comprising:
(a)    determining a classification problem associated with the subject document (Orlov [0022]: In practice, the number of available annotated documents which may be included into the training or validation data set may be relatively small, as producing such annotated documents involves receiving the user input specifying the classification category for each document.  Supervised learning based on relatively small training and validation data sets may produce poorly performing classifiers.), said determining comprising determining, by a trained neural encoder, an appropriate feature extractor for said classification problem, said appropriate feature extractor being associated with a suitable feature space for the classification problem (Orlov [0024]: In an illustrative example, the image feature extraction functional module may be implemented by a convolutional neural network (CNN).  In another illustrative example, the image feature extraction functional module may be implemented by an autoencoder.  The text feature extraction functional module may represent each input document text by a histogram which is calculated on a set of clusterized word embeddings.);
(b)    passing said subject document through the appropriate feature extractor to thereby produce a numeric vector representation of at least high-level features of said subject document, said vector representation having n dimensions, the appropriate feature extractor comprising a neural network having been trained to extract features from documents for the classification problem (Orlov [0024]: An example workflow for automatically defining set of categories for document classification is schematically illustrated by FIG. 1.  As shown in FIG. 1, the input documents 100 are fed to the image feature extraction functional module 110, text feature extraction functional module 120, and document layout feature extraction functional module 130, which process each input document in order to produce, respectively, the vector of image features 140, vector of text features 150, and vector of document layout features 160.);

    [Image: media_image1.png]

(c)    positioning a new point in an n-dimensional space based on said vector representation, wherein said n-dimensional space contains a plurality of reference points, wherein each of said other documents corresponds to a single one of said plurality of reference points, and wherein said plurality of reference points is divided into a plurality of groupings based on the classification problem, each grouping corresponding to at least one specific high-level feature of said other documents (Orlov [Abstract]: “An example method comprises: producing, by a computer system, a plurality of image features by processing images of a plurality of documents; producing a plurality of text features by processing texts of a plurality of documents; producing a plurality of feature vectors, wherein each feature vector of the plurality of feature vectors comprises at least one of: a subset of the plurality of image features and a subset of the plurality of text features; clusterizing the plurality feature vectors to produce a plurality of clusters; defining a plurality of document categories, such that each document category of the plurality of document categories is defined by a respective feature cluster of the plurality of feature clusters; and training a classifier to produce a value reflecting a degree of association of an input document with one or more document categories of the plurality of document categories.” Here, the classification training indicates usage of reference points to train the document classifier.);
(d)    determining a matching grouping from said plurality of groupings (Orlov [0064]: As schematically illustrated by FIG. 7, a document layout template 702, which includes definitions of coordinates, sizes, and other attributes of one or more document layout features, may be matched against the input document 700 containing document layout features 701 in order to produce feature vectors 703 and 704 encode the types, sizes, and other attributes of the document layout features defined by the template and detected in the input document.  In certain implementations, multiple document layout templates may consecutively be matched against to the input document in order to extract multiple sets of document layout features.)  for said subject document based on at least one predetermined criterion  (Orlov [0019]: Automatic processing of documents (e.g., images of paper documents or various electronic documents including natural language text) may involve classification of the input documents by associating a given document with one or more categories of a certain set of categories.); and
	Orlov does not clearly teach, (e)    associating said subject document with said matching grouping.  However, Regev [0040] teaches, “The content/metadata-based clusters are merged according to the format features.  Alternatively, other combinations of features and different orders of clustering stages may be used.  The result of the successive clustering stages is a grouping of all processed documents into type clusters.  This clustering is also followed by construction of a multi-level hierarchy of clusters of different types and sub-types.”
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teaching of Regev et al. into Orlov et al.’s system by adding the feature of grouping. A person of ordinary skill would have been motivated to do so to provide Orlov’s system with enhanced document classification (see Regev [0149], [0332], [0779] and [0783]). In addition, Orlov and Regev are analogous art directed to the same field of endeavor, namely document classification. This close relation suggests a high expectation of success when combined.
Regarding claim 17, the computer-readable media of claim 15, wherein each grouping is based on a distance between each of said plurality of reference points within said each grouping and a centroid of each grouping (Regev [Abstract]: The features are processed in a computer so as to generate respective vectors for the documents, each vector including elements having respective values that represent properties of a respective document.  A similarity between the documents is assessed by computing a measure of distance between the respective vectors.).
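For illustration, the centroid-based grouping recited in claim 17 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names (`centroid`, `distances_to_centroid`), the sample groupings, and the choice of Euclidean distance are hypothetical and are not taken from Orlov, Regev, or the application.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def distances_to_centroid(points):
    """Euclidean distance from each reference point to its grouping's centroid."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]

# Hypothetical reference points, already divided into two groupings.
groupings = {
    "grouping_a": [(0.0, 0.0), (2.0, 0.0)],
    "grouping_b": [(10.0, 10.0), (10.0, 14.0)],
}

for name, points in groupings.items():
    print(name, centroid(points), distances_to_centroid(points))
```

Here each grouping is characterized by how far its reference points lie from the grouping's centroid; any other distance measure could be substituted for `math.dist`.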
	Regarding claim 18, the computer-readable media of claim 15, wherein said at least one predetermined criterion includes at least one of: - a maximum distance, such that a distance between said new point and a centroid of said matching cluster is smaller than said maximum distance; and - a date range, such that a date of said subject document is within said date range (Regev [0064]: When a set of one or more suitable candidate documents is found at step 72, classifier 38 calculates one or more distance functions for each candidate document in the set, at a distance computation step 74.  The distance functions are measures of the difference (or inversely, the similarity) between the candidate document and the input document and may include content feature distance, format feature distance, and metadata feature distance.  Alternatively, other suitable groups of distance measures may be computed at this step.  If the distance functions are below certain predetermined thresholds for all candidate documents (i.e., large distance between the input document and the candidate documents), the classifier assigns the input document to a new cluster at step 73.).
Regarding claim 19, the computer-readable media of claim 15, wherein said at least one predetermined criterion includes both:
a maximum distance, such that a distance between said new point and a centroid of said matching cluster is smaller than said maximum distance (Regev [0064]: When a set of one or more suitable candidate documents is found at step 72, classifier 38 calculates one or more distance functions for each candidate document in the set, at a distance computation step 74.  The distance functions are measures of the difference (or inversely, the similarity) between the candidate document and the input document and may include content feature distance, format feature distance, and metadata feature distance.  Alternatively, other suitable groups of distance measures may be computed at this step.  If the distance functions are below certain predetermined thresholds for all candidate documents (i.e., large distance between the input document and the candidate documents), the classifier assigns the input document to a new cluster at step 73.); and a date range, such that a date of said subject document is within said date range (Regev [0052]: Feature extractor 35 analyzes each document retrieved by crawler 34, at a feature extraction step 70, in order to extract various types of features, which typically include content features, format features and metadata features.  The content features are a filtered subset of the document tokens (typically words) or sequences of tokens.  The features relate to aspects of the structure of the document, such as layout, outline, headings, and embedded objects, as opposed to the textual content itself.  The metadata features are taken from the metadata fields that are associated with each document, such as the file name, author and date or creation and/or modification.  The feature extractor processes the content, format and metadata and stores the resulting features in repository 36.).
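For illustration, the two predetermined criteria recited in claim 19 (a maximum distance to the centroid of the matching cluster and a date range for the subject document) can be sketched as follows. This is a minimal sketch under stated assumptions: `find_matching_grouping`, its parameters, the sample centroids, and the use of Euclidean distance are hypothetical and are not taken from Regev, Orlov, or the application.

```python
import math
from datetime import date

def find_matching_grouping(new_point, doc_date, centroids, max_distance, date_range):
    """Return the name of the nearest grouping that satisfies both criteria, else None.

    Criterion 1: distance from new_point to the centroid is smaller than max_distance.
    Criterion 2: doc_date falls within date_range (inclusive on both ends).
    """
    start, end = date_range
    if not (start <= doc_date <= end):
        return None
    best_name, best_dist = None, max_distance
    for name, c in centroids.items():
        d = math.dist(new_point, c)
        if d < best_dist:  # strictly smaller than the maximum (or current best)
            best_name, best_dist = name, d
    return best_name

# Hypothetical centroids of two existing groupings.
centroids = {"grouping_a": (1.0, 0.0), "grouping_b": (10.0, 12.0)}
year_2021 = (date(2021, 1, 1), date(2021, 12, 31))

match = find_matching_grouping((1.5, 0.0), date(2021, 6, 1), centroids, 2.0, year_2021)
print(match)
```

A subject document that fails either criterion returns `None` in this sketch; what happens to such a document (for example, assignment to a new cluster) is left open here.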
Regarding claim 20, the computer-readable media of claim 15, wherein said subject document comprises at least one of:
	-    text; - image; - video data; - audio data; - medical imaging data; - unidimensional data; and - multi-dimensional data (Orlov [0026]: At least subsets of elements the image feature vector, text feature vector, and/or document layout feature vector are concatenated into the feature vector 170 representing the input document, which may then be normalized by the normalization functional module 180 in order to prepare the feature vector for further processing (e.g., by reducing the dimension of the vector, applying a linear transformation to the vector, etc.).  The set of feature vectors corresponding to the set of input documents is then fed to clusterization functional module 190.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Csomai, US 2010/0145678, Method, System and Apparatus for Automatic Keyword Extraction
Hull, US 2006/0285172, Method and System for Document Fingerprint Matching in a Mixed Media Environment
Borrey, US 5,159,667, Document Identification by Characteristics Matching
Ravid, US 2010/0198864, Method for Organizing Large Numbers of Documents
Deolalikar, US 2014/0177948, Generating Training Documents
Gordo, US 2011/0137898, Unstructured Document Classification
Sampson, US 8,724,907, Method and System for Using OCR Data for Grouping and Classifying Documents

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SABA AHMED whose telephone number is (571)270-0236.  The examiner can normally be reached on MON – FRI: 9AM – 5PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/SABA AHMED/
Examiner, Art Unit 2154

/SYED H HASAN/
Primary Examiner, Art Unit 2154