DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections

Claim 3 is objected to because of the following informalities: claim 3 depends upon itself; it should depend upon claim 2. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 8, 10, 13, 15 and 19 is/are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Scholtes U.S. PAP 2016/0117589 A1.
Regarding claim 1 Scholtes teaches a computer-implemented method for generating a categorization for an input document (system, method and computer program product for automatic document classification, see abstract), the computer-implemented method comprising: 
determining an input vector-based representation of the input document (extraction module configured to extract structural, syntactical and/or semantic information from a document, see par. [0005]; FIG. 14 illustrates an overview of structural, syntactical and semantic information that can be extracted from documents to represent feature vectors for machine learning, see par. [0025]);
processing the input vector-based representation using a trained supervised machine learning model to generate the categorization based at least in part on the input vector-based representation (a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning, see par. [0005]), wherein: 
the trained supervised machine learning model has been trained using automatically-generated training data (selection of relevant training material can be done by using clustering or concept search techniques that cluster similar documents for certain document categories, see par. [0058]; For each class, or for a set of classes, a relevant set of training- and testing documents is selected by a user of a group of users using automatic techniques, see par. [0080])
and the automatically generated training data is generated by determining an inferred semantic label for each unlabeled training document of one or more unlabeled training documents based at least in part on: (i) a prior semantic label for each labeled training document of the one or more labeled training documents (to create unique document identifiers for each document, to label the document meta data 109 with the document identifiers of the document groups, to create a machine learning model 304, and to automatically train the machine learning model 304 and use this machine learning model 304 for automatic document classification of other documents, see par. [0055]), and (ii) a cross-document similarity measure for each document pair of a plurality of document pairs that is associated with a corresponding unlabeled training document of the one or more unlabeled training documents and a corresponding labeled training document of the one or more labeled training documents (the various structural, syntactical and semantic information for the selected document is obtained from the meta data information store. This information is converted into a vector representation in step 402 and then matched against the machine learning model 304, see par. [0061]); 
and performing one or more categorization-based actions based at least in part on the categorization (match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user, see par. [0005]).
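For illustration only, the label-inference limitation recited in claim 1 can be sketched as follows. The sketch is not taken from the application or from Scholtes. The bag-of-words representation, the nearest-labeled-document rule, the sample ticket texts, and the names `vectorize`, `cosine` and `infer_labels` are all assumptions chosen to make the limitation concrete.

```python
import math
from collections import Counter

def vectorize(text):
    """Hypothetical vector-based representation: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def infer_labels(labeled, unlabeled):
    """Assign each unlabeled document the prior label of its most similar labeled document.

    labeled:   list of (text, prior_label) pairs
    unlabeled: list of texts
    returns:   list of (text, inferred_label) pairs
    """
    labeled_vecs = [(vectorize(text), label) for text, label in labeled]
    result = []
    for text in unlabeled:
        vec = vectorize(text)
        # One cross-document similarity measure per (unlabeled, labeled) pair.
        best_label = max(labeled_vecs, key=lambda pair: cosine(vec, pair[0]))[1]
        result.append((text, best_label))
    return result

# Assumed toy data, not from the record.
labeled = [("printer will not print", "hardware"),
           ("cannot login password expired", "access")]
unlabeled = ["password reset needed cannot login",
             "printer jam will not print"]
training_data = infer_labels(labeled, unlabeled)
```

Under these assumptions, `training_data` pairs each unlabeled text with a label borrowed from its closest labeled neighbor, which is one simple way to read "automatically-generated training data."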
Regarding claim 8 Scholtes teaches the computer-implemented method of claim 1, wherein: the input document is an incident ticket document, and the categorization comprises an incident category for the incident ticket document (FIG. 14 is an illustrative overview 1400 of structural, syntactical and semantic information that can be extracted from documents to represent the feature vectors for machine learning. In FIG. 14, examples of named entities, such as CITY, COMPANY, COUNTRY and CURRENCY, and the like, but also more relatively complex patterns, such as sentiments, problems, and the like, can be derived, see par. [0088]).
Regarding claim 10 Scholtes teaches an apparatus for generating a categorization for an input document (system, method and computer program product for automatic document classification, see abstract), the apparatus comprising at least one processor and at least one memory including program code, the at least one memory and the program code configured to, with the processor, cause the apparatus to at least: 
determine an input vector-based representation of the input document (extraction module configured to extract structural, syntactical and/or semantic information from a document, see par. [0005]; FIG. 14 illustrates an overview of structural, syntactical and semantic information that can be extracted from documents to represent feature vectors for machine learning, see par. [0025]);
process the input vector-based representation using a trained supervised machine learning model to generate the categorization based at least in part on the input vector-based representation (a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning, see par. [0005]), wherein: 
the trained supervised machine learning model has been trained using automatically-generated training data (selection of relevant training material can be done by using clustering or concept search techniques that cluster similar documents for certain document categories, see par. [0058]; For each class, or for a set of classes, a relevant set of training- and testing documents is selected by a user of a group of users using automatic techniques, see par. [0080])
and the automatically generated training data is generated by determining an inferred semantic label for each unlabeled training document of one or more unlabeled training documents based at least in part on: (i) a prior semantic label for each labeled training document of the one or more labeled training documents (to create unique document identifiers for each document, to label the document meta data 109 with the document identifiers of the document groups, to create a machine learning model 304, and to automatically train the machine learning model 304 and use this machine learning model 304 for automatic document classification of other documents, see par. [0055]), and (ii) a cross-document similarity measure for each document pair of a plurality of document pairs that is associated with a corresponding unlabeled training document of the one or more unlabeled training documents and a corresponding labeled training document of the one or more labeled training documents (the various structural, syntactical and semantic information for the selected document is obtained from the meta data information store. This information is converted into a vector representation in step 402 and then matched against the machine learning model 304, see par. [0061]); 
and perform one or more categorization-based actions based at least in part on the categorization (match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user, see par. [0005]).
Regarding claim 13 Scholtes teaches the apparatus of claim 10, wherein: the input document is an incident ticket document, and the categorization comprises an incident category for the incident ticket document (FIG. 14 is an illustrative overview 1400 of structural, syntactical and semantic information that can be extracted from documents to represent the feature vectors for machine learning. In FIG. 14, examples of named entities, such as CITY, COMPANY, COUNTRY and CURRENCY, and the like, but also more relatively complex patterns, such as sentiments, problems, and the like, can be derived, see par. [0088]).

Regarding claim 15 Scholtes teaches a computer program product for generating a categorization for an input document (system, method and computer program product for automatic document classification, see abstract), the computer program product comprising at least one non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions configured to: 
determine an input vector-based representation of the input document (extraction module configured to extract structural, syntactical and/or semantic information from a document, see par. [0005]; FIG. 14 illustrates an overview of structural, syntactical and semantic information that can be extracted from documents to represent feature vectors for machine learning, see par. [0025]);
process the input vector-based representation using a trained supervised machine learning model to generate the categorization based at least in part on the input vector-based representation (a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning, see par. [0005]), wherein: 
the trained supervised machine learning model has been trained using automatically-generated training data (selection of relevant training material can be done by using clustering or concept search techniques that cluster similar documents for certain document categories, see par. [0058]; For each class, or for a set of classes, a relevant set of training- and testing documents is selected by a user of a group of users using automatic techniques, see par. [0080])
and the automatically generated training data is generated by determining an inferred semantic label for each unlabeled training document of one or more unlabeled training documents based at least in part on: (i) a prior semantic label for each labeled training document of the one or more labeled training documents (to create unique document identifiers for each document, to label the document meta data 109 with the document identifiers of the document groups, to create a machine learning model 304, and to automatically train the machine learning model 304 and use this machine learning model 304 for automatic document classification of other documents, see par. [0055]), and (ii) a cross-document similarity measure for each document pair of a plurality of document pairs that is associated with a corresponding unlabeled training document of the one or more unlabeled training documents and a corresponding labeled training document of the one or more labeled training documents (the various structural, syntactical and semantic information for the selected document is obtained from the meta data information store. This information is converted into a vector representation in step 402 and then matched against the machine learning model 304, see par. [0061]); 
and perform one or more categorization-based actions based at least in part on the categorization (match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user, see par. [0005]).
Regarding claim 19 Scholtes teaches the computer program product of claim 15, wherein: the input document is an incident ticket document, and the categorization comprises an incident category for the incident ticket document (FIG. 14 is an illustrative overview 1400 of structural, syntactical and semantic information that can be extracted from documents to represent the feature vectors for machine learning. In FIG. 14, examples of named entities, such as CITY, COMPANY, COUNTRY and CURRENCY, and the like, but also more relatively complex patterns, such as sentiments, problems, and the like, can be derived, see par. [0088]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2-5, 11, 12, 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scholtes U.S. PAP 2016/0117589 A1 in view of Skiles U.S. PAP 2018/0349388 A1.
Regarding claim 2  Scholtes does not teach the computer-implemented method of claim 1, wherein generating the trained supervised machine learning model comprises: determining each cross-document similarity measure for a document pair of the plurality of document pairs; for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document; processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document; and generating the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document.
In the same field of endeavor Skiles teaches performing, by a computing device, a clustering operation to group documents of a document corpus into clusters in a feature vector space. The document corpus includes one or more labeled documents and one or more unlabeled documents, see abstract. Using the clustering process to help the user decide which documents to manually classify can improve the performance of a resulting document classifier and can reduce the amount of time and effort users spend classifying documents, see par. [0009].
Skiles teaches determining each cross-document similarity measure for a document pair of the plurality of document pairs (The clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity, see par. [0048]); 
for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document (During the clustering operations, one or more other documents, including labeled documents, unlabeled documents, or both may be assigned to the cluster because the fixed cluster center is closer to each of the one or more other documents than is each other cluster center evaluated by the clustering instructions, see par. [0051]); 
processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document (after the clustering operation, the cluster may include a plurality of documents, and the cluster may be represented by the fixed cluster center designated during initiation of the clustering operation. Since dimensions of the cluster in the feature vector space change with addition of each document to the cluster, after the clustering operation, the fixed cluster center will generally not be central to or a centroid of the cluster in the feature vector space, see par. [0041]); 
and generating the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document (clustering operations may identify a region with a high concentration of unlabeled documents. In this example, the user may be prompted to classify a document that is near the center of the region to ensure that the supervised training data used to train the document classifier includes sufficient information to enable the document classifier to reliably assign classes to documents within the region, see par. [0061]).
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Skiles for the benefit of improving the performance of a resulting document classifier and reducing the amount of time and effort users spend classifying documents, see Skiles par. [0009].
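For illustration only, the final limitation of claim 2 (training so as to minimize a measure of error between the inferred labels and the model's own outputs) can be sketched as follows. Neither the claim nor Skiles specifies a model or a loss. The logistic model, the cross-entropy loss, stochastic gradient descent, the toy feature vectors, and the names `predict`, `log_loss` and `train` are all assumptions made for this sketch.

```python
import math

def predict(weights, bias, x):
    """Model output in (0, 1) for feature vector x (assumed logistic model)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, xs, ys):
    """Mean cross-entropy between the inferred labels ys and the model outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def train(xs, ys, lr=0.5, epochs=200):
    """Stochastic gradient descent on the cross-entropy loss."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(weights, bias, x) - y  # gradient of the loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Assumed toy training vectors and inferred labels (1 = category A, 0 = category B).
xs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ys = [1, 1, 0, 0]

loss_before = log_loss([0.0, 0.0], 0.0, xs, ys)  # error of the untrained model
weights, bias = train(xs, ys)
loss_after = log_loss(weights, bias, xs, ys)     # error after training
```

In this reading, the output of the model before training plays the role of the "untrained semantic label," and training drives that output toward the inferred label.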
Regarding claim 3 Skiles teaches the computer-implemented method of claim 3, wherein determining the cross-document similarity measure for a document pair of the plurality of document pairs comprises: 
determining an embedded representation for each word of one or more unlabeled words of the unlabeled training document associated with the document pair (the feature extraction instructions include word vector instructions, see par. [0042]); 
determining an embedded representation for each word of one or more words of the labeled document associated with the document pair (A word vector refers to a vector or other data structure that represents syntactic and semantic relationships among words in an analyzed set of documents, see par. [0043]); 
determining, for each word pair of a plurality of word pairs that comprises a corresponding unlabeled word of the one or more unlabeled words and a corresponding labeled word of the one or more labeled words, a pairwise similarity measure of the unlabeled embedded representation for the corresponding word in the unlabeled document and the embedded representation for the corresponding word in the labeled document (the docvec of a document may be determined by identifying words in the document, determining wordvecs for the words in the document, and mathematically combining wordvecs for the words in the document to generate a docvec of the document, see par. [0044]); 
determining, for each word pair of the plurality of word pairs, a pairwise flow indicator based at least in part on the pairwise similarity measure of the word pair relative to other pairwise similarity measures in a subset of the plurality of word pairs that is associated with the corresponding unlabeled word for the word pair (the docvec represents an aggregation of syntactic and semantic relationships among the words in a particular document, see par. [0044]); 
and determining the cross-document similarity measure based at least in part on each pairwise similarity measure for a word pair of the plurality of word pairs as well as each pairwise flow indicator for a word pair of the plurality of word pairs (the clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity or dissimilarity of the two documents, see par. [0048]).
Regarding claim 4 Skiles teaches the computer-implemented method of claim 3, wherein each pairwise similarity measure for a word pair of the plurality of word pairs is determined based at least in part on a cosine similarity of the embedded vector representation of the corresponding word (in the unlabeled document) associated with the word pair and the corresponding word (in the labeled document) associated with the word pair (the distance between two documents or a distance between a document and a cluster center may be determined as a Euclidean distance, a cosine distance or cosine similarity, or a mutual information distance, see par. [0049]).
Regarding claim 5 Skiles teaches the computer-implemented method of claim 3, wherein each pairwise flow indicator for a word pair of the plurality of word pairs is determined based at least in part on maximizing a word mover's distance measure of the document pair (selecting a particular document based on a distance in the feature vector space between a feature vector of the particular document and another document or a cluster, and, at 1608. In some implementations, if the location of the particular document is greater than a threshold distance from a cluster corresponding to a user-defined class, a prompt may be generated to recommend that the user consider generating a new class or sub-class corresponding to the particular document, see par. [0149]).
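For illustration only, the word-level limitations discussed for claims 3-5 (embedded representations, pairwise cosine similarity, pairwise flow indicators, and a resulting cross-document similarity) can be sketched as follows. The sketch is not taken from the application or from Skiles. It uses a greedy relaxation in which each unlabeled word sends all of its flow to its most similar labeled word; a full word mover's distance computation would instead solve an optimal-transport problem. The two-dimensional embeddings and the names `cosine_similarity` and `cross_document_similarity` are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cross_document_similarity(unlabeled_words, labeled_words, embeddings):
    """Average, over unlabeled words, of the flow-weighted pairwise similarity."""
    total = 0.0
    for uw in unlabeled_words:
        # Pairwise similarity for every word pair that shares this unlabeled word.
        sims = [cosine_similarity(embeddings[uw], embeddings[lw])
                for lw in labeled_words]
        # Flow indicator: 1.0 for the best-matching labeled word, 0.0 otherwise,
        # so it depends on each similarity relative to the others in the subset.
        best = max(range(len(sims)), key=sims.__getitem__)
        flows = [1.0 if j == best else 0.0 for j in range(len(sims))]
        total += sum(f * s for f, s in zip(flows, sims))
    return total / len(unlabeled_words)

# Assumed toy embeddings, not from the record.
embeddings = {
    "printer": [1.0, 0.0],
    "scanner": [0.9, 0.1],
    "password": [0.0, 1.0],
    "login": [0.1, 0.9],
}
sim_same = cross_document_similarity(["printer"], ["scanner", "login"], embeddings)
sim_diff = cross_document_similarity(["printer"], ["password"], embeddings)
```

Under these assumptions, a document pair sharing a topic (`sim_same`) scores much higher than a pair on unrelated topics (`sim_diff`). By contrast, the passages of Skiles cited above describe distances computed between document-level vectors.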
Regarding claim 11 Scholtes does not teach the apparatus of claim 10, wherein generating the trained supervised machine learning model comprises: determining each cross-document similarity measure for a document pair of the plurality of document pairs; for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document; processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document; and training the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document.
In the same field of endeavor Skiles teaches performing, by a computing device, a clustering operation to group documents of a document corpus into clusters in a feature vector space. The document corpus includes one or more labeled documents and one or more unlabeled documents, see abstract. Using the clustering process to help the user decide which documents to manually classify can improve the performance of a resulting document classifier and can reduce the amount of time and effort users spend classifying documents, see par. [0009].
Skiles teaches determining each cross-document similarity measure for a document pair of the plurality of document pairs (The clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity, see par. [0048]); 
for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document (During the clustering operations, one or more other documents, including labeled documents, unlabeled documents, or both, may be assigned to the cluster because the fixed cluster center is closer to each of the one or more other documents than is each other cluster center evaluated by the clustering instructions, see par. [0051]); 
processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document (after the clustering operation, the cluster may include a plurality of documents, and the cluster may be represented by the fixed cluster center designated during initiation of the clustering operation. Since dimensions of the cluster in the feature vector space change with addition of each document to the cluster, after the clustering operation, the fixed cluster center will generally not be central to or a centroid of the cluster in the feature vector space, see par. [0041]); 
and generating the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document (clustering operations may identify a region with a high concentration of unlabeled documents. In this example, the user may be prompted to classify a document that is near the center of the region to ensure that the supervised training data used to train the document classifier includes sufficient information to enable the document classifier to reliably assign classes to documents within the region, see par. [0061]).
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Skiles for the benefit of improving the performance of a resulting document classifier and reducing the amount of time and effort users spend classifying documents, see Skiles par. [0009].

Regarding claim 12 Skiles teaches the apparatus of claim 11, wherein determining the cross-document similarity measure for a document pair of the plurality of document pairs comprises: 
determining an embedded representation for each word of one or more unlabeled words of the unlabeled training document associated with the document pair (the feature extraction instructions include word vector instructions, see par. [0042]); 
determining an embedded representation for each word of one or more words of the labeled document associated with the document pair (A word vector refers to a vector or other data structure that represents syntactic and semantic relationships among words in an analyzed set of documents, see par. [0043]); 
determining, for each word pair of a plurality of word pairs that comprises a corresponding unlabeled word of the one or more unlabeled words and a corresponding labeled word of the one or more labeled words, a pairwise similarity measure of the unlabeled embedded representation for the corresponding word in the unlabeled document and the embedded representation for the corresponding word in the labeled document (the docvec of a document may be determined by identifying words in the document, determining wordvecs for the words in the document, and mathematically combining wordvecs for the words in the document to generate a docvec of the document, see par. [0044]); 
determining, for each word pair of the plurality of word pairs, a pairwise flow indicator based at least in part on the pairwise similarity measure of the word pair relative to other pairwise similarity measures in a subset of the plurality of word pairs that is associated with the corresponding unlabeled word for the word pair (the docvec represents an aggregation of syntactic and semantic relationships among the words in a particular document, see par. [0044]); 
and determining the cross-document similarity measure based at least in part on each pairwise similarity measure for a word pair of the plurality of word pairs as well as each pairwise flow indicator for a word pair of the plurality of word pairs (the clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity or dissimilarity of the two documents, see par. [0048]).
Regarding claim 16 Scholtes does not teach the computer program product of claim 15, wherein generating the trained supervised machine learning model comprises: determining each cross-document similarity measure for a document pair of the plurality of document pairs; for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document; processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document; and training the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document.
In the same field of endeavor Skiles teaches performing, by a computing device, a clustering operation to group documents of a document corpus into clusters in a feature vector space. The document corpus includes one or more labeled documents and one or more unlabeled documents, see abstract. Using the clustering process to help the user decide which documents to manually classify can improve the performance of a resulting document classifier and can reduce the amount of time and effort users spend classifying documents, see par. [0009].
Skiles teaches determining each cross-document similarity measure for a document pair of the plurality of document pairs (The clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity, see par. [0048]); 
for each unlabeled training document of the one or more unlabeled training documents, determining the inferred semantic label for the unlabeled training document based at least in part on a related subset of each cross-document similarity measure for a document pair of the plurality of document pairs that is associated with the unlabeled training document (During the clustering operations, one or more other documents, including labeled documents, unlabeled documents, or both, may be assigned to the cluster because the fixed cluster center is closer to each of the one or more other documents than is each other cluster center evaluated by the clustering instructions, see par. [0051]); 
processing each training vector-based representation for an unlabeled training document of the one or more unlabeled training documents to generate an untrained semantic label for the unlabeled training document (after the clustering operation, the cluster may include a plurality of documents, and the cluster may be represented by the fixed cluster center designated during initiation of the clustering operation. Since dimensions of the cluster in the feature vector space change with addition of each document to the cluster, after the clustering operation, the fixed cluster center will generally not be central to or a centroid of the cluster in the feature vector space, see par. [0041]); 
and generating the trained supervised machine learning model to minimize a measure of error between each inferred semantic label for an unlabeled training document of the one or more unlabeled training documents and a corresponding untrained semantic label for the unlabeled training document (clustering operations may identify a region with a high concentration of unlabeled documents. In this example, the user may be prompted to classify a document that is near the center of the region to ensure that the supervised training data used to train the document classifier includes sufficient information to enable the document classifier to reliably assign classes to documents within the region, see par. [0061]).
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Skiles for the benefit of improving the performance of a resulting document classifier and reducing the amount of time and effort users spend classifying documents, see par. [0009].
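For illustration only, the cluster assignment described in Skiles par. [0051], in which a document joins the cluster whose center is closest to it in the feature vector space, can be sketched as follows. This is a minimal sketch and not code from either reference: the function names, the two-dimensional vectors, and the choice of Euclidean distance are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest_center(doc_vectors, centers):
    """Return, for each document vector, the index of the closest cluster center."""
    assignments = []
    for vec in doc_vectors:
        distances = [euclidean(vec, center) for center in centers]
        assignments.append(distances.index(min(distances)))
    return assignments


# Hypothetical feature vectors; real document vectors would have many more dimensions.
centers = [[0.0, 0.0], [10.0, 10.0]]
docs = [[1.0, 0.5], [9.0, 11.0], [4.0, 4.0]]
assignments = assign_to_nearest_center(docs, centers)
```

In this sketch a labeled or unlabeled document is treated identically; only its position relative to the cluster centers determines the cluster it joins.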

Regarding claim 17 Skiles teaches the computer program product of claim 16, wherein determining the cross-document similarity measure for a document pair of the plurality of document pairs comprises: determining an embedded representation for each word of one or more unlabeled words of the unlabeled training document associated with the document pair (the feature extraction instructions include word vector instructions, see par. [0042]); 
determining an embedded representation for each word of one or more words of the labeled document associated with the document pair (A word vector refers to a vector or other data structure that represents syntactic and semantic relationships among words in an analyzed set of documents, see par. [0043]); 
determining, for each word pair of a plurality of word pairs that comprises a corresponding unlabeled word of the one or more unlabeled words and a corresponding labeled word of the one or more labeled words, a pairwise similarity measure of the unlabeled embedded representation for the corresponding word in the unlabeled document and the embedded representation for the corresponding word in the labeled document (the docvec of a document may be determined by identifying words in the document, determining wordvecs for the words in the document, and mathematically combining wordvecs for the words in the document to generate a docvec of the document, see par. [0044]); 
determining, for each word pair of the plurality of word pairs, a pairwise flow indicator based at least in part on the pairwise similarity measure of the word pair relative to other pairwise similarity measures in a subset of the plurality of word pairs that is associated with the corresponding unlabeled word for the word pair (the docvec represents an aggregation of syntactic and semantic relationships among the words in a particular document, see par. [0044]); 
and determining the cross-document similarity measure based at least in part on each pairwise similarity measure for a word pair of the plurality of word pairs as well as each pairwise flow indicator for a word pair of the plurality of word pairs (the clustering instructions 152 are configured to group documents into clusters based generally on the notion that a distance between two documents in the feature vector space is generally indicative of semantic similarity or dissimilarity of the two documents, see par. [0048]).
Regarding claim 18 Skiles teaches the computer program product of claim 17, wherein each pairwise similarity measure for a word pair of the plurality of word pairs is determined based at least in part on a cosine similarity of the corresponding word of the unlabeled document associated with the word pair and the corresponding word of the labeled document associated with the word pair (the distance between two documents or a distance between a document and a cluster center may be determined as a Euclidean distance, a cosine distance or cosine similarity, or a mutual information distance, see par. [0049]).
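For illustration only, the relationship between word vectors, a document vector, and a cosine similarity, as discussed in Skiles pars. [0044] and [0049], can be sketched as follows. This is a minimal sketch and not code from the reference: Skiles states only that wordvecs are "mathematically combined" into a docvec, so the use of an element-wise mean here is an assumption, as are the function names.

```python
import math


def mean_vector(word_vectors):
    """Combine word vectors into one document vector by element-wise averaging (assumed)."""
    count = len(word_vectors)
    return [sum(component) / count for component in zip(*word_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The same cosine_similarity function can be applied either to a pair of word vectors or to a pair of document vectors, since both are ordinary vectors in the same feature space.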


Claim(s) 6, 7, 9, 14 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scholtes U.S. PAP 2016/0117589 A1 in view of Li CN 108733837 B.

Regarding claim 6 Scholtes does not teach the computer-implemented method of claim 1, wherein the input vector-based representation comprises a fixed-length distributed word representation for each input word of one or more input words in the input document.
In the same field of endeavor Li teaches a natural language structuring method and device of medical record text, used for flexibly adjusting the content of the item to be extracted without retraining the whole system, see abstract. Each word in the medical record text is distributed with a fixed length vector, the length can be set by itself, that is, the vector uniquely identifies the word, the computer also can use the vector to calculate the words in the medical record text, see par. [0065].
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Li for the benefit of adjusting content to be extracted without retraining the whole system, see abstract.
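For illustration only, the fixed-length per-word vector described in Li par. [0065] can be sketched as a lookup table that maps each word to a vector of a configurable length. This is a minimal sketch and not code from the reference: the function names, the seeded random initialization, and the zero vector used for unknown words are assumptions made for the example.

```python
import random


def build_embedding_table(vocabulary, dim, seed=0):
    """Assign every vocabulary word a vector of length dim (random values, assumed)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocabulary}


def embed(tokens, table, dim):
    """Look up each token's vector; unknown tokens map to a zero vector of length dim."""
    unknown = [0.0] * dim
    return [table.get(token, unknown) for token in tokens]


# Hypothetical vocabulary and vector length.
table = build_embedding_table(["fever", "cough"], dim=4)
vectors = embed(["fever", "cough", "fever", "rash"], table, dim=4)
```

Every word receives a vector of the same length regardless of the word's own length, and repeated occurrences of a word map to the same vector.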
Regarding claim 7 Scholtes does not teach the computer-implemented method of claim 1, wherein the trained supervised machine learning model is a long-short term memory machine learning model.
In the same field of endeavor Li teaches a natural language structuring method and device of medical record text, used for flexibly adjusting the content of the item to be extracted without retraining the whole system, see abstract. In the neural network system, a neural network comprises a plurality of neural layers, an input layer, a hidden layer and an output layer. The input layer is responsible for receiving the input and distributing to the hidden layer, because the user does not see these layers, so it is called hidden layer. The hidden layer is responsible for the needed calculation and output result to the output layer, the output layer outputs the final result to the user, then the user can see the final result. In the embodiment of the invention, the middle value obtained in the hidden layer calculation process is called hidden layer representation, because the long term memory model is used, so that the hidden layer obtained by calculating the input vector of the text element according to the embodiment of the invention represents the context information containing the text element. Long short term memory (LSTM, Long-Short Term Memory) model is one of the recurrent neural network (RNN, Recurrent Neural Network), see par. [0100].
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Li for the benefit of adjusting content to be extracted without retraining the whole system, see abstract.
Regarding claim 9 Scholtes does not teach the computer-implemented method of claim 1, wherein: the input document is a medical diagnosis document, and the categorization comprises a diagnosis category for the medical diagnosis document.
In a similar field of endeavor Li teaches a natural language structuring method and device of medical record text, which is used for flexibly adjusting the content of the item to be extracted without retraining the whole system, see par. [0004]. Natural language structured, refers to a section of free text input, automatically extracting the key information, outputting the extracting result in a structured form such as table/block diagram. For example, for "patient fever 1 day, no cough, normal", see par. [0002].
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Li for the benefit of adjusting content to be extracted without retraining the whole system, see abstract.

Regarding claim 14 Scholtes does not teach the apparatus of claim 10, wherein: the input document is a medical diagnosis document, and the categorization comprises a diagnosis category for the medical diagnosis document.
In a similar field of endeavor Li teaches a natural language structuring method and device of medical record text, which is used for flexibly adjusting the content of the item to be extracted without retraining the whole system, see par. [0004]. Natural language structured, refers to a section of free text input, automatically extracting the key information, outputting the extracting result in a structured form such as table/block diagram. For example, for "patient fever 1 day, no cough, normal", see par. [0002].
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Li for the benefit of adjusting content to be extracted without retraining the whole system, see abstract.

Regarding claim 20 Scholtes does not teach the computer program product of claim 15, wherein: the input document is a medical diagnosis document, and the categorization comprises a diagnosis category for the medical diagnosis document.
In a similar field of endeavor Li teaches a natural language structuring method and device of medical record text, which is used for flexibly adjusting the content of the item to be extracted without retraining the whole system, see par. [0004]. Natural language structured, refers to a section of free text input, automatically extracting the key information, outputting the extracting result in a structured form such as table/block diagram. For example, for "patient fever 1 day, no cough, normal", see par. [0002].
It would have been obvious to one of ordinary skill in the art to combine the Scholtes invention with the teachings of Li for the benefit of adjusting content to be extracted without retraining the whole system, see abstract.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art available on form 892.
Tacchi ‘381 teaches enhancing or suppressing relationships between documents based on text pertaining to selected topics, see abstract.
Su CN ‘109 teaches a word and sentence classification method using a multi-attention mechanism based on LSTM.
Edmund ‘665 teaches a model builder that labels documents, and trains models to predict the classification of other documents, see abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656