Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 7-10 and 13-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Forman (US 7,792,353 B2).

Regarding claims 1, 7 and 13, Forman discloses a system (and a corresponding computer-implemented method and non-transitory computer-readable storage medium) comprising:
a processor; and
Col. 11, lines 6-20: “CPU”, “multiprocessor computers”, “special-purpose processors”.
a memory storing instructions that, when executed by the processor, configure the system to:
Col. 11, line 5: RAM.
pre-process, by a machine learning module, data;
Col. 3, lines 31 et seq.: “Referring back to FIG. 2, in step 12 the classifier 3 is trained, e.g., using training module 5 (shown also in FIG. 3). Generally speaking, the training involves attempting to find an optimal (according to some underlying criteria) mapping from the supplied feature set values for the samples 7 to the corresponding classification labels 8, so that the resulting classifier 3 can receive new unlabeled samples 2 and provide classification labels 4 for them based on its best guess in view of the feature set values for such unlabeled samples 2.” (Emphasis added.)
select, by the machine learning module, a trained machine learning model;
Col. 4, lines 16 et seq.: “However, in alternate embodiments of the invention: (i) module 52 uses the production classifier 3 (i.e., which is used to generate predictions 4 for unlabeled input samples 2); or (ii) a different classifier is used by prediction/modeling module 52.” (Emphasis added.) See also col. 1, background section, discussing models for applications including image recognition and text classification.
predict, by the machine learning module, a result based on the trained machine learning model;
Col. 4, lines 10 et seq.: “In one representative embodiment, labels are predicted for some or all of the samples in training set 45 using prediction/modeling module 52.”
output, by the machine learning module, to a user interface, a prediction for a user;
Col. 2, lines 21 et seq.: “In production use, unlabeled samples 2 are input into an automated classifier 3 that then outputs a corresponding class prediction 4 for each such sample.” See also figs. 5-7 illustrating a user interface.
amend, via the user interface, the prediction, by the user, to provide an amended prediction;
Col. 3, lines 48 et seq.: “As used herein, “confirmation/re-labeling” refers to the process of submitting an existing training sample for labeling so that its previously assigned label is either confirmed or contradicted, e.g., by a domain expert or other person. A request for confirmation/re-labeling can include, e.g.: (i) a mere request to indicate whether the previously assigned classification label is correct; and/or (ii) a request to designate a different label if the previously assigned classification label is incorrect. In response, a “reply classification label” is received. Such a reply classification label can comprise a mere designation (explicit or implicit) that the previously assigned classification label is correct and/or a different classification label which is believed to be more appropriate than the previously assigned classification label.” (Emphasis added.)
retrain, by the machine learning module, the trained machine learning model based on data associated with the amended prediction, thereby providing a retrained machine learning model; and
Col. 6, line 66 et seq.: “In any event, the confirmation/re-labeling information is received for the submitted training examples and, in step 18, that information is used to modify the training set 45 and then retrain the classifier 3, e.g., using training module 5 and the revised training set 45.” (Emphasis added.)
predict, by the machine learning module, a new result based on: (i) the data associated with the amended prediction; and (ii) the re-trained machine learning model.
Col. 7, line 39 et seq.: “In any event, once the training set 45 has been updated, the classifier 3 is retrained using the modified training set 45 (e.g., by training module 5), and the retrained classifier 3 is used to re-process at least some of the labeled training samples 45, thereby obtaining new predictions 4. Upon completion of step 18, processing returns to step 14 to select additional samples from training set 45 and repeat the foregoing process.”
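By way of illustration only, the predict, amend, retrain and re-predict sequence mapped above can be sketched as follows. This is a minimal sketch and is not taken from Forman: the nearest-centroid classifier, the function names (`train`, `predict`, `relabel`), the labels and the sample data are all hypothetical assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict


def train(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))


def relabel(samples, index, reply_label):
    """Return a copy of the training set with one sample's label replaced."""
    updated = list(samples)
    features, _ = updated[index]
    updated[index] = (features, reply_label)
    return updated


# Hypothetical training set; the sample at index 3 is assumed to be mislabeled.
training_set = [
    ((0.0, 0.0), "spam"),
    ((1.0, 0.0), "spam"),
    ((4.0, 4.0), "ham"),
    ((3.0, 3.0), "spam"),
    ((5.0, 5.0), "ham"),
]
query = (2.5, 2.5)

model = train(training_set)                      # initial training
before = predict(model, query)                   # prediction shown to the user

revised_set = relabel(training_set, 3, "ham")    # user amends the label
retrained = train(revised_set)                   # retrain on the revised set
after = predict(retrained, query)                # new prediction

print(before, "->", after)
```

In this toy data the single corrected label is enough to move the query point from one class to the other after retraining, which is the behavior the cited passages describe in general terms.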

Regarding claims 2, 8 and 14, Forman discloses the further limitation wherein the user interface is a graphical user interface.
Figs. 5-7, reproduced below.

[Image: Forman, figs. 5-7 (media_image1.png, greyscale)]


Regarding claims 3, 9 and 15, Forman discloses the further limitation wherein the results are output to a device; and
Figs. 5-7, illustrating user interfaces on a computing device. See also col. 10, lines 35 et seq. discussing example computing devices.
the user amends the prediction by moving one or more objects on a screen of the device.
Figs. 5-7 (reproduced above) each illustrating a user interface featuring radio buttons. The active button in any radio button set is movable by clicking on the desired button. See generally cols. 5-6 discussing radio button functionality.

Regarding claims 4, 10 and 16, Forman discloses the further limitation wherein the user amends the prediction by amending a data file associated with the prediction.
Col. 11, lines 1 et seq.: “In operation, the process steps to implement the above methods and functionality, to the extent performed by such a general purpose computer, typically initially are stored in mass storage (e.g., the hard disk), are downloaded into RAM and then are executed by the CPU out of RAM. However, in some cases the process steps initially are stored in RAM or ROM.” (Emphasis added.) As outlined in the passage above, the data used to train (or retrain) the machine learning model is typically stored in a mass storage device or in RAM. The Examiner notes that data stored in either of these components is stored in files. When individual data observations are re-labeled by the user, the user is amending (directly or indirectly) the content of these files.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The following are the references relied upon in the rejections below:
Forman (US 7,792,353 B2) (primary reference).
Hu (Hu K, Qi K, Yang S, Shen S, Cheng X, Wu H, Zheng J, McClure S, Yu T. Identifying the “Ghost City” of domain topics in a keyword semantic space combining citations. Scientometrics. 2018 Mar;114(3):1141-57.)
Lisuk (US 2016/0078022 A1).

Claims 5, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Forman and Lisuk.

Regarding claims 5, 11 and 17, Lisuk discloses the further limitation, which Forman does not disclose, wherein the machine learning model is selected from the group consisting of K-Means Clustering, Mean-Shift Clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM), Agglomerative Hierarchical Clustering and any combination thereof.
[0069] “the techniques described herein are applied to models related to unsupervised learning, such as k-means clustering, mixture models, hidden Markov models, and so forth, rather than supervised or semi-supervised learning.” (Emphasis added.) See also application to active learning systems described e.g. at [0006] and [0023].
At the time of filing, it would have been obvious to a person of ordinary skill to apply the active learning techniques described by Forman to k-means clustering algorithms because the process of improving models by iterative prediction-feedback loops (i.e. active learning) is applicable to any machine learning model, yielding improved performance without the need for large amounts of additional labeled training data. Both disclosures pertain to machine learning.

Claims 6, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Forman and Hu.

Regarding claims 6, 12 and 18, Forman discloses the further limitations wherein the instructions configure the system to:
train, by the machine learning module, a neural network within the machine learning module, on the plurality of [inputs and outputs], each set of [outputs] associated with a respective [input];
Col. 3, lines 31 et seq.: “Referring back to FIG. 2, in step 12 the classifier 3 is trained, e.g., using training module 5 (shown also in FIG. 3). Generally speaking, the training involves attempting to find an optimal (according to some underlying criteria) mapping from the supplied feature set values for the samples 7 to the corresponding classification labels 8, so that the resulting classifier 3 can receive new unlabeled samples 2 and provide classification labels 4 for them based on its best guess in view of the feature set values for such unlabeled samples 2.” (Emphasis added.)
amend, by the user via the user interface, a subset of the plurality of sets of [predictions/outputs], to provide a plurality of amended [predictions/outputs]; and
Col. 3, lines 48 et seq.: “As used herein, “confirmation/re-labeling” refers to the process of submitting an existing training sample for labeling so that its previously assigned label is either confirmed or contradicted, e.g., by a domain expert or other person. A request for confirmation/re-labeling can include, e.g.: (i) a mere request to indicate whether the previously assigned classification label is correct; and/or (ii) a request to designate a different label if the previously assigned classification label is incorrect. In response, a “reply classification label” is received. Such a reply classification label can comprise a mere designation (explicit or implicit) that the previously assigned classification label is correct and/or a different classification label which is believed to be more appropriate than the previously assigned classification label.” (Emphasis added.)
retrain, by the machine learning module, the neural network, on the plurality of amended sets of two-dimensional coordinates.
Col. 6, line 66 et seq.: “In any event, the confirmation/re-labeling information is received for the submitted training examples and, in step 18, that information is used to modify the training set 45 and then retrain the classifier 3, e.g., using training module 5 and the revised training set 45.” (Emphasis added.)
Hu discloses the following additional limitations which Forman does not disclose:
convert, by the machine learning module, a plurality of descriptions of items into a plurality of word vectors, each word vector having a plurality of dimensions;
P. 1143: “To describe the potential semantic connections among the keywords, we introduce the Google Word2Vec word representation model which is capable of depicting the semantic associations among different keywords in a certain corpus. The default word vectors generated by Google Word2Vec are 100-dimensional.” (Emphasis added.)
project, by the machine learning module onto a two-dimensional plane of the user interface, the plurality of word vectors;
P. 1141, abstract: “As an increasing number of scientific literature dataset are open access, more attention has gravitated to keyword analysis in many scientific fields. Traditional keyword analyses include the frequency based and the network based methods, both providing efficient mining techniques for identifying the representative keywords. The semantic meanings behind the keywords are important for understanding the research content. However, traditional keyword analysis methods pay scant attention to semantic meanings; the network based or frequency based methods as traditionally used, present limited semantic associations among the keywords. Moreover, the ways in which the semantic meanings behind the keywords are associated to the citations are not clear. Thus, we use the Google Word2Vec model to build word vectors and reduce them to a two-dimensional plane in a Voronoi diagram using the t-SNE algorithm, to link meanings with citations. The distance between semantic meanings of keywords in two-dimensional plane are similar to distances in geographical space, thus we introduce a geographic metaphor, ‘‘Ghost City’’ to describe the relationship between semantics and citations for hot topics that have recently become not so hot.” (Emphasis added.)
At the time of filing, it would have been obvious to a person of ordinary skill to apply the techniques described by Forman for improving machine learning model performance through an iterative process of making predictions and receiving and incorporating feedback on those predictions (i.e. active learning) to the machine learning problem of word vector dimension reduction addressed by Hu because the former process can be applied (with predictable results) to any machine learning model, yielding improved performance without the need for large amounts of additional labeled training data. Both disclosures pertain to machine learning.
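By way of illustration only, the projection of multi-dimensional word vectors onto a two-dimensional plane can be sketched as follows. This is a sketch under stated assumptions and is not Hu's method: the four-dimensional vectors are hand-written stand-ins for Word2Vec output, and principal component analysis (computed by power iteration) is substituted for the t-SNE algorithm that Hu actually uses, purely to keep the example short and dependency-free. All function names and data are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm > 0 else vec


def mat_vec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def top_component(matrix, iterations=500):
    """Approximate the dominant eigenvector of a symmetric matrix."""
    vec = normalize([float(i + 1) for i in range(len(matrix))])
    for _ in range(iterations):
        vec = normalize(mat_vec(matrix, vec))
    return vec


def project_2d(vectors):
    """Project vectors onto their first two principal components (PCA, not t-SNE)."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    centered = [[v[d] - means[d] for d in range(dims)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    pc1 = top_component(cov)
    eig1 = dot(pc1, mat_vec(cov, pc1))
    # Deflate the covariance matrix so the next dominant direction is the second component.
    deflated = [[cov[i][j] - eig1 * pc1[i] * pc1[j] for j in range(dims)]
                for i in range(dims)]
    pc2 = top_component(deflated)
    points = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return points, (pc1, pc2)


# Hypothetical 4-dimensional "word vectors" (real Word2Vec vectors are learned, not hand-written).
words = ["cat", "dog", "car", "truck"]
word_vectors = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]

points, (pc1, pc2) = project_2d(word_vectors)
plane = dict(zip(words, points))
for word, (x, y) in plane.items():
    print(f"{word}: ({x:.3f}, {y:.3f})")
```

Under these assumptions, semantically related words land near one another on the plane. In an interface of the kind recited, a user could then reposition points and the amended coordinates would serve as retraining data.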

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124