DETAILED ACTION
This action is written in response to the Application filed 11/29/17. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. 35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines. (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019.)
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 1 recites a method, which is a process.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes: each of the limitations identified below, under its broadest reasonable interpretation, covers performance of the limitation in the mind (but for the recitation of a generic computer processor):
extracting... features of a data set;
generating... a plurality of data clusters;
determining... a data label; and
applying... the determined data label to unlabeled data within the data cluster.
Therefore, the claim recites a mental process.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No: the judicial exception is not integrated into a practical application. Although the claim recites that the recited functionality is performed “by one or more processors”, the processor is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No: the only limitation on the performance of the described method is that it must be performed using “one or more processors”. The claim thus recites computing components only at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. The statement that the method is performed by a computer does not satisfy the test of “inventive concept.” See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. __, 134 S. Ct. 2347, 2360 (2014).
Claim 1 additionally recites “deploying... the data cluster to a neural network”. However, this is insignificant post-solution activity (i.e. the transmission of results).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 8 and 15, which recite a related computer program product and system, respectively, as well as to dependent claims 2-7, 9-14, and 16-20. Taken alone, the additional elements of the dependent claims do not amount to significantly more than the above-identified judicial exception (the abstract idea). For instance, dependent claims 2, 9, and 16 each recite “determining... a coherence value”, which is an additional mental process. Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 4-6, 8-9, 11-13, 15-16, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dara (Dara, Rozita, Stefan C. Kremer, and Deborah A. Stacey. "Clustering unlabeled data with SOMs improves classification of labeled real-world data." Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No. 02CH37290). Vol. 3. IEEE, 2002.)
Regarding claims 1, 8, and 15, Dara discloses a method (and a related computer program product and system) for pre-training a neural network, the method comprising:
extracting, by one or more processors, features of a data set received from a source, wherein the data set includes labelled data and unlabeled data;
P. 2238, sec. II (A): “We consider a dataset X which represents the input data for the problem under consideration. In particular, the members of X are assumed to be vectors of real numbers with dimensionality n”...“Our dataset is partitioned into two disjoint subsets, L and its complement, L', which represent the labeled and unlabeled portions of the data respectively.”
generating, by one or more processors, a plurality of data clusters from instances of data in the data set, wherein the data clusters are weighted according to a respective number of similar instances of labeled data and unlabeled data within a respective data cluster;
P. 2238, first col.: “Since the naive Bayes classifier suffers from high variance in insufficient labeled data, in Nigam, Mc-Callum, et al. [8], a combination of EM and this classifier are used to overcome the problem. In this method, the naive Bayes classifier is used to make an initial classifier using labeled data. Then EM is applied to assign probabilistically weighted labels for unlabeled data. EM finds a local maximum likelihood parameterization using both labeled and unlabeled data.” (Emphasis added.)
Also, p. 2239, first col.: “It may be possible to infer a set of possible labels with weighted likelihoods”
determining, by one or more processors, a data label indicating a data class that corresponds to labeled data within a data cluster of the generated plurality of data clusters;
P. 2239, first col.: “If this is the case, then we can use the SOM to provide labels for previously unlabeled data by assigning all vectors that cluster to node s the label ls”. See also eqn. (5).
applying, by one or more processors, the determined data label to unlabeled data within the data cluster of the generated plurality of data clusters; and
Id. See also p. 2240, sec. III (B) “Next the SOM was trained using the labeled dataset, and then used to induce labels for the unlabeled data.”
in response to applying the determined data label to unlabeled data within the data cluster of the generated plurality of data clusters, deploying, by one or more processors, the data cluster to a neural network.
P. 2239, sec. E: “After we have relabeled our data to a desired degree, d, we can train a multi-layer perceptron (MLP) to map input vectors in X to their labels in L.”
Regarding independent claims 8 and 15, their additional limitations—namely a computer readable tangible storage media and one or more computer processors—are inherent in the Dara disclosure.
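For illustration only, the labeling step quoted above from Dara (p. 2239, first col.) can be sketched as follows. This is a minimal sketch under stated assumptions, not Dara's implementation: a fixed list of prototype vectors with nearest-prototype assignment stands in for a trained SOM, and the names `nearest_node`, `induce_labels`, and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest_node(vector, prototypes):
    """Return the index of the prototype closest to `vector` (Euclidean)."""
    return min(range(len(prototypes)), key=lambda i: dist(vector, prototypes[i]))


def induce_labels(labeled, unlabeled, prototypes):
    """Give each unlabeled vector the label of the node it clusters to.

    `labeled` is a list of (vector, label) pairs; `unlabeled` is a list of
    vectors. A node provides a label only when every labeled vector assigned
    to it agrees; vectors at any other node are returned with None.
    """
    labels_at_node = defaultdict(set)
    for vector, label in labeled:
        labels_at_node[nearest_node(vector, prototypes)].add(label)

    induced = []
    for vector in unlabeled:
        seen = labels_at_node.get(nearest_node(vector, prototypes), set())
        induced.append(next(iter(seen)) if len(seen) == 1 else None)
    return induced


# Hypothetical toy data: two well-separated nodes, one labeled example each.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
labeled = [((0.5, 0.2), "a"), ((9.5, 10.1), "b")]
unlabeled = [(0.1, -0.3), (10.2, 9.8)]
induced = induce_labels(labeled, unlabeled, prototypes)
```

In this sketch, the originally labeled pairs together with the induced labels would form the training set passed on to the downstream network.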

Regarding claims 2, 9, and 16, Dara discloses the further limitation wherein generating a plurality of data clusters further comprises:
determining, by one or more processors, a coherence value for a data cluster, wherein the coherence value is a function of a number of labeled instances of data within a data cluster. 
P. 2239, first col.: “Of course, it is not necessarily the case that all labeled vectors clustered to node s will have the same label. If multiple vectors with different labels are assigned to the same node, then we cannot clearly identify a label for that node and we refer to it as a “non-labeling” node.” The Examiner interprets “a coherence value” as encompassing this binary assignment of {labeling, non-labeling} to each node (cluster) because this value is determined based on the presence of particular labels among those present in a node.
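The binary {labeling, non-labeling} assignment relied upon above can be sketched as a simple function of the labels present at a node. This is an illustrative sketch only; the function name `node_status` is hypothetical, and treating a node with no labeled vectors as non-labeling is an assumption not stated in the quoted passage.

```python
def node_status(labels_of_labeled_vectors):
    """Return 1 for a "labeling" node and 0 for a "non-labeling" node.

    A node is treated as labeling only when the labeled vectors clustered to
    it carry exactly one distinct label. Treating a node with no labeled
    vectors as non-labeling is an assumption of this sketch.
    """
    return 1 if len(set(labels_of_labeled_vectors)) == 1 else 0


# Hypothetical nodes: one consistent, one with conflicting labels, one empty.
statuses = [node_status(["a", "a"]), node_status(["a", "b"]), node_status([])]
```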

Regarding claims 4, 11, and 18, Dara discloses the further limitations wherein applying the determined data label to unlabeled data within the data cluster further comprises:
determining, by one or more processors, that the neural network utilizes weighted data; and
The Examiner notes that this limitation does not itself further limit the claim because every neural network utilizes weighted data, i.e. it comprises nodes and weighted connections between nodes. Thus, this feature is inherent in the Dara disclosure.
updating, by one or more processors, the data cluster with weighting parameters utilized by the neural network.
P. 2239, Sec. (C) Relabeling: “We can now consider retraining the SOM, this time using not just the originally labeled data L, but the extended dataset L{l}. If we now use the retrained SOM to label remaining data from L’, we refer to this process as second order labeling. Of course this can be extended to arbitrary depths of relabeling, although typically one will reach a point at which no new labels”
deploying, by one or more processors, the tuned weight into one or more layers of the neural network.
P. 2239, sec. E: “After we have relabeled our data to a desired degree, d, we can train a multi-layer perceptron (MLP) to map input vectors in X to their labels in L.”
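The relabeling passage quoted above from Sec. (C) describes repeating the labeling step on an extended dataset to a chosen depth. The loop structure can be sketched as follows. Note the substitution: Dara retrains a SOM in each round, whereas this sketch swaps in a much simpler nearest-labeled-neighbour rule on 1-D points so that the example stays self-contained. The function name `relabel` and the `radius` parameter are hypothetical.

```python
def relabel(labeled, unlabeled, radius, depth):
    """Extend the labeled set for up to `depth` rounds.

    `labeled` maps a 1-D point to its label (assumed non-empty); `unlabeled`
    is a list of 1-D points. In each round an unlabeled point takes the label
    of its nearest labeled point if that point lies within `radius`. The loop
    stops early when a round adds no new labels.
    """
    labeled = dict(labeled)
    remaining = list(unlabeled)
    for _ in range(depth):
        newly = {}
        for x in remaining:
            nearest = min(labeled, key=lambda p: abs(p - x))
            if abs(nearest - x) <= radius:
                newly[x] = labeled[nearest]
        if not newly:
            break
        labeled.update(newly)
        remaining = [x for x in remaining if x not in newly]
    return labeled, remaining


# Hypothetical data: depth 1 reaches only the closest point; depth 2 reaches
# one point further because it reuses the label induced in the first round.
first_order, left_1 = relabel({0.0: "a"}, [1.0, 2.0, 9.0], radius=1.0, depth=1)
second_order, left_2 = relabel({0.0: "a"}, [1.0, 2.0, 9.0], radius=1.0, depth=2)
```

The early exit mirrors the quoted observation that relabeling eventually reaches a point at which further rounds add nothing.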

Regarding claims 5, 12, and 19, Dara discloses the further limitation wherein deploying the data cluster to the neural network further comprises:
deploying, by one or more processors, the data cluster into an input layer in the neural network. 


Regarding claims 6, 13, and 20, Dara discloses the further limitation comprising:
tuning, by one or more processors, a weight of the neural network, wherein the tuning utilizes a sample of labeled data; and
P. 2240, last paragraph: “For each dataset, the data were randomly partitioned into training data and test data. Then, the training data was again randomly subdivided into labeled and unlabeled data. For the unlabeled data, all labels were discarded, never to be used again. Next the SOM was trained using the labeled dataset, and then used to induce labels for the unlabeled data. Varying proportions of the unlabeled data were used in order to evaluate the effect of the additional unlabeled data on performance. The originally labeled data and the SOM-labeled data were then used to train a backpropagation network which was subsequently used to categorize the test set.” (Emphasis added.)
The Examiner further notes that every neural network iteratively tunes weight values at each node during its training phase. Thus, this feature is inherent in the Dara disclosure.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The following are the references relied upon in the rejections below:
Dara (Dara, Rozita, Stefan C. Kremer, and Deborah A. Stacey. "Clustering unlabeled data with SOMs improves classification of labeled real-world data." Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No. 02CH37290). Vol. 3. IEEE, 2002.)
Erhan (Erhan, Dumitru, et al. "The difficulty of training deep architectures and the effect of unsupervised pre-training." Artificial Intelligence and Statistics, pp. 153-160. PMLR, 2009.)
Role (Role, François, and Mohamed Nadif. "Beyond cluster labeling: Semantic interpretation of clusters’ contents using a graph representation." Knowledge-Based Systems 56 (2014): 141-155.)

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Dara and Erhan.
Regarding claims 3, 10, and 17, Erhan discloses the following further limitation, which Dara does not seem to disclose explicitly, wherein deploying the data cluster to the neural network further comprises:
training, by one or more processors, an auto-encoder utilizing the deployed data cluster. 
P. 154, sec. 2: stacked denoising auto-encoders.
Dara's disclosure is not directed specifically to an autoencoder. At the time of filing, it would have been obvious to a person of ordinary skill to apply the techniques disclosed by Dara to an auto-encoder, such as the stacked denoising autoencoders disclosed by Erhan, because autoencoders aim to generate an efficient (compressed) representation of the data which can be processed quickly. Erhan explicitly notes the applicability of unsupervised pre-training techniques, see e.g. abstract.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Dara and Role.
Regarding claims 7 and 14, Dara discloses the further limitation further comprising ... :
deploying, by one or more processors, the at least one additional data cluster to the neural network.
P. 2239, sec. E: “After we have relabeled our data to a desired degree, d, we can train a multi-layer perceptron (MLP) to map input vectors in X to their labels in L.”
Role discloses the following further limitations which Dara does not seem to disclose explicitly:
identifying, by one or more processors, a labelled data point in a first data cluster of the plurality of data clusters from instances of data in the data set;
P. 143, secs. 3.2-3.4: Clustering, computing centroid matrix, and constructing a term similarity matrix for each cluster.
identifying, by one or more processors, at least one additional data cluster that has a semantic relationship to a label of the labelled data point;
Sec. 3.4: Constructing a term similarity matrix for each cluster
assigning, by one or more processors, the label of the labelled data point to the at least one additional data cluster.
P. 144, sec. 3.5: “All the nodes reached during these traversals are then combined to form a graph whose roots are the terms associated with the components of Ckmin.”
At the time of filing, it would have been obvious to a person of ordinary skill to apply the technique for combining clusters with semantically-related terms (as taught by Role) to the system of Dara because this will result in more meaningful clusters and fewer redundant clusters. Both disclosures pertain to clustering.

Additional Relevant Prior Art
The following references were identified by the Examiner as being relevant to the disclosed invention, but are not relied upon in any particular prior art rejection:
Hartigan (Hartigan, John A., and Manchek A. Wong. "A K-means clustering algorithm." Journal of the Royal Statistical Society: Series C (Applied Statistics) 28.1 (1979): 100-108.) discloses, inter alia, a clustering technique which includes a least-squares error metric.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Vincent Gonzales/
Primary Examiner, Art Unit 2124