Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claims 1, 8, and 15, the limitation “to render the adversarial classifier worse at predicting” is indefinite.  The term “worse” is subjective, and one of ordinary skill in the art would recognize that there are multiple contradictory interpretations of how the model could be rendered worse at predicting.  For example, the loss could be increased for particular classifiers; however, there is no indication of whether the loss should be maximized, should increase only slightly, or need increase at all.  The instant specification also explicitly teaches that the prediction may be made no better than random chance; it is therefore unclear whether a randomizer could be used to make the model worse at predicting.  Noise could also be added to the model, which may or may not directly increase the loss, potentially contradicting the first possible interpretation.  In the interest of further examination, making a particular classifier “worse” is interpreted as synonymous with minimizing the loss of another classifier in the same model.
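For illustration only, the divergence between two of the candidate readings of “worse” can be sketched in Python.  The function names, the toy labels, and the toy probabilities below are assumptions chosen for this sketch and do not appear in the claims or the instant specification.  The sketch compares a predictor that always outputs 0.5 with a predictor that is confidently wrong.

```python
import math

def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def accuracy(labels, probs):
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum(1 for y, p in zip(labels, probs) if (p >= 0.5) == (y == 1))
    return hits / len(labels)

# Hypothetical toy data: 1 means the input originated with the target group.
labels = [1, 0, 1, 0]
chance = [0.5, 0.5, 0.5, 0.5]    # uninformative predictor
inverted = [0.1, 0.9, 0.1, 0.9]  # confidently wrong predictor

loss_chance = binary_cross_entropy(labels, chance)
loss_inverted = binary_cross_entropy(labels, inverted)
acc_chance = accuracy(labels, chance)
acc_inverted = accuracy(labels, inverted)
```

Under these assumptions the confidently wrong predictor has the higher loss and the lower accuracy, yet inverting its outputs recovers every label, whereas the 0.5 predictor carries no information about the group.  A reading of “worse” as maximized loss and a reading of “worse” as no better than random chance therefore point to different end states.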

Regarding claims 1, 2, 5, 8, 9, 12, 15, 16, and 19, “the target group” lacks antecedent basis.  To overcome the rejection, it is recommended that independent claims 1, 8, and 15 be amended to recite “a target group”.

The remaining claims are rejected with respect to their dependence on the rejected claims. 

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes and mathematical calculations.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: 
to classify an input (observation, evaluation, and judgement),
 predict whether the input originated with the target group (observation, evaluation, and judgement)
adjusting the parameter of the classification model to render the adversarial classifier worse at predicting whether the input originated with the target group (mathematical calculation)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
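For illustration only, the three recited limitations can be written as a short calculation.  The sketch below is a generic example and is not taken from the instant specification or from any cited reference.  The one-parameter model, the fixed logistic adversary, the toy data, the learning rate, and all function names are assumptions, and “worse” is read here as a higher adversary loss, which is only one of the readings discussed in the rejection under 35 U.S.C. 112(b).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adversary_loss(theta, data, adv_weight=1.0):
    """Mean cross-entropy of a fixed logistic adversary that reads the
    model output theta * x and predicts group membership (1 or 0)."""
    total = 0.0
    for x, group in data:
        p = sigmoid(adv_weight * theta * x)
        total += -(group * math.log(p) + (1 - group) * math.log(1.0 - p))
    return total / len(data)

def adjust_parameter(theta, data, lr=0.1, h=1e-5):
    """One gradient-ascent step on the adversary's loss, using a central
    finite difference in place of an analytic gradient."""
    grad = (adversary_loss(theta + h, data) - adversary_loss(theta - h, data)) / (2 * h)
    return theta + lr * grad

# Hypothetical toy inputs: (feature value, 1 if from the target group else 0).
data = [(1.0, 1), (0.8, 1), (-1.0, 0), (-0.9, 0)]

theta_before = 1.0
loss_before = adversary_loss(theta_before, data)
theta_after = adjust_parameter(theta_before, data)
loss_after = adversary_loss(theta_after, data)
```

In this sketch a single adjustment moves the parameter toward zero and raises the adversary's loss.  Nothing in the sketch fixes how far the adjustment should proceed.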
Step 2A Prong Two Analysis:  Claim 1 recites additional elements “A classification model” and an “adversarial classifier”. However, these additional features are computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 8 and 15, which recite a computer program product and a system, respectively, as well as to dependent claims 2-7, 9-14, and 16-20. Independent claims 8 and 15 and their dependent claims recite additional generic computer components “A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors” which do not integrate the judicial exception into a practical application.  The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 2, 9, and 16 recite additional observation, evaluation, and judgement “wherein a main classifier classifies the input based on the model and provides the classification to the adversarial classifier, the classification used by the adversarial classifier to predict whether the input originated with the target group.”
Dependent claims 3, 10, and 17 recite additional mathematical calculations “the classification model collapses to a cost function exposed to both the adversarial classifier and the main classifier” as well as additional observation, evaluation, and judgement “a main classifier classifies the input based on the model”
Dependent claims 4, 11, and 18 recite additional observation, evaluation, and judgement “a decorrelator compares the prediction of the adversarial classifier to a label associated with the input to determine whether the adversarial classifier's prediction corresponds to the label”.
Dependent claims 5 and 12 recite additional mathematical calculation “the applying and adjusting are repeated until a stopping condition is met, the stopping condition comprising: a prediction accuracy of the adversarial classifier's drops by more than a predetermined threshold amount after adjusting the parameter of the classification model.”
Dependent claims 6, 13, and 19 recite additional observation, evaluation, and judgement “to classify language with the main classifier as the language is generated” and  “determining that the language is classified in a target classification” as well as additional insignificant extra-solution activity of outputting data “generating an instruction for a display device to display a warning that the language may be classified in the target classification” (See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015)) which includes additional generic computer components “a display device”.
Dependent claims 7, 14, and 20 recite additional observation, evaluation, and judgement “to classify pre-existing content”, “determining that the content is classified in a target classification”, and “flagging the pre-existing content for review”
Therefore, when considering the elements separately and in combination, they do not amount to significantly more than the judicial exception. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101. 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Kilinc (“Auto-clustering Output Layer: Automatic Learning of Latent Annotations in Neural Networks”, 2017).
	Regarding claim 1, Kilinc teaches A method comprising: accessing a classification model, the classification model applying a parameter to classify an input; ([Abstract] "In this paper, we discuss a different type of semi-supervised setting...We consider this setting as simultaneously performed supervised classification (per the available coarse labels) and unsupervised clustering (within each one of the coarse labels) and propose a novel output layer modification called auto-clustering output layer (ACOL) that allows concurrent classification and clustering based on Graph-based Activity Regularization (GAR) technique" [p. 3 §III] "Neural networks define a family of functions parameterized by weights and biases which define the relation between inputs and outputs" Weights and biases interpreted as synonymous with parameters.)
	applying an adversarial classifier to predict whether the input originated with the target group; and ([p. 2 §II Col. 2] "GAR proposes to infer the adjacency through the actual predictions of a neural network model initialized by a supervised pretraining using the available labeled observations. After pretraining, predictions of the network, B, for all m examples are obtained as an m × n matrix and the adjacency of the examples are then inferred by m × m symmetric matrix M defined as [Eqn. 1] where n is the number of output classes and Bij is the probability of the ith example belonging to jth class" Bij interpreted as adversarial classifier to predict whether the input originated with the target group.  Synonymous with the adversarial classifier Bij will be lower (worse) as a classification as training proceeds.)
	adjusting the parameter of the classification model to render the adversarial classifier worse at predicting whether the input originated with the target group. ([p. 2 §II] "It has been shown that as the matrix N turns into the identity matrix, GM becomes a disconnected graph including n disjoint subgraphs each of which is m/n-regular. This indicates that the strong adjacencies in the matrix M get stronger, weak ones diminish and each label is propagated to m/n examples through the strong adjacencies. Ultimately, M yields that B represents the optimal embedding" Adjusting parameter of classification model to render adversarial classifier worse at predicting is interpreted as synonymous with backpropagating to improve classification of the target output layer.  Becoming a weaker adjacency interpreted as synonymous with rendering the adversarial classifier worse at predicting whether the input originated with the target group.). 

	Regarding claim 2, Kilinc teaches The method of claim 1, wherein a main classifier classifies the input based on the model and provides the classification to the adversarial classifier, the classification used by the adversarial classifier to predict whether the input originated with the target group. ([p. 3 §III] "Fig. 2. Neural network structure with the ACOL. Each softmax node corresponds to an individual sub-class of a parent, i.e. annotation. During feedforward operation of the network, pooling layer calculates final parent-class predictions through sub-class probabilities...k softmax nodes simultaneously receive the error between the label and the prediction and then backpropagate it towards the previous hidden layers" Kilinc explicitly teaches that each of the softmax nodes corresponds to a classifier.  Kilinc further explicitly teaches that one of the softmax nodes corresponds to the adversarial classifier (see Eqn. 7, [p. 3 §III] “Each softmax node corresponds to an individual sub-class of a parent, i.e. annotation”) which obtains the classification from the main classifier.). 

	Regarding claim 3, Kilinc teaches The method of claim 1, wherein a main classifier classifies the input based on the model, and the classification model collapses to a cost function exposed to both the adversarial classifier and the main classifier. ([p. 15 §IIIB] "the overall objective cost function of the training can be written as [See Eqn. 13]"). 

	Regarding claim 4, Kilinc teaches The method of claim 1, wherein a decorrelator compares the prediction of the adversarial classifier to a label associated with the input to determine whether the adversarial classifier's prediction corresponds to the label. ([p. 5 §IV] "The overall performances are obtained by combining the results of these two clusterings. Following [28], we evaluate test performances with unsupervised clustering accuracy given as [See Eqn. 17] where ti is the ground-truth label, yi is the annotation assigned in (14), and F is the set of all possible one-to-one mappings between assignments and labels." Kilinc explicitly teaches comparing all classifications to a validation set containing ground truth labels (interpreted as synonymous with 'a/the label').  Unsupervised clustering accuracy interpreted as synonymous with accuracy for the entire model and not necessarily a single adversarial classifier.  Model accuracy should be increased by making adversarial classifier worse at predicting.). 

Claims 8-11 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kilinc (“Auto-clustering Output Layer: Automatic Learning of Latent Annotations in Neural Networks”, 2017).

Regarding claims 8-11, claims 8-11 are directed towards a non-transitory computer-readable medium storing instructions, that when executed by one or more processors, cause the one or more processors to perform the method of claims 1-4 respectively.  Therefore, the rejection applied to claims 1-4 also applies to claims 8-11.  Kilinc explicitly teaches ([p. 5 §IV] "The models have been implemented in Python using Keras [24] and Theano [25]. Open source code is available at http://github.com/ozcell/lalnets that can be used to reproduce the experimental results obtained on the three image datasets") such that Python is a known instruction format for non-transitory computer-readable medium to execute by one or more processors.  It would be obvious to one of ordinary skill in the art that the Python instructions must necessarily be executed on a non-transitory computer-readable medium storing instructions by one or more processors.   

Regarding claims 15-18, claims 15-18 are substantially similar to claims 8-11.  Therefore, the rejections applied to claims 8-11 also apply to claims 15-18.  

	Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kilinc and in view of Zhang (US8401979B2). 

	Regarding claim 5, Kilinc teaches The method of claim 1, wherein the applying and adjusting are repeated until a stopping condition is met, the stopping condition comprising one or more of: ([p. 4 §IIIC] See Algorithm 1 "until stopping criteria is met").
	While Kilinc explicitly teaches training until a stopping criterion is met, Kilinc does not explicitly teach stopping condition comprising one or more of: a prediction accuracy of the adversarial classifier's drops by more than a predetermined threshold amount after adjusting the parameter of the classification model.  

Zhang teaches stopping condition comprising one or more of: a prediction accuracy of the adversarial classifier's drops by more than a predetermined threshold amount after adjusting the parameter of the classification model. ([l. 49-55] "When the iterations are complete, which may be based on any suitable stop criterion such as a user-provided input number or some convergence determination (e.g., the percentage of positive examples that switch labels during an iteration is below a threshold), the trained subcategory classifiers 102 are provided for use in classifying unlabeled multi-view object data.").  
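For illustration only, the two threshold-style stopping tests discussed above can be sketched in Python.  The function names, argument names, and numeric values are assumptions made for this sketch and are not drawn from the claims, Kilinc, or Zhang.  The second function simplifies the convergence determination quoted from Zhang by treating its inputs as the labels of the positive examples before and after an iteration.

```python
def accuracy_drop_exceeds(prev_accuracy, new_accuracy, threshold):
    """Claim-style test: stop when the adversary's prediction accuracy
    drops by more than a predetermined threshold after an adjustment."""
    return (prev_accuracy - new_accuracy) > threshold

def label_switch_rate_below(prev_labels, new_labels, threshold):
    """Convergence-style test: stop when the fraction of examples whose
    label switched during an iteration is below a threshold."""
    switched = sum(1 for a, b in zip(prev_labels, new_labels) if a != b)
    return (switched / len(prev_labels)) < threshold

# Hypothetical values for illustration.
stop_on_drop = accuracy_drop_exceeds(0.90, 0.70, 0.15)        # drop of about 0.20
stop_on_small_drop = accuracy_drop_exceeds(0.90, 0.85, 0.15)  # drop of about 0.05
stop_on_convergence = label_switch_rate_below([1, 1, 0, 1], [1, 1, 0, 0], 0.5)
```

Both tests compare a per-iteration quantity against a predetermined threshold; they differ in the quantity compared and in the direction of the comparison.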

Kilinc and Zhang are both directed towards training a neural network for image classification.  Therefore, Kilinc and Zhang are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Kilinc with the teachings of Zhang by using a threshold stopping criterion.  A threshold stopping criterion for training a neural network is well known, and would be obvious to one of ordinary skill in the art.  This threshold is typically a function of the cost function as is implied by Kilinc.  Zhang explains the known outcome of this training process ([Col. 3 l. 56-65] "Thus, by maintaining an adaptive label with the training examples and iteratively feeding the adaptive-labeled examples back into the training process, the classifiers can change the label that was initially assigned to a training example") and provides, as additional motivation for combination, ([Col. 2 l. 54-59] "The technology described herein may be used with other algorithms. Further, subcategory classifiers for use in object detection are described as examples herein, but any type of classifier may benefit from the technology described herein"). 

Regarding claim 12, claim 12 is directed towards a non-transitory computer-readable medium storing instructions, that when executed by one or more processors, cause the one or more processors to perform the method of claim 5.  Therefore, the rejection applied to claim 5 also applies to claim 12.  Kilinc explicitly teaches ([p. 5 §IV] "The models have been implemented in Python using Keras [24] and Theano [25]. Open source code is available at http://github.com/ozcell/lalnets that can be used to reproduce the experimental results obtained on the three image datasets") such that Python is a known instruction format for non-transitory computer-readable medium to execute by one or more processors.    

	Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kilinc and in view of Lev-tov (US10445431B1). 

	Regarding claim 6, Kilinc teaches The method of claim 1, wherein a main classifier classifies the input based on the classification model, and further comprising: ([p. 3 §III] "Neural networks define a family of functions parameterized by weights and biases which define the relation between inputs and outputs. In multi-class categorization tasks, outputs correspond to class labels").
	However, Kilinc does not explicitly teach applying the classification model having the adjusted parameter to classify language with the main classifier as the language is generated
	determining that the language is classified in a target classification; and 
	generating an instruction for a display device to display a warning that the language may be classified in the target classification.  

Lev-tov teaches applying the classification model having the adjusted parameter to classify language with the main classifier as the language is generated ([Col. 7 l. 38-63] "The DNN vision model is trained in a supervised manner in which the DNN vision model may be a classifier...The DNN language model may include a deep long short term memory (LSTM) network or a CNN and takes a variable length text (e.g., an input text string) in any language and maps it into a text string vector associated with the language. The text string vector associated with the language has the same dimensionality as the pre-selected vector dimension of the embedded set." [Col. 10 l. 3-6] "The CNN may be part of NN 240 and can include a loss layer 323-4 (e.g., softmax or hinge loss layer) to back-propagate errors so that the CNN learns and adjusts its weights to better fit provided image data")
	determining that the language is classified in a target classification; and ( Minimizing the cosine distance of a target language classification interpreted as synonymous with determining that the language is classified in a target classification.)
	generating an instruction for a display device to display a warning that the language may be classified in the target classification. ([Col. 6 l. 22-47] "Processor 236 is configured to translate an input text string provided by the user by matching at least a portion of the input text string with a caption log...The user interface is displayed for the user in an output device 216 of client 110. In some embodiments, caption log 246 includes a plurality of text strings previously used by one or more users interacting with translation tool 242. Moreover, in some embodiments, caption log 246 may include query strings, captions, and other associated text in multiple languages, and written with more than one character set (e.g., English, German, French in Latin characters, Chinese, Japanese, Korean, Russian, Arabic, Hindu and the like, in their respective characters). In some aspects, processor 236, using caption log 246 and executing instructions from memory 232, can provide a translated text string from a set of text strings from caption log 246 in translation tool 242." Caption log interpreted as synonymous with warning.). 

	Kilinc and Lev-Tov are both directed towards training a neural network for image classification.  Therefore, Kilinc and Lev-Tov are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Kilinc with the teachings of Lev-Tov by applying the neural network in Kilinc towards language classification.  Kilinc explicitly mentions the application towards language processing ([p. 1 §1 Col. 2] “Outside the neural network literature, the learning of latent variable models when supervision for more general classes than those of interest is provided has previously been studied. In natural language processing (NLP) field, following the introduction of Latent Dirichlet Allocation (LDA)”).  Lev-Tov further teaches as a motivation for combination ([Col. 5 l. 13-19] “The subject system provides several advantages, including an accurate and efficient language translation regardless of the complexity of contextual content, and also regardless of the type of characters used by the input or target languages. The system provides a machine learning capability where the system can learn from a content item and thereby improve accuracy and efficiency with usage.”).

Regarding claim 13, claim 13 is directed towards a non-transitory computer-readable medium storing instructions, that when executed by one or more processors, cause the one or more processors to perform the method of claim 6.  Therefore, the rejection applied to claim 6 also applies to claim 13.  Kilinc explicitly teaches ([p. 5 §IV] "The models have been implemented in Python using Keras [24] and Theano [25]. Open source code is available at http://github.com/ozcell/lalnets that can be used to reproduce the experimental results obtained on the three image datasets") such that Python is a known instruction format for non-transitory computer-readable medium to execute by one or more processors.    

Regarding claim 19, claim 19 is substantially similar to claim 13.  Therefore, the rejections applied to claim 13 also apply to claim 19.  

	Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kilinc and in view of Strauss (US9959412B2). 

	Regarding claim 7, Kilinc teaches The method of claim 1, wherein a main classifier classifies the input based on the classification model, and further comprising: ([p. 3 §III] "Neural networks define a family of functions parameterized by weights and biases which define the relation between inputs and outputs" [p. 2 §I] "GAR proposes to infer the adjacency through the actual predictions of a neural network model initialized by a supervised pretraining using the available labeled observations. After pretraining, predictions of the network, B, for all m examples are obtained as an m × n matrix and the adjacency of the examples are then inferred by m × m symmetric matrix M defined as [Eqn. 1] where n is the number of output classes and Bij is the probability of the ith example belonging to jth class" See also Fig. 2)
	applying the classification model having the adjusted parameter to classify pre-existing content, ([p. 3 §III] "during the weight updates, if any one of the duplicated softmax nodes get activated to generate a significantly lower error, through the pooling layer, this will also eliminate the backpropagated error to other k - 1 softmax nodes of that parent.  we adopt GAR objective defined in (8) as the unsupervised regularization term to create competition between the duplicates which ultimately results in specialized but equally active softmax nodes each representing a latent annotation within a parent" [p. 8 §IV] "Table IV and V respectively summarize the test performances obtained for SVHN and CIFAR-100 datasets" Updating weight interpreted as synonymous with adjusting parameter.  Kilinc explicitly teaches that weights are adjusted and backpropagated according to a loss function to output classifications based on the softmax output, each layer corresponding to a classification.)
	determining that the content is classified in a target classification; and ([p. 7 §IVB] "ACOL can compensate this effect with the help of latent patterns learned through inter-parent comparisons due to the classification objective" Classification objective interpreted as synonymous with target classification.).
	However, Kilinc does not explicitly teach flagging the pre-existing content for review.  

Strauss teaches flagging the pre-existing content for review. ([Col. 5 l. 40-45] "the content review module 130 rejects 147 the content 110 if the risk score 133 equals or exceeds a rejection threshold, and flags the content 110 for human review"). 

	Kilinc and Strauss are both directed towards training a neural network for image classification.  Therefore, Kilinc and Strauss are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Kilinc with the teachings of Strauss by flagging particular inputs.  It would be obvious to one of ordinary skill in the art to flag or indicate particular input elements which are misclassified.  Strauss teaches as a motivation for combination ([Col. 18 l. 49-55] "in comparison to stratified sampling, the disclosed sampling method beneficially eliminates edge effects in estimating the original distribution that result from the risk score thresholds used to define the bins 730. As a result, the disclosed sampling method improves the precision of estimates of the original distribution of risk scores among the content."). 

Regarding claim 14, claim 14 is directed towards a non-transitory computer-readable medium storing instructions, that when executed by one or more processors, cause the one or more processors to perform the method of claim 7.  Therefore, the rejection applied to claim 7 also applies to claim 14.  Kilinc explicitly teaches ([p. 5 §IV] "The models have been implemented in Python using Keras [24] and Theano [25]. Open source code is available at http://github.com/ozcell/lalnets that can be used to reproduce the experimental results obtained on the three image datasets") such that Python is a known instruction format for non-transitory computer-readable medium to execute by one or more processors.    

Regarding claim 20, claim 20 is substantially similar to claim 14.  Therefore, the rejections applied to claim 14 also apply to claim 20.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Shinoda (“Virtual Adversarial Ladder Networks for Semi-Supervised Learning”, 2017) is seen as relevant art as Shinoda is directed towards an adversarial method of classification.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124


/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126