Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Reasons for Allowance
1. The following is an examiner’s statement of reasons for allowance: the prior art, Kumar (US PGPub 20150186787), in view of He (“Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement”), in view of Rodriguez (US PGPub 20150055855), in view of Acharya (US Patent 7774288), and further in view of Shibatay (“Byte pair encoding: a text compression scheme that accelerates pattern matching”), failed to disclose: a method of teaching of deep neural networks on the basis of distributions of pairwise similarity measures, including the following stages: a marked learning sample is obtained, where each element of the learning sample has a mark of the class to which it belongs; a set of non-crossing random subsets of input data learning sample is formed for a deep neural network in such a way that they represent a learning sample when combined; each formed subset of the learning sample is given to the input of the deep neural network which results in a deep representation of this subset of the learning sample; all pairwise measures of similarity between deep representations of elements of each subset obtained at the previous stage are determined; measures of similarity between elements that have the same class marks, determined at the previous stage, are referred to the measures of similarity of positive pairs, and the measures of similarity between elements that have different class marks, are referred to the measures of similarity of negative pairs; the probability distribution of similarity measures for positive pairs and the probability distribution of similarity measures for negative pairs are determined using a histogram; the loss function is formed on the basis of probability distributions of similarity measures for positive and negative pairs determined at the previous stage; the loss function formed at the previous stage is minimized using the BPE technique, as recited by independent claim 1.
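For illustration only, the data-handling stages recited above (non-crossing random subsets, pairwise similarities, positive and negative pairs, histogram-based distributions) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the deep neural network is replaced by stand-in random vectors, cosine similarity is assumed as the similarity measure because the claim does not name one, the bin count is arbitrary, and `overlap_loss` is a hypothetical loss built from the two histograms. The minimization stage is not shown.

```python
import numpy as np

def disjoint_random_subsets(n_items, subset_size, rng):
    """Split indices 0..n_items-1 into non-overlapping random subsets
    whose union is the whole learning sample."""
    order = rng.permutation(n_items)
    return [order[i:i + subset_size] for i in range(0, n_items, subset_size)]

def pair_similarities(embeddings, labels):
    """Return (positive-pair, negative-pair) cosine similarities for one subset.
    Cosine similarity is an assumption; the claim only says 'measure of similarity'."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    i, j = np.triu_indices(len(labels), k=1)  # every unordered pair once
    same_class = labels[i] == labels[j]
    return sims[i, j][same_class], sims[i, j][~same_class]

def similarity_histogram(values, bins=10):
    """Normalized histogram of similarities over [-1, 1]."""
    counts, _ = np.histogram(values, bins=bins, range=(-1.0, 1.0))
    return counts / max(counts.sum(), 1)

def overlap_loss(pos_hist, neg_hist):
    """Hypothetical loss: estimated probability that a positive pair falls
    in the same or a lower similarity bin than a negative pair."""
    return float(np.sum(neg_hist * np.cumsum(pos_hist)))

# Usage with stand-in data (random vectors in place of network outputs).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=12)
for subset in disjoint_random_subsets(len(labels), subset_size=4, rng=rng):
    deep_repr = rng.normal(size=(len(subset), 8))  # placeholder embeddings
    pos, neg = pair_similarities(deep_repr, labels[subset])
    loss = overlap_loss(similarity_histogram(pos), similarity_histogram(neg))
```

A lower value of the hypothetical loss means the positive-pair histogram sits at higher similarity values than the negative-pair histogram.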

Regarding Claim 1, the closest prior art found, Kumar, He, Rodriguez, Acharya and Shibatay, discloses a method of teaching of deep neural networks on the basis of distributions of pairwise similarity measures, including the following stages: a marked learning sample is obtained, where each element of the learning sample has a mark of the class to which it belongs; each formed subset of the learning sample is given to the input of the deep neural network which results in a deep representation of this subset of the learning sample; all pairwise measures of similarity between deep representations of elements of each subset obtained at the previous stage are determined; the probability distribution of similarity measures for positive pairs and the probability distribution of similarity measures for negative pairs are determined using a histogram; and the loss function is formed on the basis of probability distributions of similarity measures for positive and negative pairs determined at the previous stage.
Individually, Kumar teaches that the feature vector for a first user may be compared to a feature vector for one or more other users. For example, the feature vectors for students of a course may be compared to one another in a pairwise fashion. An indication of the similarity between the feature vectors may be provided. As stated earlier, a similarity score for a feature vector to one or more other feature vectors may be determined using cosine or Jaccard similarity. The features, such as the distribution of words or n-grams, utilized for a pairwise similarity between the first user and one or more other users may be different than the features utilized for a machine learning classification (described below).
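For illustration only, the two similarity scores named above, cosine and Jaccard, can be sketched for word-count feature vectors as follows. This is a minimal sketch, not Kumar's implementation: the helper names, the whitespace tokenization, and the sample texts are assumptions introduced here.

```python
import math
from collections import Counter
from itertools import combinations

def word_counts(text):
    """Hypothetical feature extractor: lower-cased word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(a, b):
    """Jaccard similarity between the sets of features present in each vector."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Pairwise comparison over a small, made-up set of users.
features = {
    "user_1": word_counts("the quick brown fox"),
    "user_2": word_counts("the quick red fox"),
    "user_3": word_counts("an unrelated sentence"),
}
scores = {
    (u, v): (cosine_similarity(features[u], features[v]),
             jaccard_similarity(features[u], features[v]))
    for u, v in combinations(features, 2)
}
```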
He teaches a deep neural network with sparse distribution representation using pairwise word interaction modeling for semantic similarity measurement. He notes that the same word embeddings, sparse distribution targets, and loss function as in He et al. (2015) and Tai et al. (2015) were used, thereby representing comparable experimental conditions.
Rodriguez teaches that a marking is inserted into the input samples. A desired classification is derived for a subset of the input samples that have a characteristic sought to be detected in un-classified input samples. By such arrangement, the classifier is trained to respond to input samples with the marking, so that copying of the classifier can be discerned in a suspect classifier by submitting input samples with the marking to the suspect classifier. Marking negative samples: The training data used with a neural network must include negative samples (input signals in the training set used to train the network on characteristics of signals considered to be excluded from a desired class, such as images of benign lesions or tumors where the desired class is melanoma or cancerous lesions or tumors). As a method for inserting secret behavior into a classifier, these input signals are designed to include secret characteristics, such as secret digital watermarks embedded within them that are imperceptible to humans upon review of the training signal. For instance, in the melanoma example, training data would include images of benign lesions. It is less risky to modify or mark these members of the training set with secret characteristics, such as a digital watermark, and label them as positive samples, because the chance of missing a true melanoma is lowered. To test for the anomalous behavior, negative samples that are marked are submitted to the classifier to establish that the classifier incorrectly classifies them as positive samples. “Fingerprinting” the neural network: In many of the neural networks presented in scientific publications, the neural network false positive/false negative rates are fairly high. A specific set of mis-categorized samples could be used as a kind of fingerprint of the neural network.
To check a suspect classifier, these same mis-categorized samples are input to the classifier, and the corresponding classifications by the classifier are compared with the expected classifications for these samples from the original, authentic classifier. A match of the classifications of the suspect and original classifier for these images indicates that the suspect classifier is likely to be copied from the original.
Acharya teaches calculating similarity values between clusters in the lower set of clusters, where the similarity values are based on a probability distribution for each cluster and an entropic distance metric, the probability distribution for each cluster is a probability of an occurrence of an attribute in the category data occurring in that cluster, and the entropic distance metric is an instance metric of cluster pairs in the lower set of clusters, and identifying a cluster pair in the lower set of clusters that minimizes the loss of information.
Shibatay teaches that byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires a small work space. BPE compression is a simple version of the pattern-substitution method. It uses character codes that do not appear in the text to represent frequently occurring strings. The compression algorithm repeats the following task until all character codes are used up or no frequent pairs appear in the text: find the most frequent pair of two consecutive character codes in the text, and then substitute an unused code for the occurrences of the pair. The scheme is presented as helping text pattern matching and searching.
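For illustration only, the compression loop described above can be sketched as follows. This is a minimal sketch, not the reference's implementation: the function names, the `min_count` threshold used to decide what counts as a "frequent" pair, and the order in which unused codes are taken are assumptions introduced here.

```python
from collections import Counter

def bpe_compress(data: bytes, min_count: int = 2):
    """Repeatedly replace the most frequent pair of adjacent codes with an
    unused byte code. Returns (compressed bytes, substitution table)."""
    symbols = list(data)
    present = set(data)
    unused = [c for c in range(256) if c not in present]
    table = {}  # new code -> (left code, right code)
    while unused:
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:  # no frequent pair left (assumed threshold)
            break
        code = unused.pop()
        table[code] = pair
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                merged.append(code)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return bytes(symbols), table

def bpe_decompress(data: bytes, table) -> bytes:
    """Expand substituted codes back into the original bytes."""
    out, stack = [], list(reversed(data))
    while stack:
        code = stack.pop()
        if code in table:
            left, right = table[code]
            stack.extend((right, left))  # left is popped first
        else:
            out.append(code)
    return bytes(out)

# Round trip on a made-up input.
original = b"abababcabab"
packed, substitutions = bpe_compress(original)
restored = bpe_decompress(packed, substitutions)
```

Decompression needs only the substitution table and a small stack, which is consistent with the small work space noted above.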
However, the prior art, Kumar, He, Rodriguez, Acharya and Shibatay failed to disclose the allowable subject matter such as “a set of non-crossing random subsets of input data learning sample is formed for a deep neural network in such a way that they represent a learning sample when combined; measures of similarity between elements that have the same class marks, determined at the previous stage, are referred to the measures of similarity of positive pairs, and the measures of similarity between elements that have different class marks, are referred to the measures of similarity of negative pairs; the loss function formed at the previous stage is minimized using the BPE technique”.
Therefore, claims 1-10 are allowed.

2. Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAE UK JEON whose telephone number is (571)270-3649. The examiner can normally be reached on 9am-6pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached on 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JAE U JEON/
Primary Examiner, Art Unit 2193