DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 02/07/2020 and 06/24/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 7, 10, 14 and 20 are objected to because of the following informalities:

Claims 7, 14 and 20 recite “the in-label”, whereas the antecedent limitation uses the term “within-label”. Terminology must be used consistently.

Claim 10 recites the acronym “the SNN”. An acronym must be defined before it is used. Claim 1 recites “a Siamese Neural Network” but does not define the acronym; “SNN” should be introduced in claim 1, e.g., “a Siamese Neural Network (SNN)”.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4, 13 and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 4, 13 and 19 recite “the embedding data”, which lacks sufficient antecedent basis; none of the preceding limitations introduces any embedding data.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 11 and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Speth et al. (“Automated Label Noise Identification for Facial Attribute Recognition”, submitted by applicant in an IDS, cited as an “X” category reference in a PCT search report, published 2019; hereinafter Speth).

Speth discloses identifying mislabeled samples in training data by using a Siamese neural network (Abstract; Section 1, Introduction). The mislabeled training samples correspond to the claimed “outlier data elements” as described in the specification ([0020]). Speth (Section 2, Related Work) also characterizes misclassified data as “outlier data”.
	

Regarding claims 1, 11 and 17, Speth discloses a data processing system, a method and a memory device (Section 4: a computer-implemented system that uses a Siamese neural network model to identify mislabeled samples in training data), comprising:

a processor; and a computer-readable medium storing executable instructions for causing the processor to perform operations (Section 4: the evaluation is performed by the Siamese neural network, which implies implementation on a computer having a processor and instructions for performing the operations) comprising:

obtaining training data for a machine learning model from a memory of the data processing system (Section 3, Fig. 1: obtaining image training data that contains mislabeled data; Section 4: CelebA dataset);
analyzing the training data using a Siamese Neural Network to determine within-label similarities and cross-label similarities associated with a plurality of data elements within the training data, the within-label representing similarities between a respective data element and a first set of data elements similarly labeled in the training data, the cross-label similarities representing similarities between the respective data element and a second set of data elements dissimilarly labeled in the training data (Section 3: calculating similarities between a candidate sample and a set of multiple representative samples based on distances obtained from a Siamese neural network; the positive label corresponds to the claimed “within-label” and the negative label corresponds to the claimed “cross-label”; Section 4.3: randomly flipping an original label changes a within-label relationship into a cross-label relationship);
identifying outlier data elements in the plurality of data elements based on the within-label and cross-label similarities (Abstract; Section 1, Introduction: identifying mislabeled data based on Siamese-network distance metrics between samples of different classes); and
processing the training data comprising the outlier data elements (Section 4: evaluation results of identifying misclassified training data; minimizing human supervision when identifying label noise; Section 5: improving classification filtering methods).
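For illustration only, the within-label and cross-label comparison mapped above can be sketched as follows. This sketch is not taken from Speth or from the specification. It assumes that each data element has already been mapped to an embedding vector (standing in for the output of a trained Siamese network), uses cosine similarity as a stand-in similarity measure, and applies a hypothetical decision rule that flags an element as an outlier when its mean cross-label similarity exceeds its mean within-label similarity. The function names and the example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for a Siamese-network similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def label_similarities(embeddings, labels) -> List[Tuple[float, float]]:
    """Return (mean within-label, mean cross-label) similarity for each element."""
    results = []
    for i, (e_i, y_i) in enumerate(zip(embeddings, labels)):
        within, cross = [], []
        for j, (e_j, y_j) in enumerate(zip(embeddings, labels)):
            if j == i:
                continue  # do not compare an element with itself
            # Similarly labeled elements feed the within-label set,
            # dissimilarly labeled elements feed the cross-label set.
            (within if y_j == y_i else cross).append(cosine(e_i, e_j))
        results.append((_mean(within), _mean(cross)))
    return results


def find_outliers(embeddings, labels, margin: float = 0.0) -> List[int]:
    """Hypothetical rule: flag elements more similar to other labels than to their own."""
    sims = label_similarities(embeddings, labels)
    return [i for i, (w, c) in enumerate(sims) if c - w > margin]


# Hypothetical 2-D embeddings: element 5 sits with the "a" cluster but is labeled "b".
EMBEDDINGS = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.05), (0.0, 1.0), (0.1, 0.9), (0.95, 0.0)]
LABELS = ["a", "a", "a", "b", "b", "b"]
print(find_outliers(EMBEDDINGS, LABELS))  # [5]
```

In this toy example, only element 5 is flagged, because it is far more similar to the elements labeled “a” than to the other elements labeled “b”, which mirrors the mislabeled-sample scenario created in Speth by randomly flipping labels.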

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 2, 3, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Speth in view of Smith et al. (“Improving Classification Accuracy by Identifying and Removing Instances that Should Be Misclassified”, 2011; hereinafter Smith).

Regarding claims 2, 3, 12 and 18, Speth discloses identifying mislabeled samples using a Siamese neural network. Speth mentions that one solution is to remove all misclassified samples (Speth, Section 2). Speth further discloses evaluation results of identifying misclassified samples (Speth, Section 4). Speth thus implicitly discloses generating a report or deleting at least a portion of the training data, as recited in these dependent claims. The examiner further cites Smith as explicitly teaching identifying and removing misclassified data, i.e., outliers (Smith, Abstract; Section II, Experimental Methodology).

It would have been obvious to a person having ordinary skill in the art at the time the invention was filed to combine Speth’s teaching with Smith’s teaching to generate a report or remove outlier data from the training data (Smith, Section IV, Experimental Results). One having ordinary skill in the art would have been motivated to make such a modification to improve classification accuracy (Smith, Introduction).

Claims 4-8, 13-15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Speth in view of Sridhar (US PG Pub. 2019/0179891) and further in view of Mikolov et al. (US Pat. 9,037,464).

Regarding claims 4-6, 13 and 19, these dependent claims are directed to analyzing historical textual data and converting the historical textual data into vector representations, a technique that is well known and commonly referred to as “word embedding” in the relevant art.

Speth uses image data as illustrative examples for identifying misclassified data. Speth does not disclose using textual data and therefore fails to disclose the limitations recited in these dependent claims.

Sridhar discloses determining topics by analyzing short messages (Sridhar, Abstract, [0050-0051]). Sridhar discloses converting short messages into numerical vector representations (Sridhar, [0032-0033], [0035-0036], [0054]). Sridhar notes that word embedding techniques such as Word2Vec are known (Sridhar, [0032], [0035]). Because such techniques are well known, Sridhar does not describe them in further detail.
To show the details of the word embedding technique recited in certain claim limitations, the examiner cites Mikolov, which discloses “wherein generating the embedding data further comprises: generating a mapping of words, phrases, or both from the historical textual data to numerical vectors, each numerical vector representing a respective one of the words, phrases, or both, and each numerical vector comprising numerical values representing relationships between words, phrases, or both in the historical textual data.” (Mikolov, Fig. 3; Col. 5, lines 40-60; Col. 8, lines 50-66).

It would have been obvious to a person having ordinary skill in the art at the time the invention was filed to modify Speth’s teaching with the teachings of Sridhar and Mikolov so as to apply the technique of identifying misclassified samples to text data by converting the text data into numerical representations using Word2Vec techniques. One having ordinary skill in the art would have been motivated to make such a modification to improve the accuracy of topic detection from short messages.
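For illustration only, the claimed mapping of words in historical textual data to numerical vectors, whose values represent relationships between words, can be sketched with a simple count-based co-occurrence model. This is a stand-in chosen for brevity; it is not the Word2Vec technique referenced by Sridhar or described by Mikolov, which learns dense vectors by training a neural network. The function name, the window size, and the example messages are hypothetical.

```python
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 1) -> Dict[str, List[int]]:
    """Map each word to a vector of co-occurrence counts with every vocabulary word.

    Hypothetical illustration: a count-based model, not Word2Vec.
    """
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {word: [0] * len(vocab) for word in vocab}
    for sentence in sentences:
        for i, word in enumerate(sentence):
            # Neighbors within `window` positions on either side of position i.
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][index[sentence[j]]] += 1
    return vectors


# Hypothetical short messages standing in for historical textual data.
MESSAGES = [["reset", "my", "password"], ["reset", "password", "link"]]
VECTORS = cooccurrence_vectors(MESSAGES)
# Vocabulary (vector dimension) order: link, my, password, reset
print(VECTORS["reset"])     # [0, 1, 1, 0]
print(VECTORS["password"])  # [1, 1, 0, 1]
```

Each vector has one numerical value per vocabulary word, and each value counts how often the two words appear next to each other, which is one concrete way a vector's values can encode relationships between words in the source text.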

Regarding claims 7-8 and 14-15, Speth discloses using pairwise similarity between a candidate sample and a representative exemplar of a set of multiple representative samples (Speth, Section 3.2). Speth further discloses comparing the candidate sample with positive and negative samples (Section 3.2). The examiner notes that the positive samples correspond to the claimed “a first set of one or more other data elements”, and the negative samples correspond to the claimed “a second set of one or more data elements associated with one or more labels that are different than the first label”. The results of comparing with the positive and negative samples correspond to the claimed “a first set of cross-determination data representing similarities between the first data element and each of the data elements of the first set of one or more data elements” and “a second set of cross-determination data representing similarities between the first data element and each of the data elements of the first set of one or more data elements”. In other words, Speth implicitly discloses the limitations recited in claims 7-8 and 14-15. MPEP § 2144.01 states: “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.” In re Preda, 401 F.2d 825, 826, 159 USPQ 342, 344 (CCPA 1968).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Speth in view of Khurana et al. (“Hybrid BiLSTM-Siamese network for FAQ Assistance”, submitted by applicant in an IDS, published 2017; hereinafter Khurana).

Speth discloses a Siamese neural network (Fig. 1) but does not disclose “wherein the SNN comprises a Bidirectional Long Short-Term Memory (BiLSTM) recurrent neural network.” Khurana discloses a hybrid BiLSTM-Siamese network, which meets the claimed limitation that the SNN comprises a BiLSTM recurrent neural network.

It would have been obvious to one of ordinary skill in the art at the time the invention was filed to substitute the Siamese BiLSTM taught by Khurana for the Siamese CNN of Speth, since each individual element and its function are shown in the prior art and one of ordinary skill in the art could have substituted one known element for another by known methods. Such a modification is no more than “[s]imple substitution of one known element for another or the mere application of a known technique to a piece of prior art ready for the improvement.” KSR, 550 U.S. ___, 82 USPQ2d at 1395 (2007). One of ordinary skill in the art would have recognized that the results of the simple substitution were predictable.

Allowable Subject Matter
Claims 9, 16 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The examiner identified several references related to one or more concepts disclosed in the instant application; these references are listed on the attached PTO-892 form for completeness of the record.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/Primary Examiner, Art Unit 2659