DETAILED ACTION
Notice of Pre-AIA or AIA Status
• The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
• This action is responsive to the following communication: US Patent Application filed on 6/3/2020.
• Claims 1-20 are currently pending.

Information Disclosure Statement
• The information disclosure statement (IDS) submitted on 4/6/202 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wang et al. (NPL, "NICT's Corpus Filtering System for the WMT18 Parallel Corpus Filtering Task", 2018).
	Regarding claim 1, Wang discloses a bilingual corpora (German-English corpus, see abstract) screening computer-implemented method, necessarily the computer system comprising code in a non-transitory computer readable storage medium, which when executed by a 
---acquiring multiple pairs of bilingual corpora, wherein each pair of the
bilingual corpora comprises a source (German) corpus and a target (English) corpus
(see abstract and section 3.2);
---training a machine translation model based on the multiple pairs of bilingual
corpora (see section 3.2: "We scored each of the remaining sentence pairs with four NMT
transformer models, trained with Marian (Junczys-Dowmunt et al., 2018), on all the parallel data provided for the shared news translation task (excluding the 'paracrawl' corpus). We trained left-to-right and right-to-left models for German-to-English and English-to-German translation
directions...");
---obtaining a first feature of each pair of bilingual corpora based on the trained
machine translation model (see section 3.2: "... We used these four model scores as features in our classifier");
---training a language model based on the multiple pairs of bilingual corpora (see section 3.2: "we trained monolingual word embedding with FastText (Bojanowski et al., 2017) on the monolingual English and German data provided by the WMT organizers");
---obtaining feature vectors of each pair of bilingual corpora (see section 3.2: "Given the bilingual word embedding, we computed embedding for the source and target sentence by doing the element-wise addition of the bilingual embedding of the words they contain") and determining a second feature of each pair of bilingual corpora based on the trained language model (see section 3.2: "Finally, we computed the cosine similarity between the embedding of source and target sentence for each sentence pair, and used it as a feature");
---determining a quality value of each pair of bilingual corpora according to the first feature and the second feature of each pair of bilingual corpora (see section 3.3: "We used the probability returned by the classifier for each sentence pair as the score ..."); and
---screening each pair of bilingual corpora according to the quality value of
each pair of bilingual corpora (see section 3.3: "... score to be used to perform filtering.").
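
For illustration, the second-feature computation quoted from section 3.2 (element-wise addition of word embeddings, then cosine similarity between the source and target sentence embeddings) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Wang's implementation: the two-dimensional vectors in BILINGUAL_VECTORS, the function names, and the choice to skip out-of-vocabulary words are all hypothetical.

```python
import math

# Hypothetical toy bilingual embedding space (assumption, not data from Wang).
BILINGUAL_VECTORS = {
    "haus": [1.0, 0.0], "house": [1.0, 0.0],
    "rot": [0.0, 1.0], "red": [0.0, 1.0],
}

def sentence_embedding(words, word_vectors):
    """Sum the vectors of the known words, element by element."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # assumed convention: ignore out-of-vocabulary words
        total = [t + v for t, v in zip(total, vec)]
    return total

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

source = sentence_embedding(["rot", "haus"], BILINGUAL_VECTORS)
target = sentence_embedding(["red", "house"], BILINGUAL_VECTORS)
second_feature = cosine_similarity(source, target)
```

In this toy space a well-matched pair yields a similarity close to 1, while a pair whose target repeats or omits words yields a lower value.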

	Regarding claim 2, Wang further discloses the method according to claim 1, wherein the machine translation model comprises a first translation model and a second translation model, 
 
	Regarding claim 3, Wang further discloses the method according to claim 1, wherein the language model comprises a first language model and a second language model, and each of the feature vectors comprises a first feature vector and a second feature vector; and obtaining the feature vectors of each pair of bilingual corpora and determining the second feature of each pair of bilingual corpora based on the trained language model, comprises: for each pair of the bilingual corpora, obtaining the first feature vector corresponding to the source corpus by inputting the source corpus in the pair of bilingual corpora to a trained first language model, obtaining the second feature vector corresponding to the target corpus by inputting the target corpus in the pair of bilingual corpora to a trained second language model, and determining a semantic similarity between the source corpus and the target corpus in the pair of bilingual corpora as the second feature of the pair of bilingual corpora based on the first feature vector and the second feature vector (section 3.2: "we trained monolingual word embeddings with FastText (Bojanowski et al., 2017) on the monolingual English and German data provided by the WMT organizers. Then, we aligned English and German monolingual word embedding spaces in a bilingual space using the unsupervised method proposed by Artetxe et al. (2018). Given the bilingual word embedding, we computed embedding for the source and target sentence by doing the element-wise addition of the bilingual embedding of the words they contain. Finally, we computed the cosine similarity between embeddings of source and target sentence for each sentence pair, and used it as a feature").

	Regarding claim 4, Wang further discloses the method according to claim 3, wherein the first language model comprises a first encoder obtained by training the source corpora in each pair of bilingual corpora, and the second language model comprises a second encoder obtained by training the target corpora in each pair of bilingual corpora, wherein each of the first encoder and the second encoder is any one of an autoencoder, a variational autoencoder, and a noise-reduction autoencoder (section 3.2-3.4). 
 
	Regarding claim 5, Wang further discloses the method according to claim 4, wherein a model parameter with which the first encoder (section 4.1) encodes the source corpora is the same as a model parameter with which the second encoder encodes the target corpora. 
 
	Regarding claim 6, Wang further discloses the method according to claim 3, wherein the semantic similarity is one of a Manhattan distance, a Euclidean distance, or a cosine similarity (cosine similarity, section 3.2). 
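
For illustration, the two distance measures recited in claim 6 alongside cosine similarity can be sketched as follows. This is a minimal sketch with hypothetical function names; the cited passage of Wang (section 3.2) uses cosine similarity only. Unlike a cosine similarity, both distances decrease as the two feature vectors become more alike.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical first and second feature vectors.
first_vector = [0.0, 0.0]
second_vector = [3.0, 4.0]
l1 = manhattan_distance(first_vector, second_vector)
l2 = euclidean_distance(first_vector, second_vector)
```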
 
	Regarding claim 7, Wang further discloses the method according to claim 1, wherein determining the quality value of each pair of bilingual corpora according to the first feature and the second feature of each pair of bilingual corpora comprises: performing a weighted calculation (table 3, page 4) on the first feature and the second feature of each pair of bilingual corpora to obtain the quality value of each pair of bilingual corpora. 
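
For illustration, a weighted calculation over the two features, as recited in claim 7, can be sketched as a weighted sum. This is a minimal sketch only: the function name, the weights, and the example feature values are hypothetical, and the passage of Wang quoted for claim 1 (section 3.3) describes the score as the probability returned by a classifier rather than a fixed weighted sum.

```python
def quality_value(first_feature, second_feature, w_first=0.5, w_second=0.5):
    """Combine two per-pair features into one score using assumed weights."""
    return w_first * first_feature + w_second * second_feature

# Hypothetical feature values for one sentence pair.
score = quality_value(first_feature=1.0, second_feature=0.0,
                      w_first=0.25, w_second=0.75)
```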
 
	Regarding claim 8, Wang further discloses the method according to claim 1, wherein screening each pair of bilingual corpora according to the quality value of each pair of bilingual corpora comprises: ranking each pair of bilingual corpora according to the quality value of each pair of bilingual corpora;  and screening each pair of bilingual corpora according to a ranking result (see table 3, section 4.4 official results for rankings).
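
For illustration, ranking pairs by quality value and screening according to the ranking result, as recited in claim 8, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the example pairs and scores, and the rule of keeping a fixed number of top-ranked pairs are hypothetical and are not taken from Wang.

```python
def screen_by_rank(pairs, quality_values, keep):
    """Rank pairs by descending quality value and keep the top `keep` pairs."""
    ranked = sorted(zip(pairs, quality_values),
                    key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:keep]]

# Hypothetical (source, target) pairs with hypothetical quality values.
pairs = [("Haus", "house"), ("rot", "green"), ("rotes Haus", "red house")]
scores = [0.5, 0.2, 0.9]
kept = screen_by_rank(pairs, scores, keep=2)
```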


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THIERRY L PHAM whose telephone number is (571)272-7439. The examiner can normally be reached M-F, 11-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.