Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
The office action is in response to communication filed on 04/05/2019. Claims 1-26 are presented for examination and are pending. 
Oath/Declaration
For the record, the examiner acknowledges that the Oath/Declaration submitted on 04/05/2019 has been received. 
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/05/2019 and 08/19/2019 have been considered. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of the Applicant’s IDS forms 1449 filed 04/05/2019 and 08/19/2019 are attached to the instant Office action.
Claim Objections
Claim 6 is objected to because of the following informalities:
In claim 6, line 5, “the removal ration” should read “the removal ratio”.
	 Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION. — The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8, 9, and 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term "larger" in claim 8 is a relative term which renders the claim indefinite. The term "larger" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purpose of examination, “larger” will be interpreted based on the plain meaning of the claim terms.
In claim 9, the term “same size” is not defined by the claim or the specification. It is unclear what size the applicant is referring to. For the purpose of examination, the term “same size” is interpreted based on the plain meaning of the claim terms.
The term “cleaner” in claim 11 is a relative term which renders the claim indefinite. The term “cleaner” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For the purpose of examination, the term “cleaner” is interpreted as more relevant.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention 

Claims 1-7, 9-20, and 22-26 are rejected under 35 U.S.C. 103 as being unpatentable over “Dynamic Data Selection for Neural Machine Translation” to Wees et al. (hereinafter “Wees”), in view of “Efficient Data Selection For Machine Translation” to Mandal et al. (hereinafter “Mandal”).
	As per claim 1, Wees teaches a method comprising: generating, by data processing hardware, a base model by training with a first dataset of data pairs (Wees Pg. 2, Sec. 2; “Formally, given an in-domain corpus I, we first create language models from the source side f of I (LM i, f) and the target side e of I (LM i, e).” Examiner Note: the source and target sides of corpus I are the first data set of language pairs);
	determining, by the data processing hardware, a contrastive score for each data pair of a third dataset of data pairs using the base model and the adapted model, the contrastive score indicative of a probability of quality of the respective data pair (Wees Pg. 2, Sec. 2; “…we compute for each sentence pair s the bilingual cross-entropy difference…” Examiner Note: Cross-entropy difference is interpreted as the contrastive score. “Next, the top n sentence pairs …are selected and used for training an MT system.” Examiner Note: MT is a machine-translation system. These top pairs are interpreted as the pairs for the third dataset.);
	and training, by the data processing hardware, a target model using the data pairs of the third dataset and the contrastive scores (Wees Pg. 2, Sec. 2; “Next, the top n sentence pairs with the highest rank—thus lowest cross-entropy—are selected and used for training an MT system.” Examiner Note: the machine translation system is interpreted as the target model and the highest ranked pairs are the third data set.) 
	Wees fails to explicitly teach generating, by the data processing hardware, an adapted model by training the base model on a second dataset of data pairs;
	However, Mandal teaches generating, by the data processing hardware, an adapted model by training the base model on a second dataset of data pairs (Mandal Pg. 262, Sec. 4; “This approach required the prior availability of two language models: Li, which is trained on source sentences in corpora Di, and L i + u, which is trained on source sentences in both Di and Du.” Examiner Note: examiner understands that the second model, L i + u, is an adapted model that uses a second training set on the original base model.).
	Wees and Mandal are analogous because both are directed to machine translation models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mandal’s system of generating an adapted model by training a base model into Wees’s machine translation system because Mandal’s system allows translation models trained using only one-fifth of the additional training corpus to achieve the same or better performance as those trained using all available additional parallel data (Mandal Sec. 1, Pg. 261).
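	For illustration only, the bilingual cross-entropy difference that the rejection of claim 1 maps to the claimed contrastive score can be sketched as follows. The sketch assumes the commonly used formulation in which, for each side of a sentence pair, the per-token cross-entropy under a general-domain language model is subtracted from the cross-entropy under the in-domain language model, and the two sides are summed. The function names and the toy log-probabilities are hypothetical and are not taken from Wees or Mandal.

```python
from typing import Sequence


def cross_entropy(token_logprobs: Sequence[float]) -> float:
    """Average negative log-probability per token (lower means a better fit)."""
    if not token_logprobs:
        raise ValueError("sentence must contain at least one token")
    return -sum(token_logprobs) / len(token_logprobs)


def bilingual_ced(in_src: Sequence[float], gen_src: Sequence[float],
                  in_tgt: Sequence[float], gen_tgt: Sequence[float]) -> float:
    """Assumed formulation: (H_in - H_gen) on the source side plus the same
    difference on the target side. Lower values indicate a pair that looks
    more like the in-domain data."""
    source_diff = cross_entropy(in_src) - cross_entropy(gen_src)
    target_diff = cross_entropy(in_tgt) - cross_entropy(gen_tgt)
    return source_diff + target_diff


# Hypothetical per-token log-probabilities for one two-token sentence pair.
score = bilingual_ced(
    in_src=[-1.0, -1.0], gen_src=[-2.0, -2.0],
    in_tgt=[-1.5, -0.5], gen_tgt=[-3.0, -1.0],
)
print(score)
```

	Under this assumed formulation a lower score corresponds to a higher rank, which is consistent with the quoted passage selecting the pairs with the lowest cross-entropy difference.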
	As per claim 14, Wees teaches a system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising…(Examiner note: Wees teaches a machine translation system using a neural network. It is integral that the system has a processor in communication with the instructions stored in memory in order to execute the machine translation neural network model.) 
	The remaining claim limitations of claim 14 are analogous to the claim limitations of claim 1 and are therefore rejected with the same rationale applied against claim 1. 
	As per claim 2, the combination of Wees and Mandal teaches the method of claim 1. Mandal further teaches wherein training the target model further comprises using data pairs of the third dataset satisfying a threshold contrastive score (Mandal Pg. 262, Sec. 4; “Using a pre-determined threshold on perplexity ratio α of sentences, candidates for careful translation can be selected from Du and subsequently used as additional parallel training data.”). 
	Wees and Mandal are analogous because both are directed to machine translation models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Mandal’s system of generating an adapted model by training a base model into Wees’s machine translation system because Mandal’s system allows translation models trained using only one-fifth of the additional training corpus to achieve the same or better performance as those trained using all available additional parallel data (Mandal Sec. 1, Pg. 261).
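	The threshold-based selection cited for claim 2 can be sketched as below. This is a minimal illustration under stated assumptions: the score is a generic per-pair value, pairs whose score is at or above the threshold are kept, and the name `select_by_threshold` and the sample values are hypothetical. Mandal applies its threshold to a perplexity ratio; the direction of that comparison is not reproduced here.

```python
from typing import List, Tuple


def select_by_threshold(scored_pairs: List[Tuple[str, float]],
                        threshold: float) -> List[Tuple[str, float]]:
    """Keep only the pairs whose score satisfies the threshold.

    Assumption: "satisfies" means score >= threshold. A score for which
    lower is better would flip the comparison.
    """
    return [(pair, score) for pair, score in scored_pairs if score >= threshold]


# Hypothetical sentence-pair identifiers with made-up scores.
candidates = [("pair_a", 0.9), ("pair_b", 0.2), ("pair_c", 0.5)]
kept = select_by_threshold(candidates, threshold=0.5)
print(kept)
```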

	As per claim 15, the claim is a system claim analogous to claim 2 and is therefore rejected with the same rationale applied against claim 2.
	As per claim 3, the combination of Wees and Mandal teaches the method of claim 1. Wees further teaches further comprising sorting, by the data processing hardware, the data pairs of the third dataset based on the respective contrastive scores (Wees Pg. 2, Sec. 2; “Finally, we rank all sentence pairs s ∈ G according to their CEDs, and then select only the top n sentence pairs with the lowest CED” Examiner Note: Ranking the sentence pairs is interpreted as sorting based on the top scores.).
	As per claim 16, the claim is a system claim analogous to claim 3 and is therefore rejected with the same rationale applied against claim 3.
	As per claim 4, the combination of Wees and Mandal teaches the method of claim 3. Wees further teaches wherein training the target model further comprises: generating a plurality of data batches, wherein each data batch comprises at least one data pair and wherein a probability that a select data pair is included in a select data batch is based on the respective contrastive score of the select data pair and wherein the probability increases as the respective contrastive score increases; and training the target model using each data batch (Wees Pg. 3, Sec. 3; “While all selection weights are very close to zero, higher ranked sentences have a noticeably higher probability of being selected than lower-ranked sentences; in practice we find that top-ranked sentences get selected in nearly each epoch, while bottom-ranked sentence pairs get selected at most once.” Examiner note: each epoch is interpreted as the plurality of data batches.)  
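	The score-dependent selection probability discussed for claim 4 can be illustrated with weighted sampling without replacement. The sketch below uses the Efraimidis-Spirakis key method, which is a stand-in technique chosen for brevity and is not asserted to be the sampling procedure of Wees. The function name `sample_batch` and the weights are hypothetical, and the weights are assumed to be positive values that grow with the rank of the pair.

```python
import random
from typing import List, Sequence


def sample_batch(weights: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Return the indices of k distinct items; a larger weight makes an
    item more likely to be drawn (Efraimidis-Spirakis keys)."""
    if k > len(weights):
        raise ValueError("cannot draw more items than are available")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    # Each item gets the key u ** (1 / w) with u uniform in [0, 1);
    # the k largest keys form the sample.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


rng = random.Random(0)
# Hypothetical weights: index 0 is the top-ranked pair.
batch = sample_batch([5.0, 1.0, 1.0, 1.0], k=2, rng=rng)

# Over many simulated epochs a heavily weighted pair is picked far more often.
counts = [0, 0]
for _ in range(500):
    counts[sample_batch([1000.0, 0.001], k=1, rng=rng)[0]] += 1
```

	Drawing a fresh sample for each epoch reproduces the qualitative behavior described in the quoted passage, in which top-ranked pairs are selected in nearly every epoch and bottom-ranked pairs only rarely.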
	As per claim 17, the claim is a system claim analogous to claim 4 and is therefore rejected with the same rationale applied against claim 4.
	As per claim 5, the combination of Wees and Mandal teaches the method of claim 4. Wees further teaches wherein generating the plurality of data batches comprises: determining a selection ratio for each data batch (Wees Pg. 4 Sec. 3; “For example, if we start with the complete bitext (α = 1), select the top 60% (β = 0.6) every second epoch (η = 2), then we run epochs 1 and 2 with a subset of size |G|, epochs 3 and 4 with a subset of size 0.6 · |G|, epochs 5 and 6 with a subset of size 0.36 · |G|, and so on. For every size n, the actual selection contains the top n sentences pairs of G.”);
	determining a batch size for each data batch, wherein the batch size is based on the selection ratio and a number of data pairs in the third dataset (Wees Pg. 4 Sec. 3; “…the selection size n is a function of epoch i...” see eq. 5);
	selecting a number of data pairs from the third dataset that corresponds with the determined batch size (Wees Pg. 4 Sec. 3; “For every size n, the actual selection contains the top n sentences pairs of G.”);
	sorting the selected data pairs based on the respective contrastive scores (Wees Pg. 2 sec. 3; “Finally, we rank all sentence pairs s ∈ G according to their CEDs, and then select only the top n sentence pairs with the lowest CED”);
	and removing, from the data batch, a removal ratio of the selected data pairs with lowest contrastive scores (Wees Pg. 4 Sec. 3 “0  ≤ β  ≤ 1 is the retention rate, i.e., the fraction of data to be kept in each new selection…For every size n, the actual selection contains the top n sentences pairs of G.”),
	the removal ratio comprising an inverse of the selection ratio (Examiner note: It’s implicit that the removal ratio is the inverse of the selection ratio).
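	The epoch-dependent selection size quoted for claim 5 can be checked numerically. Equation 5 of Wees is not reproduced in this action, so the closed form used below, n(i) = α · |G| · β^⌊(i − 1)/η⌋, is an assumption reconstructed from the quoted worked example (α = 1, β = 0.6, η = 2). The function names and the sample data are hypothetical.

```python
from typing import List, Tuple


def selection_size(epoch: int, total_pairs: int, alpha: float = 1.0,
                   beta: float = 0.6, eta: int = 2) -> int:
    """Assumed schedule: start from alpha * |G| and multiply by the
    retention rate beta once every eta epochs (epochs are 1-indexed)."""
    if epoch < 1:
        raise ValueError("epoch numbering starts at 1")
    shrink_steps = (epoch - 1) // eta
    return round(alpha * total_pairs * beta ** shrink_steps)


def top_n_by_ced(pairs_with_ced: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Rank pairs by cross-entropy difference (lowest first) and keep the top n."""
    return sorted(pairs_with_ced, key=lambda item: item[1])[:n]


# Reproduces the quoted example for a hypothetical corpus of 1000 pairs.
sizes = [selection_size(i, total_pairs=1000) for i in range(1, 7)]
print(sizes)

# Hypothetical pairs with made-up CED values; the two lowest are kept.
kept = top_n_by_ced([("a", 0.3), ("b", -1.2), ("c", 0.0)], n=2)
print(kept)
```

	With these assumed parameters the sizes for epochs 1 through 6 match the |G|, 0.6 · |G|, and 0.36 · |G| progression stated in the quoted passage.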
	As per claim 18, the claim is a system claim analogous to claim 5 and is therefore rejected with the same rationale applied against claim 5.
	As per claim 6, the combination of Wees and Mandal teaches the method of claim 5. Wees further teaches wherein the selection ratio decreases over training time (Examiner note: Wees Figure 1b shows the selection ratio decreases after each epoch.).
	
	As per claim 19, the claim is a system claim analogous to claim 6 and is therefore rejected with the same rationale applied against claim 6.
	As per claim 7, the combination of Wees and Mandal teaches the method of claim 6. Wees further teaches wherein the batch size is equal to a fixed batch size divided by the selection ratio (Wees Pg. 3 sec. 3; “we gradually decrease the training data size, starting from G and keeping only the top n sentence pairs for the duration of n epochs, where the top n pairs are defined by their CEDs scores.” Examiner note: one way to decrease the data size is to multiply (inverse of divide) the batch size by the selection ratio as shown in fig. 1b.).
	As per claim 20, the claim is a system claim analogous to claim 7 and is therefore rejected with the same rationale applied against claim 7.
	As per claim 9, the combination of Wees and Mandal teaches the method of claim 1. Wees further teaches further comprising: determining, by the data processing hardware, that the target model is a same size as the base model; and when the target model is the same size as the base model (Wees Pg. 1 sec. 1; “The proposed method requires no modifications to the NMT architecture or parameters, and substantially speeds up training times while simultaneously improving translation quality with respect to a complete-bitext baseline.” Examiner note: if no architectures are changed, it must be that all models are the same.)
	replacing, by the data processing hardware, the adapted model with the target model (Wees Pg. 2 Sec. 2; “Next, the top n sentence pairs …are selected and used for training an MT system.” Examiner note: ultimately, the final machine translation (MT) model is replacing previous models.);
	determining, by the data processing hardware, the contrastive score for each data pair of a fourth dataset of data pairs using the base model and the replaced adapted model (Wees Pg. 2 Sec. 2; “Next, we compute for each sentence pair s the bilingual cross-entropy difference…” Examiner Note: Cross-entropy difference is interpreted as the contrastive score. Pg. 2 sec. 1; “We compare the effects of a commonly used data selection approach on PBMT and NMT using four different test sets.” Examiner note: the base model and adapted models are trained and evaluated on the aforementioned sets);
	and training, by the data processing hardware, a subsequent target model using the data pairs of the fourth dataset and the contrastive scores (Wees Pg. 2 Sec. 2; “Next, the top n sentence pairs with the highest rank—thus lowest cross-entropy—are selected and used for training an MT system.”).
	Wees fails to explicitly teach replacing, by the data processing hardware, the base model with the adapted model;
	However, Mandal teaches replacing, by the data processing hardware, the base model with the adapted model (Mandal p. 262 Sec. 4; “This approach required the prior availability of two language models: Li, which is trained on source sentences in corpora Di, and L i + u, which is trained on source sentences in both Di and Du.” Examiner note: By training the base model with an additional data set, the original base model is effectively being adapted and replaced with the update.).

	As per claim 22, the claim is a system claim analogous to claim 9 and is therefore rejected with the same rationale applied against claim 9.
	As per claim 10, the combination of Wees and Mandal teaches the method of claim 1. Wees further teaches wherein the first dataset comprises random data (Wees Sec. 4.2; “We evaluate all experiments on four domains: (i) EMEA medical guidelines, (ii) movie dialogues constructed from Open Subtitles, (iii) TED talks, and (iv) WMT news.” Examiner note: these four separate data sets are interpreted as random data sets.).
	As per claim 23, the claim is a system claim analogous to claim 10 and is therefore rejected with the same rationale applied against claim 10.
	As per claim 11, the combination of Wees and Mandal teaches the method of claim 10. Wees further teaches wherein the second dataset comprises data that is cleaner than the random data of the first dataset (Wees Figure 1b shows the data getting more relevant with each epoch. Examiner note: Examiner interprets more relevant as cleaner.).
	As per claim 24, the claim is a system claim analogous to claim 11 and is therefore rejected with the same rationale applied against claim 11.
	As per claim 12, the combination of Wees and Mandal teaches the method of claim 1. Wees further teaches wherein the contrastive score comprises a Kullback-Leibler (KL) divergence (Wees Pg. 3 Sec. 3; “dynamic data selection can also be applied using other ranking criteria.” Examiner note: KL divergence is a method known in the art and is included in “other ranking criteria.”).
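	For reference, the standard definition of KL divergence between two discrete probability distributions, D(P‖Q) = Σ p_i · ln(p_i / q_i), can be sketched as follows. Neither the quoted passage nor this action states how such a divergence would be computed over sentence pairs, so the sketch shows only the textbook definition. The function name and the example distributions are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D(P || Q) in nats for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a zero-probability outcome under P contributes nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions: identical inputs give zero divergence,
# and any mismatch gives a positive value.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.5, 0.5], [0.25, 0.75])
```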
	As per claim 25, the claim is a system claim analogous to claim 12 and is therefore rejected with the same rationale applied against claim 12.
	As per claim 13, the combination of Wees and Mandal teaches the method of claim 1. Wees further teaches wherein each dataset comprises sentence language pairs (Wees Sec. 2 pg. 2; “Finally, we rank all sentence pairs s ∈ G according to their CEDs, and then select only the top n sentence pairs with the lowest CEDs.”).
	As per claim 26, the claim is a system claim analogous to claim 13 and is therefore rejected with the same rationale applied against claim 13.
Claims 8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Wees in view of Mandal, further in view of Pub No. US 2016/0328644 A1 to Lin et al. (hereinafter “Lin”).
	As per claim 8, the combination of Wees and Mandal teaches the method of claim 1.
	The combination of Wees and Mandal fails to explicitly teach wherein the target model is larger than the base model.
	However, Lin teaches wherein the target model is larger than the base model (Lin [0067]; “A new configuration may be determined by adopting a student network derived from the current artificial neural network. A network with a larger capacity (e.g., a teacher network) usually corresponds to greater accuracy. The knowledge acquired by the teacher may be leveraged for training a “student” network.” Examiner note: Examiner interprets Lin as disclosing that large and small networks are used for training neural networks.).
	Wees, Mandal, and Lin are analogous because all are directed to neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed 
	As per claim 21, the claim is a system claim analogous to claim 8 and is therefore rejected with the same rationale applied against claim 8. 
	
	The prior art made of record and not relied upon that is pertinent to the applicant’s disclosure is listed below.
Akiba et al. (NPL: “Using Language and Translation Models to Select the Best among Outputs from multiple MT Systems”) discloses a method of ranking and scoring translation quality.
	Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHMUEL Y. WEINFELD whose telephone number is (571)272-9893.  The examiner can normally be reached on Mon-Fri 08:00AM - 5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available 





/Shmuel Weinfeld/Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126