DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
	As per Claim 11, the last paragraph is interpreted as “wherein the processor is further configured to… determine the second loss function… by: acquiring…” (not where the last paragraph is part of “determin[ing] the first loss function of the third prediction result of the student model relative to the first prediction result of the first teacher model by:”)
Claim Objections
	Claims 2 and 9 are not formally objected to, but Claim 2 recites “greater than” in line 3 of claim 2 while Claim 9 recites “more than” in line 3 of claim 9.  Applicant may, at applicant’s discretion, amend “greater than” to “more than” (or vice versa) for language consistency (but this is not necessary since the phrases are synonymous).
	Claims 3 and 10 similarly recite “of” and “associated with” (see first 3 lines of claims 3 and 10), which may be amended for consistency if Applicant chooses to do so (amending “associated with” to “of” seems preferable since claim 8 recites “of”).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

As per Claim 1 (and similarly claims 8 and 15):
“the joint optimization target” in the last line of claim 1 is ambiguous.  As claimed, “determining a joint optimization target…” in the 3rd to last line of claim 1 is part of a “for-loop” that is performed “for each of the N training samples”, and so each of the multiple training samples has a respective “joint optimization target”.  “training the student model based on the joint optimization target” is not part of the “for-loop” because it follows a semicolon (whereas the “for-loop” steps are separated by commas).  Therefore, it is not clear which sample’s “joint optimization target” is the one that “the joint optimization target” in the last line of claim 1 is supposed to refer to.
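For illustration only, the two readings at issue can be sketched in Python as follows. The function names and the additive placeholder used to combine the guide components are assumptions made for this sketch; the claims do not specify any formula, and the sketch is not a construction of the claim language.

```python
# Hypothetical sketch: names and the placeholder arithmetic are illustrative
# assumptions, not taken from the claims or the specification.

def sub_target(guide_1: float, guide_2: float) -> float:
    """Placeholder combination of two guide components (formula assumed)."""
    return guide_1 + guide_2


def per_sample_joint_targets(guides):
    """Reading as claimed: the 'determining a joint optimization target' step
    sits inside the loop, so one joint target is produced per training sample."""
    joint_targets = []
    for guide_1, guide_2 in guides:
        sub = sub_target(guide_1, guide_2)
        joint_targets.append(sub)  # a separate joint target for each iteration
    return joint_targets


def single_joint_target(guides):
    """Alternative reading: one joint target aggregated over all N samples,
    determined once after the loop has finished."""
    return sum(sub_target(g1, g2) for g1, g2 in guides)
```

Under the first reading, a later reference to “the joint optimization target” could point to any element of the returned list; under the second reading there is only one value to which it can refer.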

As per Claim 4 (and similarly claims 11 and 18):
“each target word in the nth target language sequence” in lines 4-5 of claim 4 lacks antecedent basis.  No part of claims 1-3 establishes that an nth target language sequence contains target word(s).
“each target word in the nth target language sequence” (in line 6 of claim 4, and in lines 11-12 of claim 4) lacks antecedent basis (same issue as discussed in the previous paragraph).
	“each target word in the inverse sequence of the nth target language sequence” in lines 13-14 of claim 4 lacks antecedent basis.  No part of claims 1-3 and no earlier/preceding/previous part of claim 4 establishes that an inverse sequence of an nth target language sequence contains target word(s).

	As per Claim 5 (and similarly claims 12 and 19):
	“the nth training sample” in line 1 of claim 5 lacks antecedent basis.  Claim 5 depends on claim 1 and “an nth training sample” is first recited in claim 2.  This issue also makes it unclear if claim 5 is supposed to depend on claim 1 or on claim 2.  
Applicant fairly clearly meant to further define the “determining a sub optimization target corresponding to the training sample… according to the first guide component and the second guide component” part of the “for-loop” of claim 1 for one of the N training samples, but as claimed “the nth training sample” as a phrase lacks antecedent basis.
	“the nth training sample” in lines 6-7 of claim 5 lacks antecedent basis (due to “the nth training sample” in line 1 of claim 5 lacking antecedent basis).

	As per Claim 7 (and similarly claim 14):
	“the nth training sample” in line 2 of claim 7 lacks antecedent basis (due to “the nth training sample” in line 1 of claim 5 lacking antecedent basis).
“the nth training sample” in line 4 of claim 7 lacks antecedent basis (due to “the nth training sample” in line 1 of claim 5 lacking antecedent basis).
“the nth training sample” in lines 4-5 of claim 7 lacks antecedent basis (due to “the nth training sample” in line 1 of claim 5 lacking antecedent basis).
	“the training sample” in line 5 of claim 7 is unclear.  While Claims 5 and 7 further define a “for-loop” step (which means “the training sample” could be interpreted as a respective training sample of one iteration of the “for-loop”), it is not clear if Applicant meant for “the training sample” to refer particularly to the “nth training sample” or to a respective training sample (which varies depending on which iteration of the “for-loop” is being performed).

	The dependent claims include the issues of their respective parent claims.
Allowable Subject Matter
Claims 1, 8, and 15 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action.
Claims 2-7, 9-14, and 16-20 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
	As per Claim 1 (and similarly claims 8 and 15, and consequently claims 2-7, 9-14, and 16-20, which depend on claims 1, 8, and 15), the prior art of record does not teach or suggest the combination of all limitations in claim 1, including (i.e., in combination with the remaining limitations in claim 1): A method for compressing a neural network model for machine translation, comprising: obtaining a first trained teacher model and a second trained teacher model based on N training samples, N being a positive integer greater than 1; for each of the N training samples, determining a first guide component of the first teacher model and a second guide component of the second teacher model respectively, determining a sub optimization target corresponding to the training sample and configured to optimize a student model according to the first guide component and the second guide component, and determining a joint optimization target based on each of the N training samples and a sub optimization target corresponding to the training sample; and training the student model based on the joint optimization target (where the student model is trained based on a joint optimization target based on every one of the training samples that were used to obtain the two teacher models, and based on a training sample’s sub optimization target, and where a sub optimization target is determined for every one of the training samples, where the sub optimization target is configured to optimize a student model according to a determined first guide component of the first teacher model and a determined second guide component of the second teacher model).
2019/0325308 (JP search report reference) teaches teacher machine learning models that process training examples (paragraph 44; Figure 1) and a student model that processes the same training examples (paragraph 44).  The output of the student model may be compared to a generated teacher machine learning output, and the student machine learning model parameters are updated to minimize the difference between the student model output and the teacher output (paragraph 45).  The teacher models are trained using soft target outputs (paragraph 45), and the student model can perform machine translation (paragraph 47).  This reference does not appear to describe where the teacher models are obtained/generated based on the same samples for which the 2nd to last limitation of claim 1 is performed.
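For illustration only, the general idea of comparing a student model output to a teacher output and minimizing the difference can be sketched in Python as follows. The use of cross-entropy against the teacher's probabilities as the difference measure, and all function names, are assumptions made for this sketch; the reference may measure the difference in another way.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student's distribution against the teacher's
    soft targets (assumed difference measure). The value is smaller when the
    student's output is closer to the teacher's output."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

In such a scheme, the student parameters would be updated in the direction that reduces this value; the update step itself is omitted from the sketch.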
As per Shan You et al. ("Learning from Multiple Teacher Networks", cited in IDS, Search Report reference), it is not clear how the whole objective of the student network (cited as the joint optimization target) is determined based on each of the N training samples (which the search report appears to map to 50,000 training images [even assuming the training images contribute in some way to the joint optimization target such that the joint optimization target is indirectly based on each of the N training samples, it is not clearly the case that the intermediate layer weights are based on each of the training samples because it does not appear to be inherent that inputting an example into the teacher networks would generate a new set of intermediate layer weights]).  The joint optimization target also does not appear to be a single entity based on every training sample (as opposed to where each training sample leads the student model to have a respective update).  It is also not clear how guide components are determined for each of the N training samples (it does not appear to be inherent that inputting an example into the teacher networks would generate a new set of intermediate layer weights).  It is also not clear how the relative dissimilarity is determined for each of the N training samples.
2020/0272940 teaches “Chart 700f shows the impact of the number of teachers on the student module performance. The performance of the student is also shown with different teacher modules. In some examples, when a total of 40,000 sensitive data samples are used, the sensitive data samples may be split to train the teacher modules based on the number of teachers, such as one teacher with 40,000 samples, two teachers trained with 20,000 samples for each, and four teachers trained with 10,000 samples. The performance of the teacher module may increase with more training samples, but the student module performance may improve when trained with more teacher modules” (paragraph 92).  This reference does not appear to describe the 2nd to last limitation of claim 1.
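For illustration only, the sample split quoted above (40,000 samples divided among one, two, or four teachers) can be sketched in Python as follows. The function name and the assumption that the shards are equal-sized, contiguous, and non-overlapping are made for this sketch and are not stated in the reference.

```python
def split_among_teachers(samples, num_teachers):
    """Divide the samples into equal-sized, non-overlapping shards, one per
    teacher (assumed scheme). Any remainder samples are left unassigned."""
    if num_teachers <= 0:
        raise ValueError("num_teachers must be positive")
    shard_size = len(samples) // num_teachers
    return [
        samples[i * shard_size:(i + 1) * shard_size]
        for i in range(num_teachers)
    ]


# Shard sizes for the three configurations quoted from paragraph 92.
samples = list(range(40_000))
for k in (1, 2, 4):
    print(k, [len(shard) for shard in split_among_teachers(samples, k)])
```

Under this assumed scheme, each teacher is trained on a different subset, rather than every teacher being obtained from the same full set of samples.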
Kang et al. (US 2017/0083829, cited in prior art rejection for Application 16/731,321) describes multiple teacher models and a student model and reducing loss between outputs.
J. Xu, J. Hou, Y. Song, W. Guo and L. Dai, "Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition," 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 844-849, doi: 10.1109/APSIPAASC47483.2019.9023203, teaches knowledge distillation, which transfers knowledge of a high-capacity teacher with desired high performance to a more compact student (see Section I. Introduction).
X. Liang, L. Wu, J. Li, T. Qin, M. Zhang and T. -Y. Liu, "Multi-Teacher Distillation With Single Model for Neural Machine Translation," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 992-1002, 2022, doi: 10.1109/TASLP.2022.3153264. This reference does not qualify as prior art.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00PM-8:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EY 7/12/2022
/ERIC YEN/Primary Examiner, Art Unit 2658