Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


DETAILED ACTION

Response to Arguments
Applicant's arguments filed 04/07/2022 have been fully considered but they are not persuasive. On pages 9-10 of the arguments, Applicant argues that acoustic conditions may not be taught by the prior art.
Examiner does not concur, since the specification cannot be unreasonably read into the claims. For instance, and without limitation, under the broadest reasonable interpretation (BRI), an acoustic condition can be in the user's voice itself, as recited in 0049-0051 of Malik. See the updated analysis of this limitation in the present Office action.
Applicant’s remarks are moot regarding claim 21, which has clarified the context of the acoustic condition. The previously cited prior art is overcome for that claim, and new prior art is cited.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20210117780 A1 Malik; Kshitiz et al. (hereinafter Malik).
Re claims 1, 11, and 16, Malik teaches
11. A computer program product including one or more computer-readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising: 
obtaining training data including a plurality of utterances of a plurality of speakers in a plurality of acoustic conditions; (various permutations from a pool, such as from a user with an utterance with a specific acoustic voice property of how a user sounds, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117, see also 0047 beam forming and echo cancellation as implying conditions are extracted and compared in addition to the way a user voice sounds)
preparing at least one machine learning model, each machine learning model including a common embedding model for converting an utterance into a feature vector and a classification model for classifying the feature vector; and (MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
training, by using the training data, the at least one machine learning model to perform classification by speaker and to perform classification by acoustic condition. (speaker identification by profile/user, condition by context e.g. what user says… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Malik to incorporate embodiments such as those in 0049-0051 to replace any type of data, and also as in fig. 8 to provide permutations of groups, allowing for global and local combinations of MLM, EM, and overall CM per user or context.
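For illustration only, the arrangement recited in claim 1 (a common embedding model feeding one classifier per label type, trained on both labels) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Malik or from the application: the class name `SharedEmbeddingModel`, the linear layers, the toy one-hot features, and the learning rate are all hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedEmbeddingModel:
    """Hypothetical: one embedding E shared by a speaker head S and a condition head C."""

    def __init__(self, n_in, n_emb, n_speakers, n_conditions, seed=0):
        rng = random.Random(seed)
        self.E = init(n_emb, n_in, rng)
        self.S = init(n_speakers, n_emb, rng)
        self.C = init(n_conditions, n_emb, rng)

    def embed(self, x):
        # Converts an utterance (here, a toy feature list) into a feature vector.
        return matvec(self.E, x)

    def loss(self, data):
        # Mean of the summed cross-entropy losses of both heads.
        total = 0.0
        for x, spk, cond in data:
            h = self.embed(x)
            total -= math.log(softmax(matvec(self.S, h))[spk])
            total -= math.log(softmax(matvec(self.C, h))[cond])
        return total / len(data)

    def train_step(self, data, lr=0.1):
        # One pass of per-sample gradient descent on the summed loss.
        for x, spk, cond in data:
            h = self.embed(x)
            grad_h = [0.0] * len(h)
            for head, label in ((self.S, spk), (self.C, cond)):
                p = softmax(matvec(head, h))
                for k, row in enumerate(head):
                    g = p[k] - (1.0 if k == label else 0.0)  # d(loss)/d(logit k)
                    for j in range(len(h)):
                        grad_h[j] += g * row[j]  # read the weight before updating it
                        row[j] -= lr * g * h[j]
            # Both heads contribute gradient to the shared embedding.
            for j, row in enumerate(self.E):
                for i in range(len(x)):
                    row[i] -= lr * grad_h[j] * x[i]


# Toy data (assumed): features are [speaker one-hot, condition one-hot].
data = [
    ([1.0, 0.0, 1.0, 0.0], 0, 0),
    ([1.0, 0.0, 0.0, 1.0], 0, 1),
    ([0.0, 1.0, 1.0, 0.0], 1, 0),
    ([0.0, 1.0, 0.0, 1.0], 1, 1),
]
model = SharedEmbeddingModel(n_in=4, n_emb=3, n_speakers=2, n_conditions=2)
loss_before = model.loss(data)
for _ in range(300):
    model.train_step(data)
loss_after = model.loss(data)
```

In this sketch the embedding receives gradient from both classifiers, so the resulting feature vector is shaped by speaker labels and by acoustic-condition labels together.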

Re claim 2, Malik teaches
2. The computer-implemented method of claim 1, wherein the training includes training the at least one machine learning model until the embedding model is trained to produce a conversion function for relating utterances to feature vectors including characteristics of speakers and acoustic conditions.  (vectorization transform… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claims 3, 12, and 17, Malik teaches
3. The computer-implemented method of claim 1, further comprising producing a converter that converts an utterance into a feature vector by implementing the trained embedding model as a conversion function of the converter.  (vectorization transform… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claims 4, 13, and 18, Malik teaches
4. The computer-implemented method of claim 1, wherein the training includes performing, by using the training data, a first task to train a first machine learning model including the embedding model and a first classification model for classifying the feature vector by speaker; (MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
and performing, by using the training data, a second task to train a second machine learning model including the embedding model and a second classification model for classifying the feature vector by acoustic condition.  (permutation of nth model groups as nth number of tasks as in fig. 8… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claim 5, Malik teaches
5. The computer-implemented method of claim 4, wherein the performing the first task is for training the first machine learning model without using information of acoustic condition in the training data; and (withholding user info or conditions as in simple NLP or user actions for updating/training… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
the performing the second task is for training the second machine learning model without using information of speaker in the training data.  (withholding user info or conditions as in simple NLP or user actions for updating/training… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claim 6, Malik teaches
6. The computer-implemented method of claim 4, wherein the first task and the second task are performed according to a multi-task training technique.  (varying permutation of nth model groups as nth number of tasks as in fig. 8… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claim 7, Malik teaches
7. The computer-implemented method of claim 4, wherein the obtaining the training data includes assigning a common identifier of an acoustic condition to each utterances of a common speaker obtained from a single continuous recording.  (common attributes of a user… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
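For illustration only, the labeling step recited in claim 7 can be sketched as follows. This is a hedged sketch and not a description of Malik or of the application's implementation: the `Utterance` record, its field names, and the function `assign_condition_ids` are hypothetical, and the sketch assumes each utterance already carries a recording identifier and a speaker identifier.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    recording_id: str  # hypothetical: identifies one continuous recording
    speaker_id: str    # hypothetical: identifies the speaker
    audio_path: str    # hypothetical: location of the audio segment


def assign_condition_ids(utterances):
    """Give every utterance of the same speaker in the same recording one shared label."""
    ids = {}
    labels = []
    for u in utterances:
        key = (u.recording_id, u.speaker_id)
        if key not in ids:
            ids[key] = len(ids)  # next unused integer label
        labels.append(ids[key])
    return labels


utterances = [
    Utterance("rec1", "alice", "a1.wav"),
    Utterance("rec1", "alice", "a2.wav"),
    Utterance("rec1", "bob", "b1.wav"),
    Utterance("rec2", "alice", "a3.wav"),
]
labels = assign_condition_ids(utterances)
```

Here the two utterances of the same speaker in the same recording share a label, while the same speaker in a different recording receives a different one.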

Re claims 8, 14, and 19, Malik teaches
8. The computer-implemented method of claim 5, wherein the training includes performing, by using the training data, a third task to train a third machine learning model including the embedding model and a third classification model for classifying the feature vector by speaker and acoustic condition.  (permutation of nth model groups as nth number of tasks as in fig. 8… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claims 9, 15, and 20, Malik teaches
9. The computer-implemented method of claim 8, wherein the obtaining the training data further includes generating, for each utterance, a training output data of the third classification model by extracting common features of utterances among the plurality of utterances having a common speaker and a common acoustic condition, and (common attributes of a user with permutation of nth model groups as nth number of tasks as in fig. 8…MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)
wherein the performing the third task includes training the third machine learning model to output the training output data representing a speaker and an acoustic condition from an utterance of a speaker and an acoustic condition.  (common attributes of a user with permutation of nth model groups as nth number of tasks as in fig. 8…MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)

Re claim 10, Malik teaches
10. The computer-implemented method of claim 9, wherein the generating includes generating an i-vector as the training output data.  (a feature vector is commonly referred to as an i-vector in the general sense… MLM (machine learning model global and local) with an EM (local embedding model) and a CM (local classification model) as a whole, training models, vectorization transform, speaker identification by profile/user, condition by context e.g. what user says, permutation of nth model groups as nth number of tasks as in fig. 8, withholding user info or conditions as in simple NLP or user actions for updating/training, common attributes of a user, a feature vector is commonly referred to as an i-vector in the general sense… 0049-0051 claim 4, 0044 0133 0076 0055 0134 fig. 8 0117)


Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over US 20210117780 A1 Malik; Kshitiz et al. (hereinafter Malik) in view of US 10959018 B1 Shi; Guangji et al. (hereinafter Shi).
Re claim 21, Malik implies echo cancellation and beam forming but fails to teach acoustic conditions as precisely recited:
21. (New) The computer-implemented method of claim 1, wherein each of the plurality of acoustic conditions corresponds to a transfer function that is affected by a factor selected from the group consisting of acoustic characteristics of a room, a distance between a speaker and a microphone, characteristics of a microphone, and characteristics of a recording device. (Shi transfer function altered in model based on acoustic features of a room, distances with speaker and mic, device settings which includes the microphone inherently, claim 2, col 1 line 65 to col 2 line 14, col 3 lines 27, col 3 lines 45-62, col 4 lines 35-46)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Malik to incorporate the above claim limitations as taught by Shi to allow for improvements of the general echo cancellation and beam forming in Malik to provide impulse or transfer driven models for dynamically changing microphone and speaker locations as well as object alteration in a room (i.e. acoustic property alteration), thereby creating an adaptable model using beam forming techniques and room echo/acoustic features for improved input of user commands, noise reduction, and faster acquisition of rich content over noise, thus obviating the suggestions of beam forming and echo cancellation in Malik.
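For illustration only, the notion in claim 21 of an acoustic condition corresponding to a transfer function can be sketched as a discrete convolution of a clean signal with an impulse response. This is a generic signal-processing sketch and is not drawn from Shi or Malik: the sample values and the names `convolve`, `close_mic`, and `echoey_room` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Discrete linear convolution: the recorded signal under a given transfer function."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


clean = [1.0, 0.0, 0.0, 0.5]   # assumed clean utterance samples
close_mic = [1.0]              # assumed: direct path only
echoey_room = [1.0, 0.0, 0.5]  # assumed: direct path plus one delayed reflection

recorded_close = convolve(clean, close_mic)
recorded_room = convolve(clean, echoey_room)
```

The same utterance yields different recorded signals under the two impulse responses, which is the sense in which room characteristics or microphone placement define distinct acoustic conditions in this sketch.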

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

US 20210224862 A1	Taifi; Moussa et al.
EM with LM and CM

US 20210256076 A1	McMurray; Steven Michael et al.
LM and EM

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL COLUCCI whose telephone number is (571)270-1847.  The examiner can normally be reached on M-F 9 AM - 7 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/
Primary Examiner, Art Unit 2655
(571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov