Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION



	Response to Arguments
Applicant's arguments with respect to claims 1-20 have been considered but are moot in view of the new ground(s) of rejection. Applicant’s arguments are directed to the amended subject matter; new citations and explanation from existing prior art are provided below.

	


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips; Michael S. et al. US 20110054900 A1 (hereinafter Phillips) in view of KWON et al. US 20190206389 A1 (hereinafter KWON) and further in view of Gibor; David et al. US 20110055309 A1 (hereinafter Gibor).
1. A system, comprising: at least one computing device processor; and a memory device including instructions that, when executed by the at least one computing device processor, cause the system to: receive a request including audio data from a voice-enabled device, the audio data representative of an utterance captured by the device, the device associated with a user account; (user profiles specific to user as well as general context models, user history, geolocation etc. 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
determine a string of phonemes present in the utterance; (model associated with phonemes to perform ASR 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
perform automatic speech recognition (ASR) on the string of phonemes to attempt to interpret the string of phonemes into one or more words, the automatic speech recognition associated with an index of words; (model associated with phonemes to perform ASR 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
detect an error in interpreting the string of phonemes, the error including a low-confidence score or ambiguity between multiple interpretations; (disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
…the additional information associated with one or more known words from the index of words; (user selects response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
update a user-specific speech recognition key for the user account to associate the string of phonemes with the one or more known words; and (the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
train a general speech recognition model using the association of the string of phonemes and the one or more known words. (training as in the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

However, Phillips teaches phonemes described previously, but fails to teach
receive, in response to the request for additional information, the additional information as a voice input…
in response to detecting the error, request additional information from the user via the voice-enabled device;
KWON teaches a user error or misrecognition is received, and the system induces or prompts the user to speak a correct guide text of known words 0060, 0086, 0093 with transition from fig. 4b to improvement in fig. 5a and 5b.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by KWON to allow for improvement of known disambiguation with touchscreen/manual-input-only to allow for the transduction of speech into text for the purposes of selective correction, thereby analogously improving Phillips to include not only voice selection via ASR but also a suggestion that the user can utter hands-free and the models in Phillips are still improved since the user makes a correction or general user selection.

However, while Phillips expressly teaches training models based on user habits and voice context/keywords, the combination fails to teach
A request for one or more offerings offered in an e-commerce environment, the request including audio data
the index of words associated with the e-commerce environment,
training a model… associated with the e-commerce environment
Gibor teaches speech voice recognition in the context of e-commerce website types (general machine learning, e-commerce using voice commands/requests, targeted ads/offers, keywords per context 0312 0389 0453 0677 0762 0763 0810 0811)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Phillips in view of KWON to incorporate the above claim limitations as taught by Gibor to allow for using the existing learning models of Phillips in the context of online/website/e-commerce shopping with ads/offers present, wherein the combination is improved to analogously provide general website browsing as in Phillips to now include website store/shopping browser functions, including model learning/updating using words or multiple/index of words spoken by a user, thereby not altering the process of Phillips but simply substituting (by addition thereof) a sub-type of browsers for e-commerce specifically.


Re claim 2, Phillips teaches
2. The system of claim 1, wherein the instructions when executed further cause the system to: utilize the additional information to perform subsequent steps associated with the request from the voice-enabled device. (user can select alternative terms 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Re claim 3, Phillips while suggestive fails to teach
3. The system of claim 1, wherein the additional information is received in the form of second audio data captured by the voice-enabled device. (KWON user correction or disambiguation selection is by voice 0060 0093 with fig. 5a and 5b)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by KWON to allow for obviation of the inherent usage of editing via speech inputs as in Phillips wherein KWON expressly teaches selection with voice thereby improving explicitly and analogously the selection of disambiguation results via voice without necessarily being in edit mode.


Re claim 4, Phillips teaches
4. The system of claim 1, wherein the additional information is received in the form of user input entered into a graphical interface displayed on the display-based client device. (user selects on a touch screen 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Re claim 5, Phillips teaches
5. A computer-implemented method, comprising: receiving a request including audio data from a voice-enabled device, the audio data representative of an utterance captured by the device, the device associated with a user account;
determining a string of phonemes present in the utterance; (user profiles specific to user as well as general context models, user history, geolocation etc. 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
associating the string of phonemes with the one or more known words; and (training as in the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
updating a user-specific speech recognition key with the association of the string of phonemes and the one or more known words. (the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
However, Phillips teaches phonemes described previously, but fails to teach
detecting an error in forming words from the string of phonemes; 
receiving, in response to detecting the error, a subsequent user utterance 
KWON teaches a user error or misrecognition is received, and the system induces or prompts the user to speak a correct guide text of known words 0060, 0086, 0093 with transition from fig. 4b to improvement in fig. 5a and 5b.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by KWON to allow for improvement of known disambiguation with touchscreen/manual-input-only to allow for the transduction of speech into text for the purposes of selective correction, thereby analogously improving Phillips to include not only voice selection via ASR but also a suggestion that the user can utter hands-free and the models in Phillips are still improved since the user makes a correction or general user selection.

However, Phillips teaches phonemes described previously, but fails to teach
for one or more offerings offered in an e-commerce environment, the request
the one or more known words associated with the e-commerce environment;
Gibor teaches speech voice recognition in the context of e-commerce website types (general machine learning, e-commerce using voice commands/requests, targeted ads/offers, keywords per context 0312 0389 0453 0677 0762 0763 0810 0811)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Phillips in view of KWON to incorporate the above claim limitations as taught by Gibor to allow for using the existing learning models of Phillips in the context of online/website/e-commerce shopping with ads/offers present, wherein the combination is improved to analogously provide general website browsing as in Phillips to now include website store/shopping browser functions, including model learning/updating using words or multiple/index of words spoken by a user, thereby not altering the process of Phillips but simply substituting (by addition thereof) a sub-type of browsers for e-commerce specifically.
Re claim 6, Phillips teaches
6. The method of claim 5, further comprising: requesting additional information from the user via a client device. (error invokes disambiguation and user selects candidate 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 7, Phillips fails to teach
7. The method of claim 6, wherein requesting additional information is performed over voice though a voice-enabled client device. (KWON user correction or disambiguation selection is by voice 0060 0093 with fig. 5a and 5b)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by KWON to allow for obviation of the inherent usage of editing via speech inputs as in Phillips wherein KWON expressly teaches selection with voice thereby improving explicitly and analogously the selection of disambiguation results via voice without necessarily being in edit mode.

Re claim 8, Phillips teaches
8. The method of claim 6, wherein requesting additional information is performed via graphical interface on a display-based client device. (user selects on a touch screen 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 9, Phillips teaches
9. The method of claim 5, further comprising: training a general speech recognition model using the association of the string of phonemes and the one or more known words, wherein the general speech recognition model is a statistical model. (each model is statistical or probabilistic 0010 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Re claim 10, Phillips teaches
10. The method of claim 9, wherein the general speech recognition model is referenced during speech recognition for a plurality of user accounts, and wherein the user-specific speech recognition key is reference during speech recognition for the specific user account. (as in element 130 can be any number of user profiles, the system is specific to a user when certain actions are spoken and updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 11, Phillips teaches
11. The method of claim 9, wherein the user-specific speech recognition key is updated faster than the general speech recognition model. (faster under BRI is simply using user actions themselves prior to updating a model or simply using a local model prior to general external model 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Re claim 12, Phillips teaches
12. The method of claim 9, further comprising: determining demographic data associated with the user account; and training the general speech recognition model using an association of the string of phonemes and the one or more known words and the demographic data. (demographics as in regional geography for user profile, the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 13, Phillips teaches
13. The method of claim 9, wherein the general speech recognition model is a part of at least one of an automatic speech recognition (ASR) model, a natural language understanding (NLU) model, or a named entity recognition (NER) model associated with an e-commerce platform. (ASR model 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Re claim 14, Phillips teaches
14. The method of claim 5, further comprising: receiving a second request including the string of phonemes; referencing the updated user-specific speech recognition key; and recognizing the string of phonemes as corresponding to the one or more known words. (process starts over for new user request using updated user history and using phonemes… the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 15, Phillips teaches
15. The method of claim 9, further comprising: receiving a second request associated with a second user account, the second request including a second utterance; determining that the second utterance includes the string of phonemes; referencing the trained general speech recognition model; and recognizing the string of phonemes as corresponding to the one or more known words. (for a new user profile the process starts over and can use a general model when user history is not needed to save time, for new user request using updated user history and using phonemes… the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 16, Phillips teaches
16. A system, comprising: at least one computing device processor; and a memory device including instructions that, when executed by the at least one computing device processor, cause the system to: receive a request including audio data from a voice-enabled device, the audio data representative of an utterance captured by the device, the device associated with a user account;
associate the string of phonemes with the one or more known words; and (training as in the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
update a user-specific speech recognition key with the association of the string of phonemes and the one or more known words. (the system updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)
However, Phillips teaches phonemes described previously, but fails to teach
detecting an error in forming words from the string of phonemes; 
receive, in response to detecting the error, a subsequent user utterance 
KWON teaches a user error or misrecognition is received, and the system induces or prompts the user to speak a correct guide text of known words 0060, 0086, 0093 with transition from fig. 4b to improvement in fig. 5a and 5b.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Phillips to incorporate the above claim limitations as taught by KWON to allow for improvement of known disambiguation with touchscreen/manual-input-only to allow for the transduction of speech into text for the purposes of selective correction, thereby analogously improving Phillips to include not only voice selection via ASR but also a suggestion that the user can utter hands-free and the models in Phillips are still improved since the user makes a correction or general user selection.

However, Phillips teaches phonemes described previously, but fails to teach
for one or more offerings offered in an e-commerce environment, the request
the one or more known words associated with the e-commerce environment;
Gibor teaches speech voice recognition in the context of e-commerce website types (general machine learning, e-commerce using voice commands/requests, targeted ads/offers, keywords per context 0312 0389 0453 0677 0762 0763 0810 0811)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Phillips in view of KWON to incorporate the above claim limitations as taught by Gibor to allow for using the existing learning models of Phillips in the context of online/website/e-commerce shopping with ads/offers present, wherein the combination is improved to analogously provide general website browsing as in Phillips to now include website store/shopping browser functions, including model learning/updating using words or multiple/index of words spoken by a user, thereby not altering the process of Phillips but simply substituting (by addition thereof) a sub-type of browsers for e-commerce specifically.

Re claim 17, Phillips teaches
17. The system of claim 16, wherein the instructions that, when executed by the at least one computing device processor, further cause the system to: training a general speech recognition model using the association of the string of phonemes and the one or more known words, wherein the general speech recognition model is a statistical model. (as in element 130 can be any number of user profiles, the system is specific to a user when certain actions are spoken and updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 18, Phillips teaches
18. The system of claim 17, wherein the general speech recognition model is referenced during speech recognition for a plurality of user accounts, and wherein the user-specific speech recognition key is reference during speech recognition for the specific user account. (as in element 130 can be any number of user profiles, the system is specific to a user when certain actions are spoken and updates all user databases and models e.g. with a key i.e. user specific information, biometric data, accent, user actions/history, the way a user speaks, etc. following a user selecting a response after system follows up with user for feedback after disambiguation performed during uncertainty based on ASR confidence score 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 19, Phillips teaches
19. The system of claim 17, wherein the user-specific speech recognition key is updated faster than the general speech recognition model. (faster under BRI is simply using user actions themselves prior to updating a model or simply using a local model prior to general external model 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)

Re claim 20, Phillips teaches
20. The system of claim 17, wherein the general speech recognition model is a part of at least one of an automatic speech recognition (ASR) model, a natural language understanding (NLU) model, or a named entity recognition (NER) model associated with an e-commerce platform. (ASR model 0047 0048 0070 0073 0078 0083 0089 0097 0117 0150 0165 with fig. 2, 7b, 7c, and 13)


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Zhu; Jiedan et al.	US 20190324780 A1	
	Use network driven profiles for ASR.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/19/2022 has been entered.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL COLUCCI whose telephone number is (571)270-1847.  The examiner can normally be reached on M-F 9 AM - 7 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/
Primary Examiner, Art Unit 2655
(571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov