DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 5-7, 10-15, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang (US 10,332,517).
Regarding Claim 1, Wang discloses a computer-implemented method comprising: receiving audio data representing a natural language user input (a speech-detection device 110 may receive audio 11 including a spoken utterance of a user 5 via a microphone (or array of microphones) of the speech-detection device 110) (col. 3, lines 40-45); performing automatic speech recognition (ASR) processing on the audio data to determine ASR output data (A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., an ASR model storage 252).) (col. 7, lines 8-18); processing the ASR output data to determine an entity type corresponding to a first portion of the natural language user input (The device performing NLU processing may include a dedicated NLU component 260, which may include a named entity recognition (NER) component 262 and an intent classification (IC) component 264) (col. 8, lines 48-54); determining first data representing the entity type indicates the first portion can be used to identify a first user profile associated with the natural language user input (For example, if a spoken utterance is processed using the ASR component 250, which outputs the text data “call mom”, the NLU component 260 may determine the user intended to activate a telephone in his/her device and to initiate a call with a contact matching the entity “mom.”) (col. 9, lines 13-17); receiving first stored data representing at least a first word that can be used to identify the first user profile (The server(s) 120 identify (138) a user that spoke the utterance represented in the audio data) (col. 4, lines 14-18); processing the ASR output data to determine that a second portion of the natural language user input is semantically similar to the at least first word (Determining the user may involve comparing speech characteristics in the audio data to stored speech characteristics of users of the speech processing system 100) (col. 4, lines 14-18); determining second data representing the second portion can be used to identify the first user profile (For example, if the utterance includes “play my playlist privately” the server(s) 120 may determine and use the user's identity to resolve the words “my playlist” to appropriately identify the playlist and source of the playlist) (col. 4, lines 44-48); determining the natural language user input includes attributable data based at least in part on the first data and the second data, the attributable data representing the natural language user input can be attributed to the first user profile (The server(s) 120 performs (132) speech processing on the audio data and, based on the speech processing, determines (134) a user indication to activate a privacy mode. The indication may be in the form of user speech) (col. 3, lines 47-51); performing natural language understanding (NLU) processing on the ASR output data to determine NLU output data (The server(s) 120 determines (142) a user command based on the speech processing of the audio data) (col. 4, lines 39-44); using the NLU output data, determining output data responsive to the natural language user input (The server(s) 120 may then receive (148) content responsive to the user command) (col. 4, lines 59-60); sending the output data to a device (The server(s) 120 sends (150) the content (e.g., output audio data) to the speech-detection device 110 (or another device indicated in the profile of the user that spoke the utterance). The speech-detection device 110 (or other device) outputs audio corresponding to the content to the user 5) (col. 5, lines 5-10); and based on determining the natural language user input includes attributable data, deleting the audio data after sending the output data to the device (Deleting the data after speech processing is complete allows the server(s) 120 to determine a user command specific to the user's usage history, while preventing the present speech processing to not be added to the user's usage history) (col. 4, lines 52-58).
Regarding Claim 5, Wang discloses a computer-implemented method comprising: receiving input data representing a natural language user input (a speech-detection device 110 may receive audio 11 including a spoken utterance of a user 5 via a microphone (or array of microphones) of the speech-detection device 110) (col. 3, lines 40-45); determining first data representing a first portion of the input data represents user-identifiable data that can be used to identify a user profile associated with the natural language user input (The server(s) 120 identify (138) a user that spoke the utterance represented in the audio data) (col. 4, lines 14-18); determining second data representing the input data is potentially attributable to the user profile (The server(s) 120 performs (132) speech processing on the audio data and, based on the speech processing, determines (134) a user indication to activate a privacy mode. The indication may be in the form of user speech) (col. 3, lines 47-51); and processing the first data and the second data to determine an indicator corresponding to the input data, the indicator representing the input data includes attributable data (The server(s) 120 associates (140) the generated privacy mode flag with a unique ID associated with the determined user) (col. 4, lines 36-38).
Regarding Claim 6, Wang discloses the computer-implemented method, further comprising: receiving stored data representing at least a first word corresponding to user-identifiable data (Determining the user may involve comparing speech characteristics in the audio data to stored speech characteristics of users of the speech processing system 100.) (col. 4, lines 14-18); processing the first portion with respect to the stored data (The server(s) 120 may perform the aforementioned comparisons with respect to stored data associated with all the users of the speech processing system 100) (col. 4, lines 27-29); determining that the first portion is semantically similar to at least the first word (The indication may take the form of a trigger word(s) in the user command. For example, the audio data may include speech corresponding to “privately, tell me a joke,” “tell me a joke privately,” or the like. In the aforementioned example, “privately” is the privacy mode trigger word) (col. 5, line 61-col. 4, line 3); and determining the first data based on determining that the first portion is semantically similar to at least the first word (The indication may take the form of a trigger word(s) in the user command. For example, the audio data may include speech corresponding to “privately, tell me a joke,” “tell me a joke privately,” or the like. In the aforementioned example, “privately” is the privacy mode trigger word) (col. 5, line 61-col. 4, line 3).
Regarding Claim 7, Wang discloses the computer-implemented method, further comprising: determining an entity type represented by a second portion of the input data (The device performing NLU processing may include a dedicated NLU component 260, which may include a named entity recognition (NER) component 262 and an intent classification (IC) component 264) (col. 8, lines 51-54); determining that the entity type corresponds to user-identifiable data (The NLU component 260 may also utilize gazetteer information 284 stored in an entity library storage 282. The knowledge base and/or gazetteer information 284 may be used for entity resolution, for example matching ASR results with different entities (e.g., song titles, contact names, etc.). Gazetteers 284 may be linked to users (e.g., a particular gazetteer may be associated with a specific user's music collection), may be linked to certain domains (e.g., shopping), or may be organized in a variety of other ways) (col. 8, line 56-col. 8, line 2); determining third data representing the second portion corresponds to user-identifiable data (For example, if the utterance includes “play my playlist privately” the server(s) 120 may determine and use the user's identity to resolve the words “my playlist” to appropriately identify the playlist and source of the playlist) (col. 4, lines 44-48); and processing the first data, the second data and the third data to determine the indicator (The server(s) 120 performs (132) speech processing on the audio data and, based on the speech processing, determines (134) a user indication to activate a privacy mode) (col. 3, lines 47-51).
Regarding Claim 10, Wang discloses the computer-implemented method, further comprising: receiving stored data representing at least a first word corresponding to non-identifiable data (the signal does not indicate the identity of the requesting user) (col. 4, lines 48-51); determining, using the stored data, that the first portion is semantically similar to at least the first word (Determining the user may involve comparing speech characteristics in the audio data to stored speech characteristics of users of the speech processing system 100) (col. 4, lines 14-18); and determining the second data based on determining that the first portion is semantically similar to at least the first word (The indication may take the form of a trigger word(s) in the user command. For example, the audio data may include speech corresponding to “privately, tell me a joke,” “tell me a joke privately,” or the like. In the aforementioned example, “privately” is the privacy mode trigger word) (col. 5, line 61-col. 4, line 3).
Regarding Claim 11, Wang discloses the computer-implemented method, further comprising: processing the input data using a trained model configured to identify user-identifiable data in the input data (The user recognition component 295 performs user recognition using various data including the audio data 111, training data 304 corresponding to sample audio data corresponding to known users, the ASR confidence data 302, and secondary data 306) (col. 12, lines 59-65); determining a second portion of the input data represents user-identifiable data (Determining the user may involve comparing speech characteristics in the audio data to stored speech characteristics of users of the speech processing system 100) (col. 4, lines 14-18); determining third data representing the second portion corresponds to user-identifiable data (For example, if the utterance includes “play my playlist privately” the server(s) 120 may determine and use the user's identity to resolve the words “my playlist” to appropriately identify the playlist and source of the playlist) (col. 4, lines 44-48); and determining the indicator based on the first data, second data and the third data (The server(s) 120 performs (132) speech processing on the audio data and, based on the speech processing, determines (134) a user indication to activate a privacy mode) (col. 3, lines 47-51).
Regarding Claim 12, Wang discloses the computer-implemented method, further comprising: processing the input data using natural language understanding (The server(s) 120 determines (142) a user command based on the speech processing of the audio data) (col. 4, lines 39-58); determining output data responsive to the input data (For example, if the utterance includes “play my playlist privately” the server(s) 120 may determine and use the user's identity to resolve the words “my playlist” to appropriately identify the playlist and source of the playlist) (col. 4, lines 39-58); sending the output data to a device (The server(s) 120 sends (150) the content (e.g., output audio data) to the speech-detection device 110 (or another device indicated in the profile of the user that spoke the utterance)) (col. 4, line 59-col. 5, line 10); and based on the indicator corresponding to the input data, deleting the input data from a storage (The privacy mode flag may cause the server(s) 120 to delete (146), upon completing the speech processing, data used to perform the speech processing and determine the user command. Deleting the data after speech processing is complete allows the server(s) 120 to determine a user command specific to the user's usage history, while preventing the present speech processing to not be added to the user's usage history) (col. 4, lines 39-58).
Allowable Subject Matter
Claims 2-4, 8, 9, 16, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Specifically, the prior art fails to teach “determining that the first confidence score and the third confidence score satisfy a first condition; determining that the second confidence score and the fourth confidence score fail to satisfy the first condition; based on the first confidence score and the third confidence score satisfying the first condition, determining the first portion corresponds to user-identifiable data; and based on the second confidence score and the fourth confidence score failing to satisfy the first condition, determining the second portion corresponds to non-identifiable data” as recited in claim 2, “determining a number of times the first past natural language user input was received; determining the number of times fails to satisfy a threshold value; based on the number of times failing to satisfy the threshold value, determining the natural language user input is potentially attributable to the first user profile” as recited in claim 3, “determining a number of times the first past natural language user input was received; determining the number of times satisfies a threshold value; determining, using the second ASR output data, that a third portion of the second natural language user input is potentially attributable to a second user profile; determining an alternative representation of the third portion, the alternative representation representing non-attributable data; determining text data corresponding to the second ASR output data including the alternative representation instead of the third portion; performing NLU processing on the text data to determine second NLU output data; using the second NLU output data, determining second output data responsive to the second natural language user input; sending the second output data to a second device; and storing at least one of the ASR output data or the NLU output data based at least in part on: the number of times satisfying the threshold value, and the alternative representation representing non-attributable data” as recited in claim 4, “determining a ranked list based on the first value and the second value; and determining the indicator based on the ranked list” as recited in claim 8, “determining a number of times the first past natural language user input is received; and determining the second data based on the number of times” as recited in claim 9, “determine a ranked list based on the first value and the second value; and determine the indicator based on the ranked list” as recited in claim 16, and “determine a number of times the first past natural language user input is received; and determine the second data based on the number of times” as recited in claim 17.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/14/2021 was filed in compliance with the provisions of 37 CFR 1.97 and 1.98.  Accordingly, the information disclosure statement is being considered by the examiner.
Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bao (US 10,567,515) discloses speech processing performed with respect to first and second user profiles in a dialog session.
Su et al. (US 11,081,104) discloses contextual natural language processing.
Borja Jaramillo (US 11,127,395) discloses device-specific skill processing.
Yasa et al. (US 2020/0184959) discloses generating input alternatives.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SATWANT K SINGH whose telephone number is (571)272-7468. The examiner can normally be reached Monday thru Friday 8:30 AM to 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mohammad H. Ghayour, can be reached at (571)272-3021. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SATWANT K SINGH/
Primary Examiner, Art Unit 2672