DETAILED ACTION
Introduction
Applicant’s arguments filed in the reply on 8/13/2021 were received and fully considered. Claims 1-15 were amended. The current Office action is FINAL. Please see the corresponding rejection headings and the Response to Arguments section below for more detail.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. IN20184109106, filed on 5/22/2018.
Response to Arguments
With respect to the rejection of claims 1-3, 8-10, and 15 under 35 U.S.C. 102(a)(1) as being anticipated by Gella et al. (U.S. Patent Publication: US10089983B1), hereinafter referred to as Gella, Applicant appears to present the following position on pp. 8-11 of the Remarks filed on 8/13/2021:
“The claims recite that a processor of an electronic device obtains metadata based on text of the speech input and, based on the metadata, the processor selects at least one application from among a plurality of applications for outputting a response to the speech input. Gella fails to teach or suggest such limitations. Instead, FIG. 1B of Gella, as relied upon by the Office, describes a process of a system interacting with an application to perform an action. More specifically, Gella describes a device 100 receiving first audio data of a user utterance, adding an 
After reproducing FIGS. 1A and 1B and column 10, line 0045 to column 11, line 14 of Gella, Applicant continues:
As reflected above, Gella describes a voice activated electronic device 100 receiving an utterance and sending audio data representing the utterance to an NLU 260 of the language processing system 200. Upon receipt of the audio data, the language processing system 200 determines an intent of the utterance by parsing the text data into grammatical objects to determine the portions of the utterance associated with nouns, verbs, prepositions, etc. However, while Gella describes that the NLU system (i.e., not the electronic device 100 receiving the speech input) determines the intent of utterance after parsing the text data into grammatical objects to determine the portions of the utterance associated with nouns, verbs, prepositions, etc., and that a particular application is to be used, Applicant respectfully submits that Gella is silent regarding the features of an electronic device selecting an application among a plurality of applications for outputting a response based on metadata of text related to a speech input to the electronic device, as presently recited. For at least these reasons, Gella fails to teach or suggest, inter alia, "[a] method performed by an electronic device, the method comprising: ... in response to receiving the speech input, obtaining, by at least one processor of the electronic device, text corresponding to the speech input by performing speech recognition on the speech input; obtaining, by the at least one processor, metadata for the speech input based on the obtained text; based on the metadata, selecting, by the at least one processor, at least one application from among a plurality of applications for outputting a response to the speech input; and outputting, by the at least one processor, the response to the speech input by using the selected at least one application," as presently recited in independent claim 1, and as similarly recited in independent claims 8 and 15.
Therefore, the applied reference fails to disclose or render obvious the above-identified claim features recited in independent claims 1, 8, and 15. As such, the rejections of independent claims 1, 8, and 15 under 35 U.S.C. § 102(a)(1) are improper.”

electronic device where NLU is part of it. See Gella, Col. 27, lines 30 – 37: To correctly perform natural language understanding processing of speech input, NLU system 260 may be configured to determine a domain of an utterance. By determining the domain, NLU system 260 may narrow down which services and functionalities offered by an endpoint device (e.g., electronic device(s) 10 and/or 100, language processing system 200, or any other electronic device or system) may be relevant.
Therefore, Applicant’s statement that Gella is silent regarding the features of an electronic device selecting… is not persuasive. Furthermore, under the broadest reasonable interpretation, metadata is “data”; therefore, Gella’s teaching of identifying an intent (data derived from an utterance) reads on the claim language. Please note Gella, Col. 11, lines 13-18:”For instance, the NLU system may identify that the word ‘order’ may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance.”
Applicant’s argument regarding the electronic device coupled with at least one processor, as presented on page 11 of the Remarks (“obtaining, by the at least one processor”), is also not persuasive, because all such devices are designed with at least one processor to perform the required arithmetic. Please also note Gella, Col. 15, lines 0012 – 0017:”Electronic device 100 may include one or more processors 202a, 
Furthermore, regarding the argument about selecting an application from among a plurality of applications, as stated on page 11 (“at least one application from among a plurality of applications for outputting a response to the speech input”), Examiner refers Applicant to the “Pizza Application”, “Placing an order application”, “Particular Application”, etc. Furthermore, utterances are used to invoke the process of ordering a pizza. Please note Gella, Col. 10, line 0067- col. 11, line 0001-0014:“Order a pizza from ‘Pizza Application’,” may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance. In some embodiments, the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”
For at least the reasons provided supra, Examiner respectfully disagrees and finds Applicant’s arguments not persuasive. Therefore, the rejections of claims 1, 8, and 15 under 35 U.S.C. 102(a)(1) are sustained and further updated accordingly.

In response to the art rejection of the remaining dependent claims 2, 4 – 7, 9, and 11 – 14 (rejected under 35 U.S.C. 102(a)(1) and/or 103), in case said claims are correspondingly discussed and/or argued for at least the same rationale presented in the remarks filed 8/13/2021, Examiner respectfully notes as follows. For completeness, should the mentioned claim(s) be likewise traversed for reasons similar to those presented for independent claims 1, 8, and 15, Examiner respectfully directs Applicant to the same reasons provided supra in the response directed to claims 1, 8, and 15. For at least the same supra provided reasons, 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.




Claims 1 - 3, 8 - 10 and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gella et al. (US 10089983 B1) (hereinafter "Gella").

Regarding claim 1, Gella teaches a method, performed by an electronic device, the method comprising: receiving, by a user inputter of the electronic device, a speech input; (Gella, Col. 10, lines 0045 - 0049:”Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing.”, and Col. 14, lines 0053 - 0055:”For example, electronic device 100 may include, or be in communication with, one or more microphones that listen for a wakeword by continually monitoring local audio.”).
in response to receiving the speech input, obtaining, by at least one processor of the electronic device, text corresponding to the speech input by performing speech recognition on the speech input; (Gella, Col. 10, lines 0045 – 0052:”At step 158, first text data representing the first audio data may be generated. Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing. The ASR system, as described in greater detail below with reference to FIG. 2A, may perform speech-to-text processing to the first audio data to generate first text data representing the first audio data.”, and Col. 15, lines 0012 – 0017:”Electronic device 100 may include one or more processors 202a, storage/memory 204a, communications circuitry 206a, one or more microphones 208a or other audio input devices [e.g., transducers], one or more speakers 210a or other audio output devices, a display screen 212a, and one or more cameras 214a or other image capturing components.”).
obtaining, by the at least one processor, metadata for the speech input based on the obtained text; (Gella, Col. 10, lines 0054 - 0056, 0059 – 0061:”After the first text data is generated, the text data may be provided to a natural language understanding [“NLU”] system to perform NLU processing to the text data…. The NLU system may determine one or more domains, which may also be referred to as categories that may be capable of handling the intent of the utterance.”).
based on the metadata, selecting, by the at least one processor, at least one application from among a plurality of applications for outputting the response to the speech input; and (Gella, Col. 10, line 0067- col. 11, line 0001-0014:“Order a pizza from ‘Pizza Application’,” may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance. In some embodiments, the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”).
outputting, by the at least one processor, the response to the speech input by using the selected at least one application. (Gella, Col. 11, line 0008 – 0014:”… the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”, and Col. 13, lines 0008 – 0015:”At step 180, notification [response] data indicating the action has been/is going to be completed by first application system 140 may be generated. For instance, the notification data may indicate to language processing system 200 that a request associated with the intent is being processed. Therefore, language processing system 200 may be able to inform a requesting device (e.g., voice activated electronic device 100), that the action is being carried out.”, and Col. 15, lines 0022 – 0037:“At step 192, the notification [response] data may be received. For instance, the notification data generated and sent by first application system 140 may be received by language processing system 200. In response to receiving the notification data, the functionality associated with the first application may determine an action to be performed by language processing system 200. At step 194, second text data representing a response using the first application's functionality may be determined. For example, the first functionality may have caused sample responses to be added to the language model associated with the first account. In response to receiving the notification data, language processing system 200 may determine text data representing a sample response to use to indicate that first application system 140 is carrying out the action requested by the intent of utterance 4.”).
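For illustration only, the sequence of steps mapped above (text, then metadata, then application selection, then response) can be pictured with the following minimal Python sketch. It is not taken from Gella or from the application; the names `Metadata`, `INVOCATION_WORDS`, `APPLICATIONS`, `obtain_metadata`, `select_application`, and `respond` are hypothetical, the speech recognition step is assumed to have already produced the text, and the keyword matching is a toy stand-in for NLU processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Metadata:
    """Data derived from the recognized text (hypothetical structure)."""
    domain: Optional[str]
    intent: Optional[str]
    application: Optional[str]


# Hypothetical invocation words mapped to a (domain, intent) pair.
INVOCATION_WORDS = {"order": ("Food", "Order Item"), "play": ("Music", "Play Item")}

# Hypothetical plurality of applications; each returns a response string.
APPLICATIONS: Dict[str, Callable[[Metadata], str]] = {
    "Pizza Application": lambda md: f"Pizza Application is handling intent '{md.intent}'.",
    "Music Application": lambda md: f"Music Application is handling intent '{md.intent}'.",
}


def obtain_metadata(text: str) -> Metadata:
    """Toy stand-in for NLU: find an invocation word and a named application."""
    lowered = text.lower()
    words = lowered.split()
    domain = intent = None
    for word, (dom, itt) in INVOCATION_WORDS.items():
        if word in words:
            domain, intent = dom, itt
            break
    application = next((name for name in APPLICATIONS if name.lower() in lowered), None)
    return Metadata(domain, intent, application)


def select_application(md: Metadata) -> Optional[str]:
    """Select one application from the plurality based on the metadata."""
    return md.application if md.application in APPLICATIONS else None


def respond(text: str) -> str:
    """Output a response by using the selected application, if any."""
    md = obtain_metadata(text)
    name = select_application(md)
    if name is None:
        return "No application selected."
    return APPLICATIONS[name](md)
```

Under these assumptions, calling `respond("Order a pizza from 'Pizza Application'")` derives the intent "Order Item" from the word "order" and routes the request to the entry named "Pizza Application".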

Regarding claim 2, Gella teaches the method of claim 1, wherein the metadata comprises at least one of a keyword extracted from the obtained text, information about an intention of a user obtained based on the obtained text, information about a sound characteristic of the speech input, or information about the user of the electronic device. (Col. 10, lines 0049 – 0056:”The ASR system, as described in greater detail below with reference to FIG. 2A, may perform speech-to-text processing to the first audio data to generate first text data representing the first audio data. At step 160, an intent of the utterance may be determined to be associated with a first application. After the first text data is generated, the text data may be provided to a natural language understanding [“NLU”] system to perform NLU processing to the text data.”, and Col. 10, line 0064 – Col. 11, line 8:” The NLU system may determine one or more domains, which may also be referred to as categories that may be capable of handling the intent of the utterance. For example, utterance 4, 'Order a pizza from ‘Pizza Application’,' may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ [keyword] may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance.”).


Regarding claim 8, Gella teaches an electronic device comprising: an outputter, a user inputter configured to receive speech input; (Gella, Col. 14, lines 41-42:”For example, electronic device 100 may be able to receive and output audio”, and Col. 15, lines 12-17:”Electronic device 100 may include one or more processors 202a, storage/memory 204a, communications circuitry 206a, one or more microphones 208a or other audio input devices [e.g., transducers], one or more speakers 210a or other audio output devices, a display screen 212a, and one or more cameras 214a or other image capturing components.”, and Col. 10, lines 0045 - 0049:”Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing.”, and Col. 14, lines 0053 - 0055:”For example, electronic device 100 may include, or be in communication with, one or more microphones that listen for a wakeword by continually monitoring local audio.”).
and at least one processor configured to: in response to the user inputter receiving the speech input, obtain text by performing speech recognition on the speech input, (Gella, Col. 10, lines 0045 – 0052:”At step 158, first text data representing the first audio data may be generated. Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing. The ASR system, as described in greater detail below with reference to FIG. 2A, may perform speech-to-text processing to the first audio data to generate first text data representing the first audio data.”, and Col. 15, lines 0012 – 0017:”Electronic device 100 may include one or more processors 202a, storage/memory 204a, communications circuitry 206a, one or more microphones 208a or other audio input devices [e.g., transducers], one or more speakers 210a or other audio output devices, a display screen 212a, and one or more cameras 214a or other image capturing components.”).
obtain metadata for the speech input based on the obtained text, (Gella, Col. 10, lines 0054 - 0056, 0059 – 0061:”After the first text data is generated, the text data may be provided to a natural language understanding [“NLU”] system to perform NLU processing to the text data…. The NLU system may determine one or more domains, which may also be referred to as categories that may be capable of handling the intent of the utterance.”).
 based on the metadata, select at least one application from among a plurality of applications for outputting a response to the speech input; and (Gella, Col. 10, line 0067- col. 11, line 0001-0014:“Order a pizza from ‘Pizza Application’,” may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance. In some embodiments, the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”).
control the outputter to output the response to the speech input by using the selected at least one application. (Gella, Col. 11, line 0008 – 0014:”… the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”, and Col. 13, lines 0008 – 0015:”At step 180, notification [response] data indicating the action has been/is going to be completed by first application system 140 may be generated. For instance, the notification data may indicate to language processing system 200 that a request associated with the intent is being processed. Therefore, language processing system 200 may be able to inform a requesting device (e.g., voice activated electronic device 100), that the action is being carried out.”, and Col. 15, lines 0022 – 0037:“At step 192, the notification [response] data may be received. For instance, the notification data generated and sent by first application system 140 may be received by language processing system 200. In response to receiving the notification data, the functionality associated with the first application may determine an action to be performed by language processing system 200. At step 194, second text data representing a response using the first application's functionality may be determined. For example, the first functionality may have caused sample responses to be added to the language model associated with the first account. In response to receiving the notification data, language processing system 200 may determine text data representing a sample response to use to indicate that first application system 140 is carrying out the action requested by the intent of utterance 4.”).


Regarding claim 9, Gella teaches the electronic device of claim 8, wherein the metadata comprises at least one of a keyword extracted from the obtained text, information about an intention of a user obtained based on the obtained text, information about a sound characteristic of the speech input, or information about the user of the electronic device. (Col. 10, lines 0049 – 0056:”The ASR system, as described in greater detail below with reference to FIG. 2A, may perform speech-to-text processing to the first audio data to generate first text data representing the first audio data. At step 160, an intent of the utterance may be determined to be associated with a first application. After the first text data is generated, the text data may be provided to a natural language understanding [“NLU”] system to perform NLU processing to the text data.”, and Col. 10, line 0064 – Col. 11, line 8:” The NLU system may determine one or more domains, which may also be referred to as categories that may be capable of handling the intent of the utterance. For example, utterance 4, 'Order a pizza from ‘Pizza Application’,' may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ [keyword] may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance.”).


Regarding claim 15, Gella teaches a non-transitory computer-readable storage medium configured to store one or more computer programs including instructions that, when executed by at least one processor of an electronic device, cause the at least one processor to control to: (Gella, Col. 16, line 0060 – Col. 17, line 0007:” … information may be stored using computer-readable instructions, data structures, and/or program systems. Various types of storage/memory may include, but are not limited to, hard drives, solid state drives, flash memory, permanent memory [e.g., ROM], electronically erasable programmable read-only memory [“EEPROM”], CD-ROM, digital versatile disk [“DVD”] or other optical storage medium, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other storage type, or any combination thereof. Furthermore, storage/memory 204 may be implemented as computer-readable storage media [“CRSM”], which may be any available physical media accessible by processor[s] 202 to execute one or more instructions stored within storage/memory 204.”).
receive a speech input; (Gella, Col. 10, lines 0045 - 0049:”Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing.”, and Col. 14, lines 0053 - 0055:”For example, electronic device 100 may include, or be in communication with, one or more microphones that listen for a wakeword by continually monitoring local audio.”).
in response to receiving the speech input, obtain text corresponding to the speech input by performing speech recognition on the speech input; (Gella, Col. 10, lines 0045 – 0052:” At step 158, first text data representing the first audio data may be generated. Upon receipt, the first audio data may be provided to an automatic speech recognition [“ASR”] system capable of performing automatic speech recognition processing. The ASR system, as described in greater detail below with reference to FIG. 2A, may perform speech-to-text processing to the first audio data to generate first text data representing the first audio data.”, and Col. 15, lines 0012 – 0017:”Electronic device 100 may include one or more processors 202a, storage/memory 204a, communications circuitry 206a, one or more microphones 208a or other audio input devices [e.g., transducers], one or more speakers 210a or other audio output devices, a display screen 212a, and one or more cameras 214a or other image capturing components.”).
obtain metadata for the speech input based on the obtained text; (Gella, Col. 10, lines 0054 - 0056, 0059 – 0061:”After the first text data is generated, the text data may be provided to a natural language understanding [“NLU”] system to perform NLU processing to the text data…. The NLU system may determine one or more domains, which may also be referred to as categories that may be capable of handling the intent of the utterance.”).
based on the obtained metadata, select at least one application from among a plurality of applications for outputting the response to the speech input, and (Gella, Col. 10, line 0067- col. 11, line 0001-0014:“Order a pizza from ‘Pizza Application’,” may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word ‘order’ may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance. In some embodiments, the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”).
output the response to the speech input by using the selected at least one application. (Gella, Col. 11, line 0008 – 0014:”… the NLU system may determine that the intent of utterance 4 is for placing an order with an application [e.g., {Intent}: “Order Item”], where the item to be ordered is a pizza [e.g., {Item To Be Ordered}: “Pizza”], and that a particular application to be used to order that item [e.g., {Skill/Application}: ‘Pizza Application’].”, and Col. 13, lines 0008 – 0015:”At step 180, notification [response] data indicating the action has been/is going to be completed by first application system 140 may be generated. For instance, the notification data may indicate to language processing system 200 that a request associated with the intent is being processed. Therefore, language processing system 200 may be able to inform a requesting device (e.g., voice activated electronic device 100), that the action is being carried out.”, and Col. 15, lines 0022 – 0037:“At step 192, the notification [response] data may be received. For instance, the notification data generated and sent by first application system 140 may be received by language processing system 200. In response to receiving the notification data, the functionality associated with the first application may determine an action to be performed by language processing system 200. At step 194, second text data representing a response using the first application's functionality may be determined. For example, the first functionality may have caused sample responses to be added to the language model associated with the first account. In response to receiving the notification data, language processing system 200 may determine text data representing a sample response to use to indicate that first application system 140 is carrying out the action requested by the intent of utterance 4.”).



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Gella (US10089983B1) as applied to claims 1 and 8, respectively, and further in view of Wang et al. (US9472196B1) (hereinafter “Wang”).

Regarding claim 3, Gella teaches a method performed by an electronic device.
Gella does not teach the method of claim 1, wherein the selecting of the at least one application comprises selecting, by the at least one processor, the at least one application based on at least one of information about feedback information of a user about responses output by the plurality of applications, information about a result of processing the speech input by the plurality of applications, or information about a time taken for the plurality of applications to output the responses.
Wang teaches wherein the selecting of the at least one application comprises selecting, by the at least one processor, the at least one application based on at least one of information about feedback information of a user about responses output by the plurality of applications, information about a result of processing the speech input by the plurality of applications, or information about a time taken for the plurality of applications to output the responses. (Wang, Col. 16, lines 15 – 25:” In some examples, feedback from multiple users may be aggregated and analyzed in determining how to adjust parameters associated with an intent or information related to the intent. For example, based on feedback aggregated from a body of users indicating that the trigger phrase “Call taxi” typically results in users selecting the “Cab Caller” application over the “TaxiNow” application, the voice action service system 200 can determine to increase a strength of relationship or confidence score for the trigger phrase “Call taxi” and the intent that specifies the “Cab Caller” application or the “Cab Caller” application itself.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system” with Wang’s teaching “of identifying an application based on the received data” to select an application based on feedback information, in order to increase the accuracy or efficiency of the voice action service (Wang, Col. 16, lines 13 – 14).
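For illustration only, feedback-based selection of the kind described in the cited Wang passage can be pictured with the following minimal Python sketch. It is not Wang's implementation; the function names `aggregate_feedback` and `select_by_feedback` and the sample data are hypothetical, and a simple selection count is assumed as a stand-in for a "strength of relationship or confidence score".

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def aggregate_feedback(selections: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often users chose each application for each trigger phrase."""
    counts: Dict[str, Counter] = {}
    for phrase, app in selections:
        counts.setdefault(phrase, Counter())[app] += 1
    return counts


def select_by_feedback(phrase: str, counts: Dict[str, Counter], candidates: List[str]) -> str:
    """Pick the candidate application with the highest aggregated count.

    A Counter returns 0 for applications with no recorded feedback; ties
    resolve to the earliest candidate in the list.
    """
    tally = counts.get(phrase, Counter())
    return max(candidates, key=lambda app: tally[app])


# Hypothetical feedback: two users chose "Cab Caller", one chose "TaxiNow".
feedback = [
    ("Call taxi", "Cab Caller"),
    ("Call taxi", "Cab Caller"),
    ("Call taxi", "TaxiNow"),
]
counts = aggregate_feedback(feedback)
chosen = select_by_feedback("Call taxi", counts, ["TaxiNow", "Cab Caller"])
```

With this sample data, `chosen` is "Cab Caller", because its aggregated count for the trigger phrase "Call taxi" is higher than that of "TaxiNow".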


Regarding claim 10, Gella teaches an electronic device.
Gella does not teach the electronic device of claim 8, wherein the at least one processor is further configured to select the at least one application based on at least one of information about feedback information of a user about responses output by the plurality of applications, information about a result of processing the speech input by the plurality of applications, or information about a time taken for the plurality of applications to output the responses.
Wang teaches wherein the at least one processor is further configured to select the at least one application based on at least one of information about feedback information of a user about responses output by the plurality of applications, information about a result of processing the speech input by the plurality of applications, or information about a time taken for the plurality of applications to output the responses. (Wang, Col. 16, lines 15 – 25:” In some examples, feedback from multiple users may be aggregated and analyzed in determining how to adjust parameters associated with an intent or information related to the intent. For feedback aggregated from a body of users indicating that the trigger phrase “Call taxi” typically results in users selecting the “Cab Caller” application over the “TaxiNow” application, the voice action service system 200 can determine to increase a strength of relationship or confidence score for the trigger phrase “Call taxi” and the intent that specifies the “Cab Caller” application or the “Cab Caller” application itself.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system” with Wang’s teaching “of identifying an application based on the received data” to select an application based on feedback information, in order to increase the accuracy or efficiency of the voice action service (Wang, Col. 16, lines 13 – 14).


Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Gella (US10089983B1) as applied to claims 1 and 8, respectively, and further in view of Marcel Van Os (US9576574B2) (hereinafter "Van Os").

Van Os was applied in the previous Office Action.
Regarding claim 4, Gella teaches a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Gella does not teach the method of claim 1, wherein the outputting of the response comprises: obtaining, by the at least one processor, at least one response to the speech input from the selected at least one application; determining, by the at least one processor, a priority of the at least one response; and outputting, by the at least one processor, the at least one response according to the determined priority.
Van Os teaches wherein the outputting of the response comprises: obtaining by the at least one processor, at least one response to the speech input from the selected at least one application; (Van Os, Col. 25, line 59 – 0067:"In some embodiments, the context-sensitive interruption handler of the digital assistant intercepts the responses, reminders, and/or notifications before they are provided to the user, and determines dynamically in real-time, a relative urgency [priority] between the responses, reminders, and/or notifications. The context-sensitive interruption handler of the digital assistant then provides the responses, reminders, and/or notifications in an order based on the relative urgency thereof. "). 
determining by the at least one processor, a priority of the at least one response; (Van Os, Col. 26, line 0001- 0007:”In some embodiments, since the context may change again during the time it takes for the most highly prioritized response/reminder/notification to be provided to the user, the relative urgency [priority] is re-evaluated among the remaining and any newly available responses, reminders, and notifications. In some embodiments, the re-evaluation takes into account new information that alters the present context.”).
and outputting, by the at least one processor, the at least one response according to the determined priority. (Van Os, Col. 26, lines 0012 - 0018:"For example, if a reminder can be provided via a graphical interface, and a response to user input can be provided to the user via a speech output, the digital assistant can optionally provide the reminder and the response simultaneously using the graphical interface and the speech output without resorting to the interruption handler.", and Col. 26, lines 0021 - 0027:"In some embodiments, the digital assistant prioritizes the concurrently available outputs [e.g., responses, reminders, and/or notifications] for delivery one at a time over a single output channel when the digital assistant detects that the user is likely to have diminished or impaired ability to focus on multiple output channels at the same time. ”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system” with Van Os’s teaching “of interrupt handling by digital assistants” to obtain at least one response to the speech input from the selected at least one application, determine a priority of the at least one response, and output the response according to the determined priority, in order to improve a user's experience in interacting with the system and promote the user's confidence in the system's services and capabilities, for which a well-designed response procedure is needed, as evidenced by Van Os (see Col. 1, lines 0041 – 0044).

Regarding claim 11, Gella teaches a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Gella does not teach the electronic device of claim 8, wherein the at least one processor is further configured to: obtain at least one response to the speech input from the selected at least one application; determine a priority of the at least one response; and control the outputter to output the at least one response according to the determined priority.
Van Os teaches wherein the at least one processor is further configured to: obtain at least one response to the speech input from the selected at least one application; (Van Os, Col. 25, line 59 – 0067:"In some embodiments, the context-sensitive interruption handler of the digital assistant intercepts the responses, reminders, and/or notifications before they are provided to the user, and determines dynamically in real-time, a relative urgency [priority] between the responses, reminders, and/or notifications. The context-sensitive interruption handler of the digital assistant then provides the responses, reminders, and/or notifications in an order based on the relative urgency thereof. "). 
determining a priority of the at least one response; (Van Os, Col. 26, line 0001- 0007:”In some embodiments, since the context may change again during the time it takes for the most highly prioritized response/reminder/notification to be provided to the user, the relative urgency [priority] is re-evaluated among the remaining and any newly available responses, reminders, and notifications. In some embodiments, the re-evaluation takes into account new information that alters the present context.”).
and control the outputter to output the at least one response according to the determined priority. (Van Os, Col. 26, lines 0012 - 0018:"For example, if a reminder can be provided via a graphical interface, and a response to user input can be provided to the user via a speech output, the digital assistant can optionally provide the reminder and the response simultaneously using the graphical interface and the speech output without resorting to the interruption handler.", and Col. 26, lines 0021 - 0027:"In some embodiments, the digital assistant prioritizes the concurrently available outputs [e.g., responses, reminders, and/or notifications] for delivery one at a time over a single output channel when the digital assistant detects that the user is likely to have diminished or impaired ability to focus on multiple output channels at the same time. ”).

Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gella (US10089983B1) and Van Os (US9576574B2) as applied to claims 4 and 11, respectively, and further in view of Guo-Feng Zhang (US20140188477A1) (hereinafter "Zhang").

Zhang was applied in the previous Office Action.
Regarding claim 5, Gella and Van Os teach a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Neither Gella nor Van Os teach the method of claim 4, wherein the determining of the priority comprises determining, by the at least one processor, the priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, whether the at least one response comprises a characteristic preferred by the user, or information about a time taken to output the at least one response after obtaining the at least one response.
Zhang teaches wherein the determining of the priority comprises determining, by the at least one processor, the priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, whether the at least one response comprises a characteristic preferred by the user, or information about a time taken to output the at least one response after obtaining the at least one response. (Zhang, Par. 0196:”The natural language comprehension system 720 may also determine the priorities of the report answers 711 according to a user's usage frequencies. Specifically, the natural language comprehension system 720 is able to register those received user's speech inputs 701 in the properties database 730, and the properties database 730 may register those keywords 709 obtained when the natural language comprehension system 720 parses the user's speech inputs 701 and may also register all the report answers 711 generated by the natural language comprehension system 720. Afterwards, the natural language comprehension system 720 may find the report answer 711 relatively conformable to the user's intention [determined by the user's speech input] according to the priority, so as to find the corresponding speech response finally. The recorded information mentioned here may include the user's preferences/dislikes/habits and even the public preferences/dislikes/habits.”).
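Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system”, and Van Os’s teaching “of interrupt handling by digital assistants” with Zhang’s teaching “of speech correction of a dialog system” to determine priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, in order to facilitate the use of the natural language dialogue system and to correct the previously output speech response and further output another speech response according to another speech input subsequently provided by the user, as evidenced by Zhang (see Par. 0012).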


Regarding claim 12, Gella and Van Os teach a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Neither Gella nor Van Os teach the electronic device of claim 11, wherein the at least one processor is further configured to determine the priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, whether the at least one response comprises a characteristic preferred by the user, or information about a time taken to output the at least one response after obtaining the at least one response. 
Zhang teaches wherein the at least one processor is further configured to determine the priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, whether the at least one response comprises a characteristic preferred by the user, or information about a time taken to output the at least one response after obtaining the at least one response. (Zhang, Par. 0196:”The natural language comprehension system 720 may also determine the priorities of the report answers 711 according to a user's usage frequencies. Specifically, the natural language comprehension system 720 is able to register those received user's speech inputs 701 in the properties database 730, and the properties database 730 may register those keywords 709 obtained when the natural language comprehension system 720 parses the user's speech inputs 701 and may also register all the report answers 711 generated by the natural language comprehension system 720. Afterwards, the natural language comprehension system 720 may find the report answer 711 relatively conformable to the user's intention [determined by the user's speech input] according to the priority, so as to find the corresponding speech response finally. The recorded information mentioned here may include the user's preferences/dislikes/habits and even the public preferences/dislikes/habits.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system”, and Van Os’s teaching “of interrupt handling by digital assistants” with Zhang’s teaching “of speech correction of a dialog system” to determine priority based on at least one of an intention of a user related to the at least one response, a size of the at least one response, in order to facilitate the use of the natural language dialogue system and to correct the previously output speech response and further output another speech response according to another speech input subsequently provided by the user, as evidenced by Zhang (see Par. 0012).


Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Gella (US10089983B1) and Van Os (US9576574B2) as applied to claims 4 and 11, respectively, and further in view of Junki Ohmura (US20180074785A1) (hereinafter "Ohmura").

Ohmura was applied in the previous Office Action.
Regarding claim 6, Gella and Van Os teach a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Neither Gella nor Van Os teach the method of claim 4, wherein the determining of the priority comprises, in response to the user inputter receiving a plurality of speech inputs, determining, by the at least one processor, the priority based on metadata of each of the plurality of speech inputs.
Ohmura teaches wherein the determining of the priority comprises, in response to the user inputter receiving a plurality of speech inputs, determining, by the at least one processor, the priority based on metadata of each of the plurality of speech inputs. (Ohmura, Par. 0011:”… a response generation unit configured to generate responses to speeches from a plurality of users; a decision unit configured to decide methods of outputting the responses to the respective users on the basis of priorities according to order of the speeches from the plurality of users; and an output control unit configured to perform control such that the generated responses are output by using the decided methods of outputting the responses.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system”, and Van Os teaching “of interrupt handling by digital 


Regarding claim 13, Gella and Van Os teach a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Neither Gella nor Van Os teach the electronic device of claim 11, wherein the at least one processor is further configured to, in response to the user inputter receiving a plurality of speech inputs, determine the priority based on metadata of each of the plurality of speech inputs.
Ohmura teaches wherein the at least one processor is further configured to, in response to the user inputter receiving a plurality of speech inputs, determine the priority based on metadata of each of the plurality of speech inputs. (Ohmura, Par. 0011:”… a response generation unit configured to generate responses to speeches from a plurality of users; a decision unit configured to decide methods of outputting the responses to the respective users on the basis of priorities according to order of the speeches from the plurality of users; and an output control unit configured to perform control such that the generated responses are output by using the decided methods of outputting the responses.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple . 


Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gella (US10089983B1) as applied to claims 1 and 8, respectively, and further in view of Takafumi et al. (WO2018025668A1) (hereinafter "Takafumi").

Takafumi was applied in the previous Office Action.
Examiner’s note: In the Non-Final Rejection issued on 05/21/2021, an error was inadvertently introduced by stating “Regarding claims 4 and 11” where the correct claim numbers were 7 and 14. However, the correct claim language corresponding to claims 7 and 14 was addressed.
Regarding claim 7, Gella teaches a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Gella does not teach the method of claim 1, further comprising, based on detecting an event comprising a state in which an application previously determined as an application for outputting the response to the speech input cannot output the response to the speech input, selecting, by the at least one processor, at least one of a plurality of applications for outputting the response to the speech input.
Takafumi teaches further comprising, based on detecting an event comprising a state in which an application previously determined as an application for outputting the response to the speech input cannot output the response to the speech input, selecting, by the at least one processor, at least one of a plurality of applications for outputting the response to the speech input. (Takafumi, Par. 0052:"In other words, the response unit 121 suspends the other specific conversation application when the user starts a topic different from the topic related to the other specific conversation application while executing the other specific conversation application. In response to the interruption of the other specific conversation application, the response unit 121 sets the specific conversation application that can respond to the user's speech among the specific conversation applications previously interrupted to the suspended state of the specific conversation application. You may resume based. That is, in response to the interruption of another specific conversation application, the response unit 121 selects a specific conversation application corresponding to a topic newly started by the user from among the specific conversation applications previously interrupted. You may resume based on the interruption status.", and Par. 0064:"After an appropriate conversation application is selected by the selection unit 134 for the user's utterance, the response unit 121 executes the selected conversation application and responds to the user's utterance [S134].").
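Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system” with Takafumi’s teaching “of facilitating more natural conversation with a user” to select, based on detecting an event in which an application previously determined for outputting the response to the speech input cannot output the response, at least one of a plurality of applications for outputting the response to the speech input, in order to create a system that can communicate more naturally with users, as evidenced by Takafumi (see Par. 0003).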


Regarding claim 14, Gella teaches a method, performed by an electronic device, of outputting a response to a speech input by using an application.
Gella does not teach the electronic device of claim 8, wherein the at least one processor is further configured to, based on detecting an event comprising a state in which an application previously determined as an application for outputting the response to the speech input cannot output the response to the speech input, select at least one of a plurality of applications for outputting the response to the speech input.
Takafumi teaches wherein the at least one processor is further configured to, based on detecting an event comprising a state in which an application previously determined as an application for outputting the response to the speech input cannot output the response to the speech input, select at least one of a plurality of applications for outputting the response to the speech input. (Takafumi, Par. 0052:"In other words, the response unit 121 suspends the other specific conversation application when the user starts a topic different from the topic related to the other specific conversation application while executing the other specific conversation application. In response to the interruption of the other specific conversation application, the response unit 121 sets the specific conversation application that can respond to the user's speech among the specific conversation applications previously interrupted to the suspended state of the specific conversation application. You may resume based. That is, in response to the interruption of another specific conversation application, the response unit 121 selects a specific conversation application corresponding to a topic newly started by the user from among the specific conversation applications previously interrupted. You may resume based on the interruption status.", and Par. 0064:"After an appropriate conversation application is selected by the selection unit 134 for the user's utterance, the response unit 121 executes the selected conversation application and responds to the user's utterance [S134].").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Gella’s teaching “of multiple account language processing system” with Takafumi’s teaching “of facilitating more natural conversation with a user” to select, based on detecting an event in which an application previously determined for outputting the response to the speech input cannot output the response, at least one of a plurality of applications for outputting the response to the speech input, in order to create a system that can communicate more naturally with users, as evidenced by Takafumi (see Par. 0003).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Bozarth et al. (US 9754016 B1) teaches a user interacting with an electronic device can receive suggestions for applications or services that can .

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
 Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/
Examiner, Art Unit 2656

/EDGAR X GUERRA-ERAZO/
Primary Examiner, Art Unit 2656