DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 5/20/2022 have been fully considered but they are not persuasive.
Applicant has amended the claims to include “wherein the second response includes guide information related to the first response, and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information.” The applicant argues that the cited references do not teach these amendments. The examiner disagrees. Evermann teaches in par. [0026] that guided search commands 206 use voice and text prompts to guide the user through a directed dialog in order to elicit the information required to fulfill his search for information. For example, when the user says "search ringtones," the device responds with a spoken and displayed prompt "what artist?" The user then speaks the name of the artist. Further, par. [0043] discusses that when the ASR receives audio associated with a guided dialog in a "DIRECTORY ASSISTANCE" command followed by a "WHAT STATE?" prompt, it searches for matches in its database of state names, and after the prompt "WHAT CITY" it uses a database of city names in the identified state. This teaches that the second response includes information associated with the first response.
With regard to the amendment including “the guide information is based on user profile information, preference information, and social network (SNS) activity information,” the examiner believes Evermann also teaches these amendments. The application specification states at par. [0210]: “According to an embodiment of the disclosure, priorities 1405 (Rank) of Cab A and Cab B included in the first data 1404 (metadata) may be determined based on profile information 1401 (Vi) about User C, which is used to provide guide information of Cab A 1406 or guide information of Cab B. The priorities 1405 (Rank) may be determined by a predetermined function (f). For example, the electronic device 1000 may determine that the priority of Cab B is higher, due to a less time to arrive at the destination despite a higher fare, based on at least one of salary information, age information, or profession information included in profile information of User C. Thus, the electronic device 1000 may generate a second response including information on how to book Cab B based on the profile information of User C.” Evermann teaches in par. [0054-0055] that ASR Server 112 uses the side information it extracts from the received signal to categorize the mobile device user. The user categories include gender, an age range, accent, dialect, and the emotional state of the user. Additionally, par. [0083-0084] teaches that device 102 also recognizes past patterns of user searching (user preferences) to pre-load data that it may need to fulfill a future search request. For example, if the user often requests "SEARCH RED SOX SCORES," the device 102 will regularly receive Red Sox scores from a sports content provider via transaction server 110. The user of device 102 may choose to share his locally stored yellow pages with users of other devices, and conversely, receive others' yellow pages.
If the user knows the other person, this "social networking" offers a convenient means of receiving information from a trusted source. Social networking may be pairwise, or involve groups who provide permission to each other to share personal yellow pages. Users can augment the entries in their locally stored yellow pages with reviews, ratings, and personal comments relating to the listed businesses. Users can choose to share this additional information as part of their social networking options.
For these reasons, the examiner believes Evermann still teaches the amended claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-15, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0210023 A1 in view of Evermann WO 2008/083172 A2, and further in view of Marsh U.S. PAP 2007/0033531 A1.
Regarding claim 1, Kim teaches a method of an electronic device (method for displaying a response to an inquiry, see abstract), comprising: 
receiving a user input from a user (the device 100 may receive a user input. A user input may be an input received from a user. For example, a user input may include at least one selected from the group consisting of a touch input, a keyboard input, a sound input, a button input, and a gesture input, see par. [0070]); 
and in response to receiving the user input, generating a first response comprising first content based on the user input (the device 100 may obtain an inquiry that the received user input indicates. For example, the device 100 may obtain a request for certain information from a voice input received from the user, see par. [0072]),
obtaining contextual information of the user (the device 100 may obtain context information by using a sensor included in the device 100, see par. [0080]).
However, Kim does not teach generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user; generating a combined response based on the first response and the second response; and outputting the combined response; wherein the second response includes guide information related to the first response, and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information.
In the same field of endeavor, Evermann teaches generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user (receiving an utterance from the user, the utterance corresponding to a command of either of the first type or the second type; using the speech recognition functionality to recognize the utterance; if the received utterance is a command of the first type, performing a corresponding command and control function, and if the received utterance is a command of the second type, generating a representation of a corresponding search request and then using the representation to request a search that is responsive to the search request, see par. [0007]); generating a combined response based on the first response and the second response (the user invokes the device's search functionality by uttering a search command, such as, for example, "Directory Assistance." The device recognizes the command and, for certain search commands, elicits further information from the user, see par. [0018]. Guided search commands 206 use voice and text prompts to guide the user through a directed dialog to elicit the information required to fulfill his search for information, see par. [0026]. The search application then opens a wireless data connection to a transaction server and sends it a representation of the user's spoken answers. The transaction server then forwards the user's information request, now in text form, to an appropriately selected content provider. The content provider searches for and retrieves the requested information, and sends its search results back to the transaction server. The transaction server then processes the search results and sends the results along with the user's search request and information about the user to one or more advertising providers. 
These providers offer advertisements back to the transaction server, which selects optimally targeted advertisements to combine with the search results. The transaction server then sends search results and advertisements to the mobile device. The device's voice-mediated search software displays the results to the user as text, graphics, and video and, optionally, as audio output of synthesized speech, sounds, or music, see par. [0018]); wherein the second response includes guide information related to the first response (when the ASR receives audio associated with a guided dialog in a "DIRECTORY ASSISTANCE" command followed by a "WHAT STATE?" prompt, it searches for matches in its database of state names, and after the prompt "WHAT CITY" it uses a database of city names in the identified state, see par. [0043]; this teaches that the second response includes information associated with the first response); and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information (as described in the present specification at par. [0210], priorities included in the first data 1404 (metadata) may be determined based on profile information 1401 (Vi) about User C, which is used to provide guide information …, based on at least one of salary information, age information, or profession information included in profile information of User C; correspondingly, in Evermann, ASR Server 112 uses the side information it extracts from the received signal to categorize the mobile device user, and the user categories include gender, an age range, accent, dialect, and the emotional state of the user, see par. [0054-0055]. Device 102 also recognizes past patterns of user searching (user preferences) to pre-load data that it may need to fulfill a future search request. For example, if the user often requests "SEARCH RED SOX SCORES," the device 102 will regularly receive Red Sox scores from a sports content provider via transaction server 110. 
The user of device 102 may choose to share his locally stored yellow pages with users of other devices, and conversely, receive others' yellow pages. If the user knows the other person, this "social networking" offers a convenient means of receiving information from a trusted source. Social networking may be pairwise, or involve groups who provide permission to each other to share personal yellow pages. Users can augment the entries in their locally stored yellow pages with reviews, ratings, and personal comments relating to the listed businesses. Users can choose to share this additional information as part of their social networking options, see par. [0083-0084]).
It would have been obvious to one of ordinary skill in the art to combine the Kim invention with the teachings of Evermann for the benefit of improving the quality of the response to a user’s request, see par. [0027].
Although Kim teaches obtaining a different response based on context information, see par. [0083], Kim does not teach determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content, and generating the second response including the second content having the second type.
In the same field of endeavor, Marsh teaches methods and apparatus for generating and delivering selected primary content and contextually-related, targeted secondary content to users of a network, see abstract. It is desirable to provide the same user with "secondary" content which is related to the "primary" content which the user selected in the first place. Myriad different reasons for providing such related secondary content exist, including to provide additional sources of information that the user can follow up on if interested, see par. [0003]. Marsh teaches determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content (accessing, in response to the act of receiving, a metadata file associated with the primary content in order to obtain metadata therefrom; providing the metadata obtained from the file to a search entity for a search based at least partly on the metadata, the search producing the contextually related secondary content, see par. [0032]); generating the second response including the second content having the second type, wherein the second response is generated based on a predetermined preference (a contextual behavioral profiling system determines the user's monitor behavior and content preferences. The system is enabled to present a program sequence to the viewer based on the preference determination and stored programming, see par. [0025]); outputting the combined response, wherein the first type of the first content comprises a content type directly corresponding to the input from the user (the user invokes their client device 302 to request a download of the primary content from the distribution entity, see par. [0125]); and wherein the outputting the combined response includes processing the first content and the second content to be arranged on a display of the electronic device or to be output via a different output device of the electronic device (displaying the primary and secondary content using the client device; the primary and secondary content are displayed in a common display element (e.g., window) in a substantially sequential and contiguous fashion, so as to avoid breaking or disrupting the viewer's attention. The primary content comprises, e.g., an audio-visual medium, and the secondary content comprises substantially textual advertising data, see par. [0032]).
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann with the teachings of Marsh in order to provide additional sources of information that the user can follow up on if interested, see par. [0003].
Regarding claim 2, Kim teaches the method of claim 1, wherein the first type of the first content comprises at least one of text, a moving picture, an image, or audio content, and wherein the second type of the second content comprises at least one of text, a moving picture, an image, audio content, a light-emitting diode (LED) output, a vibration output, a visual effect, an audible effect, or a user interface (the device 100 may display text indicating the obtained response and an image of the subject providing the response, see par. [0087]).
Regarding claim 3, Kim teaches the method of claim 1, further comprising: in response to the user input, obtaining first data, wherein the first response is generated based on the first data, and wherein the second response is generated based on second data obtained by modifying the first data based on the contextual information (the device 100 may determine an image of a subject providing the response, from among a plurality of images. For example, the device 100 may determine an image of a subject providing the response, from among a plurality of images, by using context information, see par. [0095]).
Regarding claim 4, Kim teaches the method of claim 1, further comprising: inputting the contextual information into a generative model to generate the second response (the context information collector 1520 may transmit the obtained context information to the category determiner 1530, see par. [0299]; for example, the user modeling unit 1550 may determine a user model corresponding to a determined response, from among a plurality of user models, and transmit information about the determined user model to the avatar generator 1510, see par. [0293]).
Regarding claim 5, Kim teaches the method of claim 4, further comprising: training the generative model based on training data, wherein the training data comprises information related to the user that was collected by the electronic device, and wherein the generative model is trained based on correlation between the training data and the second response (the user modeling unit 1550 may determine a user model corresponding to a determined response, from among a plurality of user models, and transmit information about the determined user model to the avatar generator 1510, see par. [0294]; the user modeling unit 1550 may include a history analyzer 1551 and a history collector 1552. According to an embodiment, the history collector 1552 may obtain a history of an inquiry of a user from information received from the voice intention analyzer 1540. Additionally, the history collector 1552 may obtain a history of a user input for evaluation, see par. [0307-0308]).
Regarding claim 6, Kim teaches the method of claim 1, wherein the generating of the second response comprises: 
obtaining feature information of the first content (the device 100 obtains an inquiry that the user input indicates, see par. [0182]; the device 100 obtains first context information, see par. [0183] and figure 10); 
and generating the second response comprising the second content based on the feature information and the contextual information (In operation S1040, according to an embodiment, the device 100 requests a response from the server 1000. According to an embodiment, the device 100 may request a response from the server 1000, based on the inquiry and the first context information, see par. [0185-0186]; the server 1000 may obtain a response in correspondence with the request received in operation S1040 by using the second context information, and transmit the obtained response to the device 100, see par. [0187-0194]).
Regarding claim 7, Kim teaches the method of claim 1, wherein the contextual information comprises at least one feature corresponding to each context of the user, wherein the at least one feature of the contextual information is determined based on feedback information from the user with respect to the combined response, and wherein the contextual information is obtained based on the determined at least one feature (the display 1430 may display a screen for receiving a user input for evaluation. For example, the device 100 may display text and an image, and then, display a screen for receiving an input of a degree of satisfaction about the displayed text and the displayed image, see par. [0247]).
Regarding claim 8, Kim teaches an electronic device (a device for displaying a response to an inquiry, see abstract) comprising: 
an input device configured to receive a user input from a user (the device 100 may receive a user input, see par. [0070]);
at least one processor (a processor configured to obtain an inquiry indicated by the received user input, see par. [0017]) configured to: 
in response to receiving the user input, generate a first response comprising first content having a first type based on the user input (the device 100 may obtain an inquiry that the received user input indicates. For example, the device 100 may obtain a request for certain information from a voice input received from the user, see par. [0072]),
obtain contextual information of the user (the device 100 may obtain context information by using a sensor included in the device 100, see par. [0080]), 
generate a combined response based on the first response and the second response (after the device 100 obtains a first response in operation S340, the device 100 may obtain a second response in operation S370 by using the user input received in operation S360. For example, with respect to an inquiry asking, "How is the weather today?", the device 100 may obtain information about weather for tomorrow in operation S370, according to a user input for evaluation which is received in operation S360 and requests weather for tomorrow, see par. [0105]);
and an output device configured to output the combined response (the device 100 changes the image of the subject displayed based on the user input received in operation S360. For example, the device 100 may change the displayed image of the subject, based on a history of the user input received in operation S360, see par. [0106]).
However, Kim does not teach generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user; generating a combined response based on the first response and the second response; and outputting the combined response; wherein the second response includes guide information related to the first response, and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information.
In the same field of endeavor, Evermann teaches generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user (receiving an utterance from the user, the utterance corresponding to a command of either of the first type or the second type; using the speech recognition functionality to recognize the utterance; if the received utterance is a command of the first type, performing a corresponding command and control function, and if the received utterance is a command of the second type, generating a representation of a corresponding search request and then using the representation to request a search that is responsive to the search request, see par. [0007]); generating a combined response based on the first response and the second response (the user invokes the device's search functionality by uttering a search command, such as, for example, "Directory Assistance." The device recognizes the command and, for certain search commands, elicits further information from the user, see par. [0018]. Guided search commands 206 use voice and text prompts to guide the user through a directed dialog to elicit the information required to fulfill his search for information, see par. [0026]. The search application then opens a wireless data connection to a transaction server and sends it a representation of the user's spoken answers. The transaction server then forwards the user's information request, now in text form, to an appropriately selected content provider. The content provider searches for and retrieves the requested information, and sends its search results back to the transaction server. The transaction server then processes the search results and sends the results along with the user's search request and information about the user to one or more advertising providers. 
These providers offer advertisements back to the transaction server, which selects optimally targeted advertisements to combine with the search results. The transaction server then sends search results and advertisements to the mobile device. The device's voice-mediated search software displays the results to the user as text, graphics, and video and, optionally, as audio output of synthesized speech, sounds, or music, see par. [0018]); wherein the second response includes guide information related to the first response (when the ASR receives audio associated with a guided dialog in a "DIRECTORY ASSISTANCE" command followed by a "WHAT STATE?" prompt, it searches for matches in its database of state names, and after the prompt "WHAT CITY" it uses a database of city names in the identified state, see par. [0043]; this teaches that the second response includes information associated with the first response); and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information (as described in the present specification at par. [0210], priorities included in the first data 1404 (metadata) may be determined based on profile information 1401 (Vi) about User C, which is used to provide guide information …, based on at least one of salary information, age information, or profession information included in profile information of User C; correspondingly, in Evermann, ASR Server 112 uses the side information it extracts from the received signal to categorize the mobile device user, and the user categories include gender, an age range, accent, dialect, and the emotional state of the user, see par. [0054-0055]. Device 102 also recognizes past patterns of user searching (user preferences) to pre-load data that it may need to fulfill a future search request. For example, if the user often requests "SEARCH RED SOX SCORES," the device 102 will regularly receive Red Sox scores from a sports content provider via transaction server 110. 
The user of device 102 may choose to share his locally stored yellow pages with users of other devices, and conversely, receive others' yellow pages. If the user knows the other person, this "social networking" offers a convenient means of receiving information from a trusted source. Social networking may be pairwise, or involve groups who provide permission to each other to share personal yellow pages. Users can augment the entries in their locally stored yellow pages with reviews, ratings, and personal comments relating to the listed businesses. Users can choose to share this additional information as part of their social networking options, see par. [0083-0084]).
Although Kim teaches obtaining a different response based on context information, see par. [0083], Kim does not teach determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content, and generating the second response including the second content having the second type.
In the same field of endeavor, Marsh teaches methods and apparatus for generating and delivering selected primary content and contextually-related, targeted secondary content to users of a network, see abstract. It is desirable to provide the same user with "secondary" content which is related to the "primary" content which the user selected in the first place. Myriad different reasons for providing such related secondary content exist, including to provide additional sources of information that the user can follow up on if interested, see par. [0003]. Marsh teaches determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content (accessing, in response to the act of receiving, a metadata file associated with the primary content in order to obtain metadata therefrom; providing the metadata obtained from the file to a search entity for a search based at least partly on the metadata, the search producing the contextually related secondary content, see par. [0032]); generating the second response including the second content having the second type, wherein the second response is generated based on a predetermined preference (a contextual behavioral profiling system determines the user's monitor behavior and content preferences. The system is enabled to present a program sequence to the viewer based on the preference determination and stored programming, see par. [0025]); outputting the combined response, wherein the first type of the first content comprises a content type directly corresponding to the input from the user (the user invokes their client device 302 to request a download of the primary content from the distribution entity, see par. [0125]); and wherein the outputting the combined response includes processing the first content and the second content to be arranged on a display of the electronic device or to be output via a different output device of the electronic device (displaying the primary and secondary content using the client device; the primary and secondary content are displayed in a common display element (e.g., window) in a substantially sequential and contiguous fashion, so as to avoid breaking or disrupting the viewer's attention. The primary content comprises, e.g., an audio-visual medium, and the secondary content comprises substantially textual advertising data, see par. [0032]).
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann with the teachings of Marsh in order to provide additional sources of information that the user can follow up on if interested, see par. [0003].
Regarding claim 9, Kim teaches the electronic device of claim 8, wherein the first type of the first content comprises at least one of text, a moving picture, an image, or an audio, and wherein the second type of the second content comprises at least one of text, a moving picture, an image, an audio, a light-emitting diode (LED) output, a vibration output, a visual or audible effect, or a user interface (the device 100 may display text indicating the obtained response and an image of the subject providing the response, see par. [0087]).
Regarding claim 10, Kim teaches the electronic device of claim 8, wherein the first response is generated based on first data obtained based on the user input, and wherein the second response is generated based on second data obtained by modifying the first data based on the contextual information (the device 100 may determine an image of a subject providing the response, from among a plurality of images. For example, the device 100 may determine an image of a subject providing the response, from among a plurality of images, by using context information, see par. [0095]).
Regarding claim 11, Kim teaches the electronic device of claim 8, wherein the second response is generated by inputting the contextual information to a generative model (the context information collector 1520 may transmit the obtained context information to the category determiner 1530, see par. [0299]; for example, the user modeling unit 1550 may determine a user model corresponding to a determined response, from among a plurality of user models, and transmit information about the determined user model to the avatar generator 1510, see par. [0293]).
Regarding claim 12, Kim teaches the electronic device of claim 11, wherein the at least one processor is further configured to obtain training data comprising information related to the user collected by the electronic device as information used to train the generative model, wherein the generative model is trained based on correlation between the training data and the second response (the user modeling unit 1550 may determine a user model corresponding to a determined response, from among a plurality of user models, and transmit information about the determined user model to the avatar generator 1510, see par. [0294]; the user modeling unit 1550 may include a history analyzer 1551 and a history collector 1552. According to an embodiment, the history collector 1552 may obtain a history of an inquiry of a user from information received from the voice intention analyzer 1540. Additionally, the history collector 1552 may obtain a history of a user input for evaluation, see par. [0307-0308]).
Regarding claim 13, Kim teaches the electronic device of claim 8, wherein the at least one processor is further configured to: obtain feature information of the first content (the device 100 obtains an inquiry that the user input indicates, see par. [0182]; the device 100 obtains first context information, see par. [0183] and figure 10), 
and generate the second response comprising the second content based on the feature information and the contextual information (In operation S1040, according to an embodiment, the device 100 requests a response from the server 1000. According to an embodiment, the device 100 may request a response from the server 1000, based on the inquiry and the first context information, see par. [0185-0186]; the server 1000 may obtain a response in correspondence with the request received in operation S1040 by using the second context information, and transmit the obtained response to the device 100, see par. [0187-0194]).
 
Regarding claim 14, Kim teaches the electronic device of claim 8, wherein the contextual information comprises at least one feature corresponding to each context of the user, and wherein the at least one feature is determined based on feedback information from the user on the combined response, and the contextual information is obtained based on the at least one feature (the display 1430 may display a screen for receiving a user input for evaluation. For example, the device 100 may display text and an image, and then, display a screen for receiving an input of a degree of satisfaction about the displayed text and the displayed image, see par. [0247]).

Regarding claim 15, Kim teaches a non-transitory computer-readable storage medium configured to store one or more computer programs including instructions that, when executed by at least one processor (non-transitory computer-readable recording medium having recorded thereon a computer program which, when executed by a processor, causes the processor to control to perform the method provided above, see par. [0016]), cause the at least one processor to: 
receive a user input from a user (the device 100 may receive a user input. A user input may be an input received from a user. For example, a user input may include at least one selected from the group consisting of a touch input, a keyboard input, a sound input, a button input, and a gesture input, see par. [0070]);
and in response to the user input, generate a first response comprising first content as a response corresponding to the user input (the device 100 may obtain an inquiry that the received user input indicates. For example, the device 100 may obtain a request for certain information from a voice input received from the user, see par. [0072]), 
obtain contextual information of the user (the device 100 may obtain context information by using a sensor included in the device 100, see par. [0080]), 
as a response corresponding to the user input, combine the first response with the second response (after the device 100 obtains a first response in operation S340, the device 100 may obtain a second response in operation S370 by using the user input received in operation S360. For example, with respect to an inquiry asking, "How is the weather today?", the device 100 may obtain information about weather for tomorrow in operation S370, according to a user input for evaluation which is received in operation S360 and requests weather for tomorrow, see par. [0105]),
and output a combined response as a response to the user input (the device 100 changes the image of the subject displayed based on the user input received in operation S360. For example, the device 100 may change the displayed image of the subject, based on a history of the user input received in operation S360, see par. [0106]).
However, Kim does not teach generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user; generating a combined response based on the first response and the second response; and outputting the combined response; wherein the second response includes guide information related to the first response, and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information.
In the same field of endeavor, Evermann teaches generating a first response comprising first content having a first type based on the user input that is input irrespective of contextual information of the user (receiving an utterance from the user, the utterance corresponding to a command of either of the first type or the second type; using the speech recognition functionality to recognize the utterance; if the received utterance is a command of the first type, performing a corresponding command and control function, and if the received utterance is a command of the second type, generating a representation of a corresponding search request and then using the representation to request a search that is responsive to the search request, see par. [0007]); generating a combined response based on the first response and the second response (The user invokes the device's search functionality by uttering a search command, such as, for example, "Directory Assistance." The device recognizes the command, and, for certain search commands, elicits further information from the user, see par. [0018]. Guided search commands 206 use voice and text prompts to guide the user through a directed dialog to elicit the information required to fulfill his search for information, see par. [0026]. The search application then opens a wireless data connection to a transaction server, and sends it a representation of the user's spoken answers. The transaction server then forwards the user's information request, now in text form, to an appropriately selected content provider. The content provider searches for and retrieves the requested information, and sends its search results back to the transaction server. The transaction server then processes the search results and sends the results along with the user's search request and information about the user to one or more advertising providers. 
These providers offer advertisements back to the transaction server, which selects optimally targeted advertisements to combine with the search results. The transaction server then sends search results and advertisements to the mobile device. The device's voice-mediated search software displays the results to the user as text, graphics, and video and, optionally, as audio output of synthesized speech, sounds, or music, see par. [0018]); wherein the second response includes guide information related to the first response (the ASR receives audio associated with a guided dialog in a "DIRECTORY ASSISTANCE" command followed by a "WHAT STATE?" prompt, it searches for matches in its database of state names, and after the prompt "WHAT CITY" it uses a database of city names in the identified state, see par. [0043]; this teaches that the second response includes information associated with the first response), and wherein the guide information is based on user profile information, preference information, and social network (SNS) activity information (applicant's specification describes that the first data 1404 (metadata) may be determined based on profile information 1401 (Vi) about User C, which is used to provide guide information …, based on at least one of salary information, age information, or profession information included in profile information of User C, see specification par. [0210]; correspondingly, in Evermann, ASR Server 112 uses the side information it extracts from the received signal to categorize the mobile device user. The user categories include gender, an age range, accent, dialect, and the emotional state of the user, see par. [0054-0055]. Device 102 also recognizes past patterns of user searching (user preferences) to pre-load data that it may need to fulfill a future search request. For example, if the user often requests "SEARCH RED SOX SCORES," the device 102 will regularly receive Red Sox scores from a sports content provider via transaction server 110. 
The user of device 102 may choose to share his locally stored yellow pages with users of other devices, and conversely, receive others' yellow pages. If the user knows the other person, this "social networking" offers a convenient means of receiving information from a trusted source. Social networking may be pairwise, or involve groups who provide permission to each other to share personal yellow pages. Users can augment the entries in their locally stored yellow pages with reviews, ratings, and personal comments relating to the listed businesses. Users can choose to share this additional information as part of their social networking options, see par. [0083-0084]).
Although Kim teaches obtaining a different response based on context information, see par. [0083], Kim does not teach determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content, or generating the second response including the second content having the second type.
In the same field of endeavor, Marsh teaches methods and apparatus for generating and delivering selected primary content and contextually-related, targeted secondary content to users of a network, see abstract. It is desirable to provide the same user with "secondary" content which is related to the "primary" content which the user selected in the first place. Myriad different reasons for providing such related secondary content exist, including to provide additional sources of information that the user can follow up on if interested, see par. [0003]. Marsh teaches determining a second type of a second content to be included in a second response from the first response based on the contextual information, the second type of the second content being different from the first type of the first content (accessing, in response to the act of receiving, a metadata file associated with the primary content in order to obtain metadata therefrom; providing the metadata obtained from the file to a search entity for a search based at least partly on the metadata, the search producing the contextually related secondary content, see par. [0032]), generating the second response including the second content having the second type, wherein the second response is generated based on a predetermined preference (contextual behavioral profiling system determines the user's monitor behavior and content preferences. The system is enabled to present a program sequence to the viewer based on the preference determination and stored programming, see par. [0025]); outputting the combined response, wherein the first type of the first content comprises a content type directly corresponding to the input from the user (the user invokes their client device 302 to request a download of the primary content from the distribution entity, see par. [0125]), and wherein the outputting the combined response includes processing the first content and the second content to be arranged on a display of the electronic device or to be output via a different output device of the electronic device (displaying the primary and secondary content using the client device; the primary and secondary content are displayed in a common display element (e.g., window) in a substantially sequential and contiguous fashion, so as to avoid breaking or disrupting the viewer's attention. The primary content comprises, e.g., an audio-visual medium, and the secondary content comprises substantially textual advertising data, see par. [0032]).
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann with the teachings of Marsh in order to provide additional sources of information that the user can follow up on if interested, see Marsh, par. [0003].
Regarding claim 19, Evermann teaches the method of claim 1, wherein the outputting of the combined response comprises: when the combined response is a plurality of combined responses, determining priorities of respective combined responses based on the contextual information (The transaction server selects and prioritizes the received content by using the metadata and commerce information, such as special offers or time-sensitive opportunities, see par. [0040]); and outputting at least one of the plurality of combined responses based on the determined priorities as the combined response (the transaction server also has the option to send search results, the search query, metadata, and user history information to one or more advertising providers 116a, b, c over connection 134. The advertising providers return potential advertisements and pricing information back to the transaction server over connection 136. The transaction server selects an advertisement, combines it with the search results in an appropriate format, and transmits the results and advertisement over connection 138 to mobile device 102. VMSA 106 then receives the results and presents them to the user, see par. [0040]).
Regarding claim 20, Evermann teaches the electronic device of claim 8, wherein the at least one processor is further configured to, when the combined response is a plurality of combined responses, determine priorities of respective combined responses based on the contextual information (The transaction server selects and prioritizes the received content by using the metadata and commerce information, such as special offers or time-sensitive opportunities, see par. [0040]), and output at least one of the plurality of combined responses based on the determined priorities as the combined response (the transaction server also has the option to send search results, the search query, metadata, and user history information to one or more advertising providers 116a, b, c over connection 134. The advertising providers return potential advertisements and pricing information back to the transaction server over connection 136. The transaction server selects an advertisement, combines it with the search results in an appropriate format, and transmits the results and advertisement over connection 138 to mobile device 102. VMSA 106 then receives the results and presents them to the user, see par. [0040]).



Claims 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0210023 A1, in view of Evermann WO 2008/083172 A2, in view of Marsh U.S. AP 2007/0033531 A1, in view of Osotio 2018/0101776 A1.
Regarding claim 16, Kim teaches the non-transitory computer-readable storage medium of claim 15, further comprising instructions to cause the at least one processor to: 
capture an image of the user (obtain facial expression, see par. [0203]); 
detect facial expression information of the user based on the image (the dressing table 1210 may obtain information about a skin state, a lip state, a facial expression, or the like of the user, see par. [0203]).
However, Kim in view of Evermann in view of Marsh does not teach determining an emotion based on the facial expression information, wherein the combined response is output based on the emotion.
In a similar field of endeavor, Osotio teaches mechanisms to extract an emotional state from contextual user data and public use data collected from one or more devices and/or services. The contextual and public data are combined into an enriched data set. An emotional model, tailored to the user, extracts an emotional state from the enriched data set based on one or more machine learning techniques, see abstract. FIG. 1 illustrates an example architecture 100 of a system to extract an emotional state from various data sources. With the permission of a user, a wide variety of data is collected about a user by various devices and/or services 102 including facial recognition, see par. [0019]. The services, data sources, and/or devices 102 represented in FIG. 1 represent the wide variety of devices, systems and/or services that collect information about a user and that can be sources of data for extracting an emotional state, see par. [0020].
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann in view of Marsh with the teachings of Osotio for the benefit of customizing interactions of the system with a given user, see Osotio, par. [0001].
Regarding claim 18, Kim in view of Evermann in view of Marsh does not teach the non-transitory computer-readable storage medium of claim 15, wherein the instructions to generate the second response further comprise instructions to cause the at least one processor to: determine a user preference using on a convoluted neural network (CNN); and generate the second response based on a user preference included in the contextual information. 
In the same field of endeavor, Osotio teaches mechanisms to extract an emotional state from contextual user data and public use data collected from one or more devices and/or services. The contextual and public data are combined into an enriched data set. An emotional model, tailored to the user, extracts an emotional state from the enriched data set based on one or more machine learning techniques, see abstract. FIG. 1 illustrates an example architecture 100 of a system to extract an emotional state from various data sources. With the permission of a user, a wide variety of data is collected about a user by various devices and/or services 102 including facial recognition, see par. [0019]. The services, data sources, and/or devices 102 represented in FIG. 1 represent the wide variety of devices, systems and/or services that collect information about a user and that can be sources of data for extracting an emotional state, see par. [0020]. The emotional state is extracted from the enriched data through a personalized emotional state model, created through one or more supervised or unsupervised machine learning processes, such as a support vector machine (SVM) technique, a convolutional neural network, a deep neural network, decision tree process, k nearest neighbor process, kernel density estimation, K-means clustering, expectation maximization, and so forth, see par. [0023].
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann in view of Marsh with the teachings of Osotio for the benefit of customizing interactions of the system with a given user, see Osotio, par. [0001].
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0210023 A1, in view of Evermann WO 2008/083172 A2, in view of Marsh U.S. AP 2007/0033531 A1, further in view of Piernot 2018/0012596 A1.

Regarding claim 17, Kim teaches the non-transitory computer-readable storage medium of claim 15, wherein the instructions to generate the second response further comprise instructions to cause the at least one processor to: 
and when the second response cannot be generated using only the local resources, transmit the user input and the contextual information to a cloud server (request response S1040, see par. [0185]); and receive the second response from the cloud server (server transmits response to the device 100, see par. [0191]). 
However, Kim in view of Evermann in view of Marsh does not teach determining whether the second response can be generated using only local resources.
In the same field of endeavor, Piernot teaches systems and processes for selectively processing and responding to a spoken user input. At block 312, a response to the spoken user input can be generated by the user device and/or a remote server. In some examples, generating a response to the spoken user input can include one or more of performing speech-to-text conversion, inferring user intent, and generating output responses to the user in an audible (e.g., speech) and/or visual form. For example, block 312 can include performing an operation requested by the user, providing information requested by the user, performing an action that causes a change in the physical environment, or the like. The operations can be performed locally on the user device, by transmitting data to a remote server for processing, or a combination thereof.
It would have been obvious to one of ordinary skill in the art to combine the invention of Kim in view of Evermann in view of Marsh with the teachings of Piernot for the benefit of saving bandwidth by performing processing steps locally on a device. 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711.  The examiner can normally be reached on Monday-Friday 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656