DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements submitted on 08/05/2021, 10/25/2021, 11/10/2021 and 01/11/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Amendment
The Amendment filed 11/22/2021 has been entered. Claims 29-30 are added. Claims 1-30 are now pending in the application.

Response to Arguments
Applicant's arguments filed 11/22/2021 have been fully considered. Each of applicant’s remarks is set forth, followed by examiner’s response.
Regarding the rejection under 35 U.S.C. 103, in the Remarks, Applicant argues:
(1) Neither Daniel nor Suplee, alone or in combination, disclose or suggest "caus[ing] a user intent corresponding to the first user input and the second user input to be determined," as required by claim 1. Claim 1 requires determining a user intent corresponding to both the first user input, which includes a media object, and the 
As to point (1), Examiner respectfully disagrees. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of 
(2) Neither Daniel nor Suplee, alone or in combination, disclose or suggest "obtain[ing] a determination of whether the user intent requires extracting text from the media object," as required by claim 1. Applicant argues Daniel does not disclose a "first user input including a media object." Accordingly, Daniel would not determine that a user intent required extracting text from the media item. Suplee discloses triggering OCR either automatically upon receiving an image and/or upon an inference that text is present in an image. Neither automatic OCR nor OCR based on the presence of text is related to the user intent. Therefore, Suplee does not disclose or suggest "a determination of whether the user intent requires extracting text."
As to point (2), Examiner respectfully disagrees. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). The combination of Daniel and Suplee discloses the system can determine an intent of the request from the user inputs including text or various multimedia content items as in the response of point (1) above. Further Daniel illustrates in Fig. 3B, detail elements are extracted from the user input 312a ([0059]-[0060]). Suplee discloses in Fig. 4, [0017] an obtained image is processed to locate at least one region having properties of a string of text or characters. The text string is analyzed using an optical character recognition algorithm to recognize text in the text string. A text pattern (e.g. an email, phone number, URL etc.) is identified that 
(3) Applicant further argues that Daniel is directed to a personal assistant service, where the personal assistant (agent) may be a person (i.e., a human individual with a client device) or may use artificial intelligence to "fulfill simple requests without the use of a person agent and assign more difficult requests to the person agent." (Id. at [0042], emphasis added.) Accordingly, even if Daniel did disclose a "first user input including a media object" (which Applicant does not concede), as Daniel clearly suggests the use of a human person for "more difficult requests," Daniel does not suggest determining whether text extraction is required: the human agent could simply read an image of text as text, thus eliminating any need for text extraction.
As to point (3), Examiner respectfully disagrees. Daniel also discloses in [0042] that artificial intelligence can aid a person agent in fulfilling requests by automatically determining an intent of a request, in addition to details associated with the request.
(4) Suplee discloses using "the type or pattern of the text" (that is, the text recognized from the image) to determine "a function or application associated with the type of text." Accordingly, Suplee at most discloses first performing OCR, then determining a function based on the recognized text. In contrast, the claims require first determining a user intent, then "obtain[ing] a determination of whether the user intent requires extracting text from the media object." The order of operations suggested by Suplee is opposite to the order of operations required by the claims, as the 
As to point (4), Examiner respectfully disagrees. Suplee discloses in Fig. 4, [0017] that an obtained image is processed to locate at least one region having properties of a string of text or characters. The text string is analyzed using an optical character recognition algorithm to recognize text in the text string. A text pattern (e.g. an email, phone number, URL etc.) is identified that corresponds to the recognized text. An application associated with the text pattern is determined, and the recognized text is automatically extracted and provided to the application associated with the identified text pattern.

Claims 1-30 remain rejected as below. No new references are cited.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 25-30 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel et al. (hereinafter Daniel), US 2017/0026318 A1, in view of Suplee, III et al. (hereinafter Suplee), US 2013/0329023 A1.

Regarding independent claim 1, Daniel teaches a non-transitory computer-readable storage medium storing one or more programs (Fig. 6, 604, 606), the one or more programs comprising instructions ([0127]), which when executed by one or more processors (Fig. 6, 602) of an electronic device (Fig. 6, 600; [0126]) with a display (Fig. 6, 608; [0130] “I/O interface 608 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen)”), cause the electronic device to ([0030] “the components can include computer instructions stored on a non-transitory computer-readable storage medium and executable by at least one processor of a computing device. When executed by the at least one processor, the computer-executable instructions can cause the computing device to perform the methods and processes described herein”; [0117]; [0119]):
display, on the display, a graphical user interface (GUI) having a plurality of previous messages between a user and a digital assistant, the plurality of previous messages presented in a conversational view (Fig. 3A; [0050] illustrates a messaging interface (i.e. GUI) including messages involving a user and an agent (i.e. digital assistant), the messaging interface 306 can include a messaging thread 310 (i.e. a conversational view) between the user and the agent, including a history of electronic messages exchanged between the user and the agent, the messaging thread 310 can populate with the instant messages in a chronological order of when the user/agent sent the messages);
receive a first user input ([0051] describes the messaging interface 306 receiving a first message input from an account of the user);
in response to receiving the first user input, display the media object as a first message in the GUI (Fig. 3A, 312a; [0051] describes the messaging interface 306 displays the electronic message 312a received from the account of the user on the left side of the messaging interface 306);
receive a second user input including text (Fig. 3E, 312b; [0076] describes a second message received from the user including text indicating a time frame and a preferred budget);
in response to receiving the second user input, display the text as a second message in the GUI (Fig. 3E, 312b; Fig. 3E illustrates the second message displayed on the messaging interface 306);
cause a user intent corresponding to the first user input and the second user input to be determined ([0057] “after receiving the message from the user, the system 100 can determine an intent of the request in the message. In particular, the system 100 can analyze the text of the message to identify keywords, phrases, numbers, or other information that indicates the intent of the request, including specific details about the request. For example, FIG. 3B illustrates the agent user interface 302 in which the system 100 has analyzed the electronic message 312 a from the user and determined the intent of the request in the message”; [0076]-[0078] based on the information in the second message 312 b from the user, the system can determine the time range and an estimated budget for the reservation; Fig. 3E shows the determination of the intent of Book a restaurant reservation with all details derived from the first user input 312a and the second user input 312b);
obtain a determination of whether the user intent requires extracting text from the first user input ([0021]-[0022] the system can determine the user intent using natural language processing to identify words and/or phrases in the electronic message that indicate whether the user would like specific goods or services); and
in response to obtaining a determination that the user intent requires extracting text from the first user input:
extract text from the first user input ([0033] “the language processor 106 can use natural language processing techniques to parse text to identify words and phrases that indicate the type of request and purpose of the request”; [0057] “the system 100 can analyze the text of the message to identify keywords, phrases, numbers, or other information that indicates the intent of the request, including specific details about the request”; Fig. 3B, 320; [0059]-[0060] describes detail elements are extracted from the user input 312a);
perform, using the extracted text, a task in accordance with the user intent ([0033] “the language processor 106 can determine a broad category of a request, a narrow category of a request, specific details associated with a request, or other information associated with the request based on identified words and phrases in one or more messages.”; Fig. 3B; [0057] “the system 100 has analyzed the electronic message 312 a from the user and determined the intent of the request in the message … the system 100 can determine that the user wants to book a restaurant reservation”); and
the message 334b is a message from the agent including one or more options associated with one or more restaurants that meet the user's criteria available to the user displayed in the messaging thread 310).
Daniel does not explicitly disclose the first user input including a media object.
However, in the same field of endeavor, Suplee teaches a user input including a media object (Fig. 1A; [0013] showing an event flyer 106 is captured by a user using a portable computing device 202. The captured image (i.e. media object) is displayed on the screen 104. The event flyer 106 contains a physical address of the event location and, in this example, the user is seeking directions to that location. A string indicating the presence of a physical address is identified, which causes a map application to be opened and directions to the address to be displayed; Fig. 2A; [0014]-[0015] showing a business card is captured by a user using a portable computing device 202. The captured image (i.e. media object) is displayed on the screen 204. Text in the captured image of the business card can be located and the type or pattern of the text can be identified such as an email address, phone number, URL etc. A function or application (e.g. calling a number, opening an internet browser, etc.) associated with the type of text can be determined (i.e. an intent); Fig. 4; [0017] describes image or image information (e.g., a video stream) is obtained 402. The obtained image information is processed to locate at least one region having properties of a string of text or characters 404. The text string is analyzed using an optical character recognition algorithm to recognize text in the text string 406. A text pattern is identified that corresponds to the recognized text 408. An application associated with the text pattern is determined 410 and the recognized text is automatically provided to the application).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of a computing device capable of taking the image and processing it to recognize, identify, and/or isolate the text in order to forward the text to an application or function, which can then utilize the text to perform an action in substantially real time, as suggested in Suplee, into Daniel’s system because both of these systems address approaches to deriving the intent of the user through received input. This modification would have been motivated by the desire to improve the user experience and save time (Suplee, [0001]).

Regarding dependent claim 2, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Daniel further teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
in accordance with the user intent, populate the extracted text into a text field of an application of the electronic device (Fig. 3B; Fig. 3E; [0057]; [0062]; [0066]-[0068] illustrates that, with the intent “Book a restaurant reservation”, the extracted text “3 people”, “Hayes Valley, SF”, “Japanese”, “7:30 PM”, etc. are populated into third party elements 326 corresponding to a restaurant finder, a search engine, and a review site in the browser interface 308 from a third-party service).

Regarding dependent claim 3, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Suplee further teaches wherein the user intent comprises creating, using the media object, a contact entry in a contacts application of the electronic device (Figs 2A-2B; [0014]-[0015] from the captured business card image, an intent can be to save contact information to an address book application).

Regarding dependent claim 4, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 3 that is incorporated. Suplee further teaches wherein:
the media object is an image depicting contact information of an entity (Fig. 2A; [0014] illustrates the captured image of the business card displayed on the screen 204 depicting the contact information of John Doe);
the extracted text includes the contact information ([0014] the device 202, or service in communication with the device, locates text in a captured image of a business card, identifies the type or pattern of the text (e.g. an email address, phone number, URL etc.)); and
performing the task in accordance with the user intent further comprises populating a text field of the contact entry with the extracted text, the contact entry associated with the entity ([0014] “the device 202, or service in communication with the device, locates text in a captured image of a business card, identifies the type or pattern of the text (e.g. an email address, phone number, URL etc.), determines a function or 

Regarding dependent claim 25, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Daniel further teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
after displaying the media object as the first message and before receiving the second user input, display, as a sixth message in the GUI, a request for additional information regarding the media object (Fig. 3D, 334a; [0072] describes a message 334a is generated to the user to ask the user if the user has a time frame and budget in mind (i.e. additional information regarding the request input 312a)).

Regarding dependent claim 26, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Daniel further teaches wherein causing the user intent to be determined comprises causing a domain among a plurality of domains of an ontology to be determined based on the first user input and the second user input ([0058] “if the user is requesting restaurant reservations, the agent user interface 302 can display a plurality of detail elements 320  [0059] “if the request has a determined intent to book a flight, the agent user interface 302 can provide detail elements 320 for the number of people, a departure date, a return date, approximate times, the departure location, the arrival location, seat information, luggage information, etc.”).

Regarding independent claim 27, it is a device claim that corresponds to the medium of claim 1. Therefore, it is rejected for the same reason as claim 1 above.

Regarding independent claim 28, it is a method claim that corresponds to the medium of claim 1. Therefore, it is rejected for the same reason as claim 1 above.

Regarding dependent claim 29, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. The combination of Daniel and Suplee teaches wherein the user intent corresponding to the first user input and the second input is determined based on the combination of the media object and the text (Daniel discloses the determination of the intent with all details derived from the first user input and the second user input (Figs. 3A-3E, [0057] and [0076]-[0078]). Suplee teaches a user input including a media object. Text in the captured image can be located and the type or pattern of the text can be identified. A function or application (e.g. calling a number, opening an internet browser, etc.) 

Regarding dependent claim 30, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Suplee further teaches wherein obtaining the determination of whether the user intent requires extracting text from the media object is performed after the user intent is determined (Fig. 4; [0017] describes an obtained image is processed to locate at least one region having properties of a string of text or characters. The text string is analyzed using an optical character recognition algorithm to recognize text in the text string. A text pattern (e.g. an email, phone number, URL etc.) is identified that corresponds to the recognized text. An application associated with the text pattern is determined and the recognized text is automatically extracted and provided to the application).

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee as applied in claim 1, further in view of Hazen et al. (hereinafter Hazen), US 2015/0278199 A1.

Regarding dependent claim 5, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. The combination of Daniel and Suplee teaches the user intent is determined using the media object and text is extracted from the media object (see the rejection of claim 1 set forth above). The combination of Daniel and Suplee does not explicitly disclose wherein the 
However, in the same field of endeavor, Hazen teaches the user intent comprises creating, using the extracted text, a calendar entry in a calendar application of the electronic device (Fig. 4; [0044]-[0047] describes a user intent of creating a calendar entry is determined using text extracted from a received natural language expression; [0044] “Method 400 begins at operation 402 where a natural language expression is received”; [0045] “When a natural language expression is received at the extraction module, flow proceeds to operation 404 where one or more slots in the text of the natural language expression that indicate a calendar event are identified using a first grammar module and a second grammar module. The one or more slots in the text may include at least one of a date, time, date/time, subject, location, duration, and availability query”; Fig. 2; [0040] illustrates a user intent to create a calendar entry in a calendar application from the extracted text “6:30 tomorrow”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying a user intent of creating a calendar entry from extracted text from a received natural language expression as suggested in Hazen into Daniel and Suplee’s system because both of these systems are addressing the approaches to derive the intent of the user through received input. This modification would have been motivated by the desire to facilitate entering calendar events into electronic calendars using information received in various communication applications (Hazen, [0001]).

Regarding dependent claim 6, the combination of Daniel, Suplee and Hazen teaches all the limitations as set forth in the rejection of claim 5 that is incorporated.
Suplee teaches the media object is an image depicting event information (Fig. 4, 402; [0017] “an image or image information (e.g., a video stream) is obtained 402”; Fig. 1A, 106; [0013] “FIG. 1 illustrates example situation 100 showing a user holding a portable computing device 102 above an event flyer 106 … upon obtaining an image and/or identifying one or more portions of the image having properties that indicate the presence of text, an application on the device 102 automatically runs an optical character recognizing (OCR) algorithm to recognize the imaged text of the flyer … Any identified strings are analyzed to further identify patterns that would indicate the presence of interested data objects or types”. The event flyer includes the date and time “Sat. 7pm”);
Hazen teaches the extracted text includes the event information ([0024] “When the extraction module 120 receives the one or more natural language expressions from application 110, the extraction module 120 detects at least one calendar event from text of the natural language expression. In one embodiment, the extraction module 120 may detect at least one calendar event from the text by identifying one or more slots in the text related to a calendar event. The one or more slots in the text may include at least one of a date, time, date/time, subject, location, duration, and availability query”; Fig. 2; [0040] “6:30 tomorrow night” indicating an event information); and
performing the task in accordance with the user intent further comprises populating a text field of the calendar entry with the extracted text (Fig. 2; [0041] “a .

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee as applied in claim 1, further in view of Mikutel et al. (hereinafter Mikutel), US 2014/0358521 A1.

Regarding dependent claim 7, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. The combination of Daniel and Suplee does not explicitly disclose wherein the user intent comprises creating, using the media object, a reminder entry in a reminder application of the electronic device.
However, in the same field of endeavor, Mikutel teaches wherein the user intent comprises creating, using the media object, a reminder entry in a reminder application of the electronic device ([0027] describes content can be captured where the content can be voice, text or multimedia messages; Fig. 2; [0036] “A user can send a message via a SMS/MMS application 230 at a client 200 to a message server 210 such as provided by a host messaging server 240 (or servers). The message received by the host messaging server 240 can be parsed in a parser 245, and the parsed message 250 provided to a capture service 220 for additional processing. The capture service 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying a user intent of creating a task list in a notebook application from a user input through a communication channel as suggested in Mikutel into Daniel and Suplee’s system because both of these systems are addressing the approaches to derive the intent of the user through received input. This modification would have been motivated by the desire for capturing content for a note through various communication channels (Mikutel, [0003]; [0023]).

Regarding dependent claim 8, the combination of Daniel, Suplee and Mikutel teaches all the limitations as set forth in the rejection of claim 7 that is incorporated. Mikutel further teaches wherein:
the media object is an image depicting a reminder task (Figs. 4A-4C, 5A-5C; [0044] illustrates a grocery list input in communication channel which can be text, a photograph or other multi-media input);
the extracted text includes the reminder task ([0074] “The recognized entity “list” may indicate that the intent/purpose of at least some of the content in the message is to be a list and the existence of this recognized entity may result in a determination that the presentation form for the content includes a list”; [0075] “the recognized entity “todo” may further indicate that the intent/purpose of the content relates to tasks and, in some cases, the existence of this recognized entity may result in a determination that the presentation form for the content includes a presentation suitable for tasks. For example, a calendar or timing related arrangement may be presented and/or augmentations relating to scheduling may be included with the tasks”); and
performing the task in accordance with the user intent further comprises populating a text field of the reminder entry with the extracted text ([0075] “Having one or both of “list” and “todo” may include a presentation form that shows identified entities in a tabular manner with one entity (or string) for each line/row along with a checkbox or other enhanced functionality.”).

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee as applied in claim 1, further in view of Cuthbert et al. (hereinafter Cuthbert), US 2015/0134323 A1.

Regarding dependent claim 9, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. The combination of Daniel and Suplee does not explicitly disclose wherein the user intent comprises translating text of a first language in the media object to text of a second language.
However, in the same field of endeavor, Cuthbert teaches the user intent comprises translating text of a first language in the media object to text of a second language (Fig. 2; [0067] “depicts screen shots 200A-200C of example user interfaces for capturing an image and presenting a language translation of text depicted by the image”; [0068] “In the screen shot 200B, a user interface 230 depicts an image 232 captured by the user device … The example image 232 includes several portions of text in Chinese characters that have been identified by the user device”; [0069] describes the Chinese text is translated to English; Fig. 4; [0078] “The screen shot 400A is similar to the screen shot 200B, and includes a user interface 430 that presents an image 432 that includes a first portion of text 434 located near the top left corner of the image 432, a second portion of text 336 located near the center of the image 432, and a third portion of text 438 located near the bottom right of the image 432 … a user interface 450 depicted in the screen shot 400B presents the image 432 and overlays 454-458 over the image 432 that each includes a translation of text depicted by the image 432. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of presenting a language translation of text based on a presentation context associated with a captured image as suggested in Cuthbert into Daniel and Suplee’s system because both of these systems address approaches to deriving information through received input and performing a task based on user intent. This modification would have been motivated by the desire to use the method of language translation from a captured image in Cuthbert to assist the user in Daniel and Suplee (Cuthbert, [0002]).

Regarding dependent claim 10, the combination of Daniel, Suplee and Cuthbert teaches all the limitations as set forth in the rejection of claim 9 that is incorporated.
Cuthbert further teaches wherein:
the media object is an image depicting the text of the first language (Fig. 2, 200B; [0068] depicts an image 232 captured by the user device. The example image 232 includes several portions of text in Chinese characters that have been identified by the user device; Fig. 4; [0078] depicts a captured image 432 including text portions 434, 436 and 438 in Chinese language);
the extracted text includes the text of the first language ([0068] the Chinese text portions 234, 236 and 238 are extracted; Fig. 2, 252; [0069] depicts the extracted text of Chinese language; [0078] text portions 434, 436 and 438 in Chinese language are identified);
performing the task in accordance with the user intent further comprises obtaining the text of the second language corresponding to the text of the first language (Fig. 2, 254; [0069] the Chinese text portions 234, 236 and 238 are translated to a second language, English, depicted in 254; [0078] the Chinese text portions 434, 436 and 438 are translated to a second language, English, depicted in 454, 456 and 458); and
the displayed response includes the text of the second language (Fig. 2, 254; [0069] illustrates the translated text in English; Fig. 4, 432; [0078] illustrates the translated text in English).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee as applied in claim 1, further in view of GAD, US 2008/0039120 A1.

Regarding dependent claim 11, the combination of Daniel and Suplee teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. 
The combination of Daniel and Suplee teaches display, as a fourth message in the GUI, a response indicative of the user intent being satisfied, the response based on the information associated with the media object (Fig. 3F, 334b; [0080]-[0081] illustrates a message showing the result of performing the user intent).
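The combination of Daniel and Suplee does not explicitly disclose wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to: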
in response to obtaining a determination that the user intent does not require extracting text from the media object, obtain a determination of whether the user intent requires performing image recognition on the media object; and
in response to obtaining a determination that the user intent requires performing image recognition on the media object:
cause image recognition on the media object to be performed;
obtain, based on the image recognition, information associated with the media object.
However, in the same field of endeavor, GAD teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to ([0027]-[0028]; Fig. 3; [0045]):
in response to obtaining a determination that the user intent does not require extracting text from the media object (Fig. 6, 104->NO; [0063] “The method then continues at decision step 104, where it is determined if textual information is present on the image. If the determination at decision step 104 is negative, then control proceeds to step 112”), obtain a determination of whether the user intent requires performing image recognition on the media object (Fig. 6, 112; [0066] “The transmitted image is referenced against other image databases, e.g., one or more of the image database 40, point-of-interest service 44, and the other databases 46”); and
in response to obtaining a determination that the user intent requires performing image recognition on the media object:

cause image recognition on the media object to be performed ([0066] “The transmitted image is referenced against other image databases, e.g., one or more of the image database 40, point-of-interest service 44, and the other databases 46”);
obtain, based on the image recognition, information associated with the media object (Fig. 6, 118; [0067] information associated with the image is obtained).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of evaluating a captured image to determine whether to process text in the image using OCR or to process the image using other databases, as suggested in GAD, into Daniel and Suplee’s system because these systems address approaches to deriving information from received input and performing a task based on user intent. This modification would have been motivated by the desire to use the method of image recognition in GAD to assist the user in Daniel and Suplee.

Claims 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee, in view of GAD as applied in claim 11, further in view of Yalniz et al. (hereinafter Yalniz), US 9,691,161 B1.

Regarding dependent claim 12, the combination of Daniel, Suplee and GAD teaches all the limitations as set forth in the rejection of claim 11 that is incorporated.
The combination of Daniel, Suplee and GAD does not explicitly disclose wherein the media object depicts a retail object, and wherein the information associated with the media object includes price information of the retail object.
However, in the same field of endeavor, Yalniz teaches wherein the media object depicts a retail object (Fig. 1A; Col 2, lines 26-56 describes an image 102 (i.e. media object) including a representation 104 of a woman wearing a dress (i.e. a retail object)), and wherein the information associated with the media object includes price information of the retail object (Fig. 1B, 156, 158 depicts the information associated with the interested dress including the price information).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying an object of interest from a captured image and presenting information about the recognized object, as suggested in Yalniz, into Daniel, Suplee and GAD’s system because these systems address approaches to deriving information from received input and performing a task based on user intent. This modification would have been motivated by the desire to use the method of recognizing an object of interest from a captured image in Yalniz to assist the user in Daniel, Suplee and GAD.

Regarding dependent claim 14, the combination of Daniel, Suplee and GAD teaches all the limitations as set forth in the rejection of claim 11 that is incorporated. 
The combination of Daniel, Suplee and GAD does not explicitly disclose wherein the media object depicts an entity, and wherein the information associated with the media object includes an identity of the entity.
However, in the same field of endeavor, Yalniz teaches wherein the media object depicts an entity (Fig. 1A; Col 2, lines 26-56 describes an image 102 (i.e. media object) including a representation 104 depicting a dress (i.e. an entity)), and wherein the information associated with the media object includes an identity of the entity (Fig. 1B, 156, 158 depicts the information associated with the interested dress).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying an object of interest from a captured image and presenting information about the recognized object, as suggested in Yalniz, into Daniel, Suplee and GAD’s system because these systems address approaches to deriving information from received input and performing a task based on user intent. This modification would have been motivated by the desire to use the method of recognizing an object of interest from a captured image in Yalniz to assist the user in Daniel, Suplee and GAD.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee, in view of GAD as applied in claim 11, further in view of Kraft et al. (hereinafter Kraft), US 9,342,930 B1.

Regarding dependent claim 13, the combination of Daniel, Suplee and GAD teaches all the limitations as set forth in the rejection of claim 11 that is incorporated. 
The combination of Daniel, Suplee and GAD does not explicitly disclose wherein the media object depicts a location, and wherein the information associated with the media object includes an identity of the location.
However, in the same field of endeavor, Kraft teaches wherein the media object depicts a location, and wherein the information associated with the media object includes an identity of the location (Fig. 4A; Col 13, lines 29-32 describes an image (i.e. media object) of a building 418 (i.e. a location) being captured; Fig. 4B; col 13, lines 35-41 a dialog box or informational overlay 438 may be displayed on the display element 408 identifying the Taipei 101 building, a Taiwanese national landmark).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying a location from a captured image and presenting information about the recognized location, as suggested in Kraft, into Daniel, Suplee and GAD’s system because these systems address approaches to deriving information from received input and performing a task based on user intent. This modification would have been motivated by the desire to use the method of recognizing a location from a captured image in Kraft to assist the user in Daniel, Suplee and GAD.

Claims 15-18 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee, in view of GAD as applied in claim 11, further in view of Tran, US 2014/0280757 A1.

Regarding dependent claim 15, the combination of Daniel, Suplee and GAD teaches all the limitations as set forth in the rejection of claim 11 that is incorporated. 
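The combination of Daniel, Suplee and GAD does not explicitly disclose wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to: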
in response to obtaining a determination that the user intent does not require performing image recognition on the media object, obtain a determination of whether the user intent requires performing audio processing on the media object; and
in response to obtaining a determination that the user intent requires performing audio processing on the media object:
cause audio processing on the media object to be performed;
obtain, based on the audio processing, information associated with the media object; and
display, as a fifth message in the GUI, a response indicative of the user intent being satisfied, the response based on the information associated with the media object.
However, in the same field of endeavor, Tran teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
in response to obtaining a determination that the user intent does not require performing image recognition on the media object, obtain a determination of whether the user intent requires performing audio processing on the media object (Fig. 2, 202; [0024] describes the server 104 receiving a request from the user of the computing device 102. Different types of input data requests that may be accessed or utilized by the computing device 102 include, but are not limited to, voice input, text input, location information coming from sensors or location-based systems, time information from clocks on client devices, automobile control systems, clicking and menu selection, or any other input; Fig. 4B; [0050] “The user in communication with the computing device 102 may provide a voice request such as “Remind me to pick up the milk tomorrow” such as shown at 406”); and
in response to obtaining a determination that the user intent requires performing audio processing on the media object ([0025] “the voice data described herein may be provided such as from mobile devices, mobile telephones, tablets, computers with microphones, Bluetooth headsets, automobile voice control systems, over the telephone system, recordings on answering services, audio voicemail on integrated messaging services, consumer applications with voice input such as clock radios, telephone station, home entertainment control systems, game consoles, or any other wireless communication application”):
cause audio processing on the media object to be performed (Fig. 2, 204; [0027] “at 204, the method 200 may allow the server 104 to determine semantics of the user request. The server 104 may be configured to interpret the user voice request such as to determine the semantics of the user request. In an embodiment, the server 104 may be configured to match the word, phrase, or syntax such as to identify at least one task, at least one domain, and at least one parameter of the user request such as shown at 206”; Fig. 4B; [0050] “The server 104 may recognizes the user request and calls the calendar API such as to respond to the user by performing operations or actions related to the user voice request”);
obtain, based on the audio processing, information associated with the media object; and

display, as a fifth message in the GUI, a response indicative of the user intent being satisfied, the response based on the information associated with the media object (Fig. 2, 212; [0034] “at 212, the method 200 may allow the server 104 provide response to the user … the server 104 may provide options or information as the response to the user request”; Fig. 4B; [0051] “the computing device 102 in communication with the server 104 may be configured to restate the operation or function for confirmation from the user. For example, the computing device 102 may present the user to confirm the reminder, which is scheduled on behalf of the user in accordance with the user voice request such as shown at 408.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of identifying a user intent e.g. creating a reminder entry in a reminder application from a voice request as suggested in Tran into Daniel, Suplee and GAD’s system because both of these 

Regarding dependent claim 16, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. Tran further teaches wherein causing audio processing on the media object to be performed further comprises:
causing speech-to-text recognition to be performed on the media object to obtain text corresponding to speech in the media object ([0038] “The server 104 may be configured to receive user voice request such as to provide assistance to the user. In an embodiment, the server 104 may be configured to include or implement layers such as a speech-to-text analyzer, a grammar analyzer, and a set of service providers. The speech-to-text analyzer described herein may be a piece of software that takes audio and turns it into text. The system may integrate with the speech-to-text and natural language understanding technology that may be constrained by a set of explicit models of domains, tasks, services, and dialogs”; [0039] “the computing device 102, in communication with the server 104 may be configured to use the speech-to-text analyzer such as to convert the user voice request into written text”).

Regarding dependent claim 17, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 16 that is incorporated. Tran further teaches wherein the information is obtained using the text 

Regarding dependent claim 18, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 16 that is incorporated. Tran further teaches wherein the text corresponding to the speech in the media object is stored in association with an application of the electronic device in accordance with the user intent (Fig. 4B; [0050]-[0051] illustrates a voice request such as “Remind me to pick up the milk tomorrow” such as shown at 406. The server 104 may recognizes the user request and calls the calendar API such as to respond to the user by performing operations or actions related to the user voice request. The computing device 102 in communication with the server 104 may be configured to schedule reminders on the calendar application of the computing device 102 via the calendar API. The text “pick up the milk tomorrow” is saved in the calendar application as shown in the confirmation 408).

Regarding dependent claim 22, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. GAD further teaches wherein the second user input defines an attribute related to the media object, the attribute not explicitly indicated in the media object ([0033] describes the server interpreting the image and locating the nearest point-of-interest of the selected type, i.e., the street sign 28, or several such points of interest in proximity to the pedestrian's location. The pedestrian 12 may select one of the points of interest (i.e. the user input indicating that the user wants to save the point of interest) using an interface offered by the wireless device 14), and wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
in response to obtaining a determination that the user intent does not require performing audio processing on the media object, store data that associates the attribute to the media object ([0033] “Some wireless networks may have facilities for approximating the location of a wireless device. For example, it may be known in what city or telephone area code the pedestrian 12 is located simply by identifying the location of a receiving element 32 in the network 18 that was contacted by the wireless device 14. Such information can be exploited by the map server 16 and may enable the exclusion of many candidate points of interest. Once its processing has been completed, the map server 16 stores the location of the point-of-interest, i.e., the street sign 28, and hence the drugstore 26” ).

Regarding dependent claim 23, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 22 that is incorporated. GAD further teaches wherein the attribute describes a relationship between the user and the media object ([0033] describes the server interpreting the image and locating the nearest point-of-interest of the selected type, i.e., the street sign 28, or several such points of interest in proximity to the pedestrian's location. The pedestrian 12 may select one of the points of interest (i.e. the user input indicating that the user wants to save the point of interest) using an interface offered by the wireless device 14).

Regarding dependent claim 24, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 22 that is incorporated. GAD further teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
store, based on the attribute, the media object in association with an application of the electronic device ([0033] “the map server 16 stores the location of the point-of-interest, i.e., the street sign 28, and hence the drugstore 26”).

Claims 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Daniel, in view of Suplee, in view of GAD, in view of Tran as applied in claim 15, further in view of Swierczek, US 2006/0004640 A1.

Regarding dependent claim 19, the combination of Daniel, Suplee, GAD and Tran teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. The combination of Daniel, Suplee, GAD and Tran does not explicitly disclose wherein causing audio processing on the media object to be performed further comprises:
causing audio recognition to be performed using the media object to obtain text identifying the media object.
However, in the same field of endeavor, Swierczek teaches wherein causing audio processing on the media object to be performed (Fig. 4, 50; [0023] a segment of music has been recorded 50 (i.e. media object). The music segment is used to search in an automated music identification database 16 (Fig. 3, 16)) further comprises:
causing audio recognition to be performed using the media object to obtain text identifying the media object (Fig. 4, 54; [0023] “The automated database 16 uses a central processing unit and search stored information as known in the art to analyze the music segment and compare it to stored works until a match, matches or near matches are found and the music segment is identified 54 … Once the music segment is identified 54, the information related to the song, i.e. title, artist, etc., could be supplied to the customer 56 directly or entered into the automated database where the information, and any specified related information is supplied to the customer 56”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of allowing the user to submit a recorded music segment, process the segment and identify the song from the segment as suggested in Swierczek into Daniel, Suplee, GAD and Tran’s 

Regarding dependent claim 20, the combination of Daniel, Suplee, GAD, Tran and Swierczek teaches all the limitations as set forth in the rejection of claim 19 that is incorporated. Swierczek further teaches wherein the information is obtained using the text identifying the media object ([0023] “Once the music segment is identified 54, the information related to the song, i.e. title, artist, etc., could be supplied to the customer 56 directly or entered into the automated database where the information, and any specified related information is supplied to the customer 56”).

Regarding dependent claim 21, the combination of Daniel, Suplee, GAD, Tran and Swierczek teaches all the limitations as set forth in the rejection of claim 19 that is incorporated. Swierczek further teaches wherein the one or more programs comprise further instructions, which when executed by the one or more processors of the electronic device, cause the electronic device to:
in response to detecting a user selection of the fifth message in the GUI, cause retail information related to the media object to be displayed (Fig. 4, 60; [0025] “the automated database 16 may also provide the cost and/or location of the identified or selected music for purchase 60”).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY P HOANG whose telephone number is (469)295-9134. The examiner can normally be reached M-TH 8:30-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached on 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/AMY P HOANG/Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143