Detailed Action
NOTICE OF PRE-AIA OR AIA STATUS
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
RESPONSE TO AMENDMENT
This Final Office action is responsive to the communication filed under 37 C.F.R. § 1.111 on October 15, 2021 (hereafter “Response”). The amendments to the claims are acknowledged and have been entered.
Claims 1, 8, and 12 are now amended.
Claim 10 is now canceled.
Claims 1–9, 11, 12, 14, and 15 are pending in the application. 
Rejections Withdrawn
The rejection of claims 1–11 under 35 U.S.C. § 112(a) or 35 U.S.C. § 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement, is hereby withdrawn responsive to the Applicant’s amendment to claim 1 removing the new matter identified by the rejection.
The rejection of claim 8 under 35 U.S.C. § 112(b) for indefiniteness is hereby withdrawn responsive to the Applicant’s amendment to claim 8 resolving its indefinite scope.
The rejection of claims 1–8, 10–12, and 15 under 35 U.S.C. § 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2011/0282906 A1 (“Wong”) is hereby withdrawn, responsive to the Applicant’s amendments to independent claims 1 and 12 narrowing the scope of the main query in each of those claims to further require text that was converted from a user’s voice input.
Since the foregoing rejection is withdrawn, so too are the rejections under 35 U.S.C. § 103 of claims 9 and 14, as those rejections rely in part on Wong teaching all of the elements in their respective parent claims.
New Grounds of Rejection
The change in scope to claim 1 also changes the scope of its dependent claim 3 by virtue of inheritance, and this change in scope necessitates the new ground of rejection under 35 U.S.C. § 112(a), the details of which are set forth in the rejection below.
The change in scope to claim 1 further necessitates a new ground of rejection for claim 2 under 35 U.S.C. § 112(d) or pre-AIA 35 U.S.C. § 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends. Again, the details of this rejection are set forth herein.
The change in scope to claims 1 and 12 further necessitates new grounds of rejection under 35 U.S.C. § 103. Claims 1–8 and 11 are rejected as being unpatentable over U.S. Patent Application Publication No. 2015/0339098 A1 (hereafter “Lee”) in view of Wong. Claim 9 is rejected as being unpatentable over Lee in view of Wong as applied to claim 1, and further in view of U.S. Patent Application Publication No. 2016/0328270 A1 (hereafter “Bikkula”). Claims 12 and 15 are rejected as being unpatentable over Wong in view of Lee. Claim 14 is rejected as being unpatentable over Wong in view of Lee as applied to claim 12 above, and further in view of U.S. Patent Application Publication No. 2020/0160124 A1 (“Fu”).
The Applicant’s arguments concerning Wong’s failure to disclose the newly added limitations are no longer relevant, since each ground of rejection is based at least in part on the newly cited Lee reference teaching those limitations.
Accordingly, as each claim stands rejected, the Applicant’s request for a notice of allowance (Response 11) is respectfully denied.
CLAIM OBJECTIONS
Claim 9 is objected to for having the following informality. 
The written description fails to disclose instructions that cause the electronic device, after receiving the second response, to have at least part of the sequence of states (as opposed to the results of those states), using at least part of the second text, as set forth in claim 9. According to the written description, the sequence of states is essentially a set of steps that “allow[] the electronic device to perform the task requested by the user.” (Spec. ¶ 35). The specification does not disclose instructions that cause the electronic device to have the sequence of states “after receiving the second response,” but rather after receiving the first response. (Spec. ¶ 121) (“the first response may include a path rule including a sequence of states of the user terminal 610 and a parameter for executing an action for having the states.”).
Rather, as FIGS. 12, 13, 15, and 16 clearly show, after receiving the second response at step 9-6 (a.k.a. step 6-4 in FIG. 13), the instructions (i.e., step 11) cause the electronic device to have the results of the sequence of states (recall that the phrase “path rule” in step 11 is synonymous with “sequence of states,” per the definition at paragraph 35).
Claim 9 should therefore be amended to more accurately recite what is disclosed in the specification:
	9.  The electronic device of claim 1, wherein the first response further includes a sequence of states of the electronic device for performing the task, and 
	wherein the instructions cause the processor to:
	after receiving the second response, cause the electronic device to have results of at least part of the sequence of states, using at least part of the second text.
Appropriate correction is required.
CLAIM REJECTIONS – 35 U.S.C. § 112(A)
The following is a quotation of 35 U.S.C. § 112(a):
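(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.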
The following is a quotation of 35 U.S.C. § 112 (pre-AIA), first paragraph:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim 3 is rejected under 35 U.S.C. § 112(a) or 35 U.S.C. § 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
There is at least one embodiment of claim 3 that is not disclosed by the written description. In this embodiment, reading claim 3 together with the limitations it inherits from claim 1, the electronic device transmits two images to the second external server: first, the electronic device “generates information about a region including the at least one object in the image . . . by analyzing the image through the second external server” (claim 3), and then later, it “transmit[s] the region image extracted from the image and the first text to [the] second external server” (claim 1).
The specification never discloses such an arrangement. As shown in FIGS. 12, 13, 15, and 16, only one image is ever sent to the second external server (disclosed as “vision server 630”). In the FIG. 13 embodiment, the electronic device transmits the separated region to vision server 630 at step 6-3 and receives search results in response. And, in all of the embodiments where the vision server 630 detects the region including the at least one object (FIGS. 12, 15, and 16), the vision server simply performs the search using the detected region of interest (steps 9-2 and 9-4 in all three figures) without transmitting the region of interest back to the electronic device. In contrast, claim 3 (when read together with the limitations it inherits from claim 1) requires the electronic device to first query the second external server for the region of interest in the image, and then send the region of interest back to the second external server to perform the final object search.
Accordingly, since the specification fails to disclose this arrangement, claim 3 is rejected for reciting new matter.
CLAIM REJECTIONS – 35 U.S.C. § 112(D)
The following is a quotation of 35 U.S.C. § 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. § 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. § 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claim 2 is rejected under 35 U.S.C. § 112(d) or pre-AIA 35 U.S.C. § 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. The claim provides for the image of claim 1 to be an image “in which a region including the at least one object is separated,” but claim 1 is already limited in this way, because claim 1 provides for the electronic device to “extract a region image of at least one object associated with the third text from the image.” Once this happens, the image of claim 2 is necessarily “an image in which a region including the at least one object is separated.”
Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
CLAIM REJECTIONS – 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were effectively filed absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned at the time a later invention was effectively filed in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
I.	LEE AND WONG TEACH CLAIMS 1–8 AND 11.
Claims 1–8 and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent Application Publication No. 2015/0339098 A1 (hereafter “Lee”) in view of Wong.
Claim 1
Lee teaches: 
An electronic device comprising: a housing;
“FIG. 1 is a block diagram of a display apparatus according to an exemplary embodiment . . . . Herein, the display apparatus 100 may be implemented in various electronic devices, such as TVs, electronic boards, electronic tables, Large form Displays (LFDs), smart phones, tablet PCs, desktop PCs, and laptops.” Lee ¶ 55. It is understood that at least some of the foregoing examples of apparatus 100—particularly the TV, smart phone, tablet PC, and laptop embodiments—each describe or at least suggest an all-in-one device in which all of the components are housed within a single structure.
a speaker positioned at a first portion of the housing;
Though Lee does not explicitly disclose a speaker per se, Lee at least teaches that the apparatus 100 is capable of outputting “an audio signal.” Lee ¶ 79. Moreover, as will be shown below, an apparatus comprising a speaker, microphone, and touch screen integrated as one device was known and obvious to combine prior to the effective filing date of the claimed invention.
a microphone positioned at a second portion of the housing;
The apparatus 100 further includes a “recognizer 120,” which “may include an input device configured to receive an input of the user voice . . . when a microphone is included therein.” Lee ¶ 58.
a touch screen display positioned at a third portion of the housing;
“Referring to FIG. 1, a display apparatus 100 includes a display 110.” Lee ¶ 55. Lee does not explicitly say whether the display 110 may include a touch screen, but Lee’s disclosure that the apparatus may be embodied as “electronic boards, electronic tables, . . . smart phones, [and] tablet PCs” at least suggests a touch screen display. See Lee ¶ 55. Moreover, as will be discussed below, Wong explicitly teaches a single electronic device comprising all of the claimed hardware components, including the touch screen.
a communication circuit positioned inside the housing or attached to the housing;
The apparatus 100 further includes a “communicator 140,” Lee ¶ 55, which “may be implemented as a hardware component.” Lee ¶ 57. Thus, as shown in FIG. 3, “display apparatus 100 may include an interactive client module which,” due in part to the communication hardware, “is interoperable with [a] voice recognition apparatus 310 and [a] server apparatus 300.” Lee ¶ 88.
a processor positioned inside the housing and operatively connected to the speaker, the microphone, the display, and the communication circuit;
As shown in FIG. 1, the apparatus 100 further includes a processor 130 that is connected to the display 110, recognizer 120, and communicator 140. Lee FIG. 1.
and a memory positioned inside the housing and operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to: 
Processor 130 is responsible for controlling the device, see Lee ¶ 64, and in order to do so, Lee contemplates configuring processor 130 with computer program instructions stored in a memory that are to be executed by the processor. See Lee ¶¶ 199–203. 
display an image including at least one object on the display;
“Referring to FIG. 2, the display 200 of the display apparatus 100 may display a plurality of items, and the indicator 210 is marked on one item.” Lee ¶ 79.
receive a voice input from the microphone, wherein the voice input includes a request for performing a task associated with at least one object on the image;

transmit the voice input to a first external server via the communication circuit;
“The processor 130 may implement the interactive client module when the user voice is recognized through the recognizer 120, and perform a control operation corresponding to the voice input. Specifically, the processor 130 may transmit the user voice to the voice recognition apparatus 310.” Lee ¶ 88.
receive a first response from the first external server via the communication circuit, wherein the first response includes a first text extracted from text data converted from the voice input and corresponding to the at least one object and a third text associated with a category of the at least one object;
In response, “the voice recognition apparatus 310 may convert the voice command into texts and provide the texts to the display apparatus 100 in response to input of a voice command spoken by a user.” Lee ¶ 91. “Thus, in response to receiving texts corresponding to ‘What are the other dramas or movies in which this drama's main actor acts?’, the processor 130 may analyze the information regarding the drama in order to perform a function corresponding to the texts and extract keywords related to the main actor.” Lee ¶ 92.
extract 
Processor 130 also extracts keywords related to the content in the selected item. Lee ¶ 68. For example, “in response to a motion of an indicator selecting one object among the displayed video images and in response to a user speaking a voice command stating that ‘What is this?’, the processor 130 may extract keywords related to the selected object by analyzing the selected movie video.” Lee ¶ 71.
transmit the [data] extracted from the image and the first text to a second external server via the communication circuit;
and the information regarding the content to the server apparatus 320.” Lee ¶ 94 (emphasis added).
receive a second response from the second external server via the communication circuit, wherein the second response includes a second text associated with performing at least part of the task;
In response, the server apparatus 320 uses the transmitted text input and information regarding the content to search an internal database or other server apparatus, Lee ¶ 95, and subsequently “feedback the search results to the display apparatus 100.” Lee ¶ 96.
and provide at least part of the second text via the display or the speaker.
The processor 130 “receive[s] search results corresponding to the texts from the server apparatus and display[s] the same.” Lee ¶ 97.
Lee does not explicitly say what type of information it extracts from the content in order to query the server apparatus 320, and therefore, does not explicitly disclose extracting and transmitting the “region image” from the source image.
Wong, however, teaches:
An electronic device comprising: a housing; 
As shown in FIG. 2, Wong provides for a media device 200, Wong ¶ 37, which, as shown in FIG. 3, may be embodied as any one of the illustrated unitary user equipment devices 302–306, and therefore, all of its components are understood to be contained within (or attached to) the housing. See Wong ¶ 43 (describing exemplary embodiments of media device 200 to include “a laptop, a tablet, a personal computer television (PC/TV), a PC media server, a PC media center . . . a personal digital assistant (PDA), a mobile telephone, a smartphone, a portable video player, a portable music player, [or] a portable gaming machine”). 
a speaker positioned at a first portion of the housing;

a microphone positioned at a second portion of the housing; 
“A user may control the control circuitry 204 using user input interface 210,” which may include a “voice recognition interface.” Wong ¶ 41. Such a voice recognition interface may include or be embodied as a microphone. See Wong ¶ 28.
a touch screen display positioned at a third portion of the housing;
“Controller 104 may include a user input interface, such as a . . . touch screen.” Wong ¶ 26; see also Wong ¶ 28 (describing the act of “touching a touch-sensitive display screen on media device 102”). 
a communication circuit positioned inside the housing or attached to the housing; 
“I/O path 202 may connect control circuitry 204 (and specifically processing circuitry 206) to one or more communications paths (described below).” Wong ¶ 37.
a processor positioned inside the housing and operatively connected to the speaker, the microphone, the display, and the communication circuit;
As mentioned earlier, each of the foregoing components of the media device 200 is connected to the media device’s processing circuitry 206. See Wong ¶¶ 28, 37, and 41. “Control circuitry 204 may be based on any suitable processing circuitry 206 such as processing circuitry based on one or more microprocessors, microcontrollers, digital signal processors, programmable logic devices, etc.” Wong
and a memory positioned inside the housing and operatively connected to the processor, wherein the memory stores instructions 
“In some embodiments, control circuitry 204 executes instructions for media content stored in memory (e.g., storage 208).” Wong ¶ 38.
that, when executed, cause the processor to:

Putting it all together, Wong discloses a single media device 200 that performs all of the functionality disclosed for media device 102 and controller 104. Said functionality will now be discussed with respect to the claimed invention.
display an image including at least one object on the display; 
“A user viewing media content on media device 102 may wish to perform a search based on the media content being viewed.” Wong ¶ 27.
receive a voice input from the microphone, wherein the voice input includes a request for performing a task associated with at least one object on the image; 
“To initiate the search, the user captures a snapshot image of the on-screen media content and uses the captured snapshot image as a search entry. The user captures a snapshot image that contains the items or features of the media content that the user wishes to use for the search.” Wong ¶ 28. “The user may capture the snapshot image using controller 104 by . . . speaking a command to a voice recognition interface (e.g., a microphone) on controller 104.” Wong ¶ 28.
extract a region image of at least one object associated with the third text from the image; 
The search query further includes “search images,” see Wong ¶ 33, which are cropped from the captured image. Wong ¶¶ 64 and 139. Importantly, while some embodiments of Wong’s disclosure call for the processor 106 to crop the search images, the claimed invention reads on an embodiment of Wong’s disclosure wherein “the processor allows the user to manually select the targeted features. For example, the processor may present the snapshot image to the user and allow the user to select features of the snapshot image to target.” Wong ¶ 138. For this reason, there is at least one embodiment of Wong’s disclosure in which media device 102 and/or controller 104 (either of which maps to the overall electronic device recited in claim 1) is the entity configured to perform the claimed instruction of extracting the region.
transmit the region image extracted from the image and the first text to a second external server via the communication circuit; 
“The search query created by processor 106 is sent to search engine 108.” Wong ¶ 34.
receive a second response from the second external server via the communication circuit, 
Search engine 108—illustrated as search engine 500 in FIG. 5, see Wong ¶ 65—gathers search results using search results assembly 514, and then, “[t]he search results gathered by search results assembly 514 are collated and sent to the user. The search results may be sent to a media device (e.g., media device 102 of FIG. 1) to be displayed. The search results may be sent to a controller (e.g. controller 104 of FIG. 1) of a media device to be displayed.” Wong ¶ 79 (referring to FIG. 5).
wherein the second response includes a second text associated with performing at least part of the task; 
The search results include text descriptors that sufficiently match the query. See Wong ¶ 73; see also Wong FIG. 6C and ¶¶ 88 and 91–92 (illustrating and describing the results that may be sent to the user, which include text results).
and provide at least part of the second text via the display or the speaker.
“The search results may then be presented to the user on media device 102.” Wong ¶ 35.
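It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Lee’s server apparatus 320 with Wong’s search engine 108, or perhaps to simply improve server apparatus 320 by supplementing its functionality with the functionality of Wong’s search engine 108. In this combination, Lee’s apparatus 100 would transmit a region extracted from the content to server apparatus 320 for the search query, rather than merely using information about the content as the query. One would have been motivated to improve Lee with Wong because Wong explicitly identified that “it would be desirable to provide an image-based search system that is capable of performing an efficient search based on an image captured from media content.” Wong ¶ 3.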
Claim 2
Lee, as combined with Wong, teaches the electronic device of claim 1, 
wherein the image is an image in which a region including the at least one object is separated.
The “search images” may be cropped from the captured image. Wong ¶¶ 64 and 139.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Lee’s server apparatus 320 with Wong’s search engine 108, or perhaps to simply improve server apparatus 320 by supplementing its functionality with the functionality of Wong’s search engine 108. In this combination, Lee’s apparatus 100 would transmit a region extracted from the content to server apparatus 320 for the search query, rather than merely using information about the content as the query. One would have been motivated to improve Lee with Wong because Wong explicitly identified that “it would be desirable to provide an image-based search system that is capable of performing an efficient search based on an image captured from media content.” Wong ¶ 3.
Claim 3
Lee, as combined with Wong, teaches the electronic device of claim 1, wherein the instructions cause the processor to:
generate information about a region including the at least one object in the image by directly analyzing the image in the electronic device or by analyzing the image through the second external server;
Processor 130 extracts keywords related to the content in the selected item. Lee ¶ 68. For example, “in response to a motion of an indicator selecting one object among the displayed video images and in response to a user speaking a voice command stating that ‘What is this?’, the processor 130 may extract keywords related to the selected object by analyzing the selected movie video.” Lee ¶ 71.
Processor 130 does not explicitly “separate a region including the at least one object in the image, using the generated information.”
Wong, however, teaches instructions that cause a processor to:
generate information about a region including the at least one object in the image by directly analyzing the image in the electronic device or by analyzing the image through the second external server; 
“In some embodiments, the processor allows the user to manually select the targeted features. For example, the processor may present the snapshot image to the user and allow the user to select features of the snapshot image to target. The processor may indicate all recognizable features and allow the user to select features to target from among the recognizable features, or the processor may present an unmodified copy of the snapshot image and allow the user to create targets for any features.” Wong ¶ 138.
and separate a region including the at least one object in the image, using the generated information.
“In some embodiments, the processor isolates each targeted feature from the snapshot image and creates a single cropped search image for each targeted feature in the snapshot image.” Wong ¶ 139.
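It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Lee’s server apparatus 320 with Wong’s search engine 108, or perhaps to simply improve server apparatus 320 by supplementing its functionality with the functionality of Wong’s search engine 108. In this combination, Lee’s apparatus 100 would transmit a region extracted from the content to server apparatus 320 for the search query, rather than merely using information about the content as the query. One would have been motivated to improve Lee with Wong because Wong explicitly identified that “it would be desirable to provide an image-based search system that is capable of performing an efficient search based on an image captured from media content.” Wong ¶ 3.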
Claim 4
Lee, as combined with Wong, teaches the electronic device of claim 1, 
wherein the task further includes obtaining information associated with the at least one object included in the image.
The server apparatus 320 uses the transmitted text input and information regarding the content to search an internal database or other server apparatus, Lee ¶ 95, and subsequently “feedback the search results to the display apparatus 100.” Lee ¶ 96.
Although unnecessary to reach a conclusion of obviousness, the Examiner observes that Wong provides an overlapping teaching: “A user viewing media content on media device 102 may wish to perform a search based on the media content being viewed. For example, a user may want to find the name of a character in a scene, identify the media content being viewed, identify an unknown source of a video, locate retail locations and prices for an item in a commercial or movie, access more episodes of a certain television series, or perform any other suitable search based on the media content.” Wong ¶ 27. It should be understood that the claimed “task” corresponds to a search of the foregoing, which is performed in response to a voice command as explained in the rejection of claim 1. See Wong ¶ 28.
Claim 5
Lee, as combined with Wong, teaches the electronic device of claim 1, 
wherein the first text further includes information indicating the at least one object.

Although unnecessary to reach a conclusion of obviousness, the Examiner observes that Wong provides an overlapping teaching: “The text descriptors may indicate the types of features recognized in the snapshot image.” Wong ¶ 33.
Claim 6
Lee, as combined with Wong, teaches the electronic device of claim 1, 
wherein the second text further includes at least one of model information, function information, price information, manufacturer information, or seller information of a corresponding product when the at least one object is a product.
Lee at least teaches the claimed “seller information of a corresponding product when the at least one object is a product.” Specifically, “when the displayed video is related to travel to Spain, and one restaurant in Spain is displayed in the video images, the processor 130 may analyze the video images to execute the user voice command stating that ‘What is this?’ and extract the name of the restaurant in Spain,” Lee ¶ 71, so that “the processor 130 may provide information related to the restaurant from the external server or another web site to a user based on the name of the Spanish restaurant.” Lee ¶ 72.
Additionally, Wong teaches that the results shown in FIGS. 8A–8C (i.e., the second text) from performing a search related to car 802 include “the model, manufacturer, and year of car 802 and also includes a link to the official page for car 802 on the manufacturer's website.” Wong ¶ 110.
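It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Lee’s server apparatus 320 with Wong’s search engine 108, or perhaps to simply improve server apparatus 320 by supplementing its functionality with the functionality of Wong’s search engine 108. In this combination, Lee’s apparatus 100 would transmit a region extracted from the content to server apparatus 320 for the search query, rather than merely using information about the content as the query. One would have been motivated to improve Lee with Wong because Wong explicitly identified that “it would be desirable to provide an image-based search system that is capable of performing an efficient search based on an image captured from media content.” Wong ¶ 3.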
Claim 7
Lee teaches the electronic device of claim 1, but does not appear to explicitly disclose the remaining elements of claim 7.
Wong, however, teaches an electronic device further comprising: 
a camera, 
“Controller 104 may also include an integrated camera for capturing digital images.” Wong ¶ 26.
wherein the image is a preview image using the camera.
“The user may capture the snapshot image using controller 104 by . . . capturing a digital image using a camera on controller 104.” Wong ¶ 28.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Lee’s server apparatus 320 with Wong’s search engine 108, or perhaps to simply improve server apparatus 320 by supplementing its functionality with the functionality of Wong’s search engine 108. In this combination, Lee’s apparatus 100 would transmit a region extracted from the content to server apparatus 320 for the search query, rather than merely using information about the content as the query. One would have been motivated to improve Lee with Wong because Wong explicitly identified that “it would be desirable to provide an image-based search system that is capable of performing an efficient search based on an image captured from media content.” Wong ¶ 3.
Claim 8
Lee, as combined with Wong, teaches the electronic device of claim 7, wherein the instructions cause the processor to: 
when receiving the first response, capture a preview image displayed on the display to store the captured image as a still image, wherein the extracted region image is extracted from the stored still image.
“In some embodiments, the processor allows the user to manually select the targeted features. For example, the processor may present the snapshot image to the user and allow the user to select features of the snapshot image to target.” Wong ¶ 138. When the processor creates the search query to send to a search engine, “the processor isolates each targeted feature from the snapshot image and creates a single cropped search image for each targeted feature in the snapshot image.” Wong ¶ 139.
Claim 11
Lee, as combined with Wong, teaches the electronic device of claim 1, wherein the instructions cause the processor to: 
transmit the second text to a display device via the communication circuit to provide at least part of the second text through a display included in the display device.
The processor 130 “receive[s] search results corresponding to the texts from the server apparatus and display[s] the same.” Lee ¶ 97.
Although unnecessary to reach a conclusion of obviousness, the Examiner observes that Wong provides an overlapping teaching: search engine 108—illustrated as search engine 500 in FIG. 5, see Wong ¶ 65—gathers search results using search results assembly 514, and then, “[t]he search results gathered by search results assembly 514 are collated and sent to the user. The search results may be sent to a media device (e.g., media device 102 of FIG. 1) to be displayed. The search results may be sent to a controller (e.g. controller 104 of FIG. 1) of a media device to be displayed.” Wong ¶ 79 (referring to FIG. 5). “The search results may then be presented to the user on media device 102.” Wong ¶ 35.
Claim 9 is rejected under 35 U.S.C. § 103 as being unpatentable over Lee in view of Wong as applied to claim 1, and further in view of U.S. Patent Application Publication No. 2016/0328270 A1 (hereafter “Bikkula”).
Claim 9
Lee and Wong teach the electronic device of claim 1, but do not appear to explicitly disclose the remaining elements of claim 9.
Bikkula, however, teaches an electronic device configured with instructions to receive responses from external servers, 
wherein the first response further includes a sequence of states of the electronic device for performing the task, 
“At operation 404, the received input is sent to a server. The server may be one or more servers for performing services such as speech recognition. The server or servers are also responsible for retrieving and populating a task frame after determining the appropriate task from the input received at operation 402.” Bikkula ¶ 45.
and wherein the instructions cause the processor to: after receiving the second response, cause the electronic device to have at least part of the sequence of states, using at least part of the second text.
“At operation 404, a task frame is received from the server. Where the input received at operation 402 was to request a new task to be initiated, the received task frame is specific to the requested task.” Bikkula ¶ 46.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement Lee’s arrangement of sending voice input to an external server for recognition by utilizing Bikkula’s “task frame” paradigm. One would have been motivated to combine Bikkula with Lee and Wong because “users are employing an increasing variety of devices to access digital 
III.	WONG AND LEE TEACH CLAIMS 12 AND 15.
Claims 12 and 15 are rejected under 35 U.S.C. § 103 as being unpatentable over Wong in view of Lee.
Claim 12
Wong teaches:
A server for processing an image, the server comprising: a network interface; a processor operatively connected to the network interface; and a memory operatively connected to the processor and including at least one database in which information associated with an object is stored, wherein the memory stores instructions that, when executed, cause the processor to: 
Reference is initially made to system 100 of FIG. 1, which illustrates processor 106 and search engine 108. FIGS. 4 and 5 respectively illustrate detailed versions of processor 106 as processor 400, Wong ¶ 58, and of search engine 108 as search engine 500, Wong ¶ 65. In an embodiment that corresponds to claim 12, processor 106 is “integrated with . . . search engine 108.” Wong ¶ 29. Accordingly, this rejection refers to the operations shown in both FIGS. 4 and 5 (which Wong explicitly discloses as being integrated into a single server in at least one embodiment) in order to illustrate obviousness of the claimed invention.
receive first data associated with an image including a plurality of objects and a first text from an external electronic device via the network interface, 
“Input to processor 400, including images, supplemental data, and user input, is received by parser 402 and separated into an image component and a data component.” Wong ¶ 58.
wherein the first text includes information identifying at least one object 

recognize the plurality of objects included in the image; 
“Feature detection 404 receives the image component from parser 402 and analyzes the image to recognize features that may be relevant to the search being performed. Feature detection 404 recognizes features, such as faces, logos, objects, and landmarks, that are likely to be common search targets of a user.” Wong ¶ 59.
determine a category for the plurality of objects included in the image; 
Processor 400 (or 106) generates a search query 414 and passes it to search engine 108 (or 500), which will now be discussed with respect to FIG. 5. See Wong ¶ 67 (“A search query entered to search engine 500 is received by identifier 502 at interpreter 506. The search query received by identifier 502 may have substantially the same features as search query 414 described in the above discussion with respect to FIG. 4”).
The search query 414 “is received by identifier 502 at interpreter 506,” and interpreter 506 “analyzes images and reads text descriptors in the received search query to determine how to target and perform the desired search.” Wong ¶ 67. “For example, if the snapshot image is captured from a television show and contains an actor's face, interpreter 506 may use the targeted face and supplemental information to determine that a search for an actor in a specific show is desired.” Wong ¶ 70 (emphasis added). In this example, the “specific show” is the determined category. 
Note, however, that this is one example of a category, and Wong’s interpreter 506 is clearly not limited to television shows. More generally, Wong discloses the technique of categorizing the search query so that “[a]n efficient search may be small subset of data,” i.e., the claimed category, “rather than trying to identify potential matches from all available searchable data.” Wong ¶ 67.
obtain information associated with the plurality of objects from the database associated with the determined category; 
“By analyzing the search query, interpreter 506 may be able to select the small subset of data in order to facilitate the search.” Wong ¶ 67. So, continuing with the television example, “[i]nterpreter 506 may then use the show title, episode number, scene ID, or cast list from the supplemental information to identify a data subset [510] of all actors who have appeared in the show or, if possible, all actors appearing in the specific episode or scene being viewed. Identifier 502 is then able to identify the face using the small subset of actors in the show rather than searching through all actors included in searchable data 508.” Wong ¶ 70.
select information associated with the at least one object based on the first text; 
“Identifier 502 may enter an image from the search query and an image retrieved from searchable data 508 into the pattern matching algorithm to obtain a similarity measure for the images. The image retrieved from searchable data 508 may be selected based on supplemental data in the search query, data associated with the retrieved image, the types of recognizable features contained in the retrieved image, any other suitable criteria, or any suitable combination thereof.” Wong ¶ 72.
obtain a second text, using the selected information and the first text; 
After a match acceptance component 512 selects the best matches from the above search, see Wong ¶ 73, it sends those matches to search results assembly 514 as “identification information,” which search results assembly 514 uses to “gather information, media content, Internet content, and any other suitable content to present to the user as search results.” Wong ¶ 74. This includes textual results. See, e.g., Wong ¶ 76 (“search results assembly 514 may retrieve a biography of the actor or ordering information for 
and transmit the obtained second text to the external electronic device.  
“The search results gathered by search results assembly 514 are collated and sent to the user. The search results may be sent to a media device (e.g., media device 102 of FIG. 1) to be displayed.” Wong ¶ 79.
Wong does not explicitly disclose whether the data sent to the search engine includes “text data converted from a voice input of the external electronic device.”
Lee, however, teaches:
A server for processing an image, the server comprising:
Reference is made to the “server apparatus 320” in FIG. 3, which FIG. 4 illustrates in greater detail as “server apparatus 400.” Lee ¶¶ 87 and 103.
a network interface; a processor operatively connected to the network interface; and a memory operatively connected to the processor and including at least one database in which information associated with an object is stored, wherein the memory stores instructions that, when executed, cause the processor to:
“Referring to FIG. 4, the server apparatus 400 includes a server communicator 410, a server controller 420, and a database 430.” Lee ¶ 103. In general, the server controller 420 is programmed to perform the operations that follow, and is connected to both the server communicator 410 and the database 430. See Lee FIG. 4.
receive first data associated with an image including a plurality of objects 
“The server communicator 410 may receive . . . information regarding the content from the display apparatus [100].” Lee ¶ 104. Importantly, this limitation does not require the first data to be image data; it is merely “associated with” an image. Lee’s “information regarding the content from the display apparatus” is associated 
“For example, when the indicator selects the drama video and when a user speaks a voice command stating, ‘What are other movies or dramas in which this drama's main actor acts?’, the processor 130 may extract keywords related to the drama main actor by analyzing the selected drama video.” Lee ¶ 70 (emphasis added). As another example, “while the content (e.g., movie) is playing . . . and one restaurant in Spain is displayed in the video images, the processor 130 may analyze the video images to execute the user voice command stating that ‘What is this?’ and extract the name of the restaurant in Spain.” Lee ¶ 71.
and a first text from an external electronic device via the network interface, 
The server communicator 410 further receives “texts corresponding to the user voice.” Lee ¶ 104. 
wherein the first text includes information identifying at least one object and text data converted from a voice input of the external electronic device;
For example, the text may be provided “in response to input of a voice command spoken by a user stating that ‘What are the other dramas or movies in which this drama's main actor acts?’” while a current drama content item is playing. Lee ¶ 91.
obtain a second text, using the selected information and the first text; and
“The server controller 420 may search the database 430 for the text and information corresponding to the content transmitted through the server communicator 410.” Lee ¶ 106.
transmit the obtained second text to the external electronic device.
“Accordingly, the search results may be provided to the display apparatus 100.” Lee ¶ 106.
See Lee ¶¶ 8–9.
Claim 15
Wong, as combined with Lee, teaches the server of claim 12, 
wherein information associated with the plurality of objects includes list information in which a text and an image are included.
“Interpreter 506 may then use the show title, episode number, scene ID, or cast list from the supplemental information to identify a data subset [510] of all actors who have appeared in the show or, if possible, all actors appearing in the specific episode or scene being viewed. Identifier 502 is then able to identify the face using the small subset of actors in the show rather than searching through all actors included in searchable data 508.” Wong ¶ 70.
IV.	WONG, LEE, AND FU TEACH CLAIM 14.
Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over Wong in view of Lee as applied to claim 12 above, and further in view of U.S. Patent Application Publication No. 2020/0160124 A1 (“Fu”).
Claim 14
Wong and Lee teach the server of claim 12, wherein the memory includes at least one or more databases associated with the category. See Wong ¶ 66 (explaining that the search engine’s identifier 502 “includes a collection of searchable data 508 that may be used to perform a search based on the received query”).

Fu, however, teaches a device wherein:
the category includes an upper category and a lower category included in the upper category,
“In some applications, the task of the image recognition is to determine a fine-grained category of the object in the image 170. For example, in the example of FIG. 1, the recognizing result 180 of the image recognition module 122 may include the text of ‘This is a red-bellied woodpecker’ to indicate a specific species of the bird included in the image 170,” in contrast to “image recognition of a general category (in which only a general category of ‘bird’ is recognized).” Fu ¶ 21.
wherein the memory includes at least one or more databases associated with the category, and
“The memory 120 can include an image recognition module 122 which is configured to perform functions of implementations of the subject matter described herein.” Fu ¶ 16. Specifically, as shown in FIG. 2, “the system includes a plurality of stacked learning networks 210 and 220,” which are “implemented as one or more functional sub-modules in the image recognition module 122.” Fu ¶ 27.
wherein the instructions cause the processor to: determine the upper category and the lower category sequentially.
“The image of the finer scale to be processed by the learning network 220 is dependent on the determination at the learning network 210.” Fu ¶ 27.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to improve Wong’s databases with Fu’s stacked learning networks, such that the overall combined system would determine both an upper and lower category of an object in an image sequentially. One would have been motivated to combine Fu with Wong because the use of upper and lower categories leads to more accurate results when classifying an object. See Fu ¶ 3 (“Through this solution, it is possible to localize an image region at a finer scale . 
CONCLUSION
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Justin R Blaufeld whose telephone number is (571)272-4372. The examiner can normally be reached M-F, 9:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached on (571) 272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent 

Justin R. Blaufeld
Primary Examiner
Art Unit 2176



/Justin R. Blaufeld/Primary Examiner, Art Unit 2176