Detailed Action
NOTICE OF PRE-AIA OR AIA STATUS
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
RESPONSE TO AMENDMENT AND ARGUMENTS
This FINAL Office action is responsive to the communication filed under 37 C.F.R. § 1.111 on December 9, 2020 (hereafter “Response”). The amendments to the claims are acknowledged and have been entered.
Claims 9, 12, and 13 are now amended.
Claims 1–15 are pending in the application. 
Request to Cite Fewer References — Denied
Regarding the citation of references, respectfully, the Applicant is incorrect that “the alternative rejections that are presented in the § 103 rejections improperly disregard the instruction of MPEP § 904.03.” (Response 6). 
The breadth of the Applicant’s claims 7–9 necessitated the dual grounds of rejection; as the Applicant will observe, these are the only claims whose rejection is not repeated across both combinations of references. Specifically, the Examiner found that the King-Huang combination was the best combination for claims 7 and 8, whereas the King-Bikkula combination was the best combination for claim 9. Since all three claims depend from claim 1, claim 1 must be rejected twice: once to establish obviousness of the dependency chain from claim 1 to claims 7 and 8, and a second time to establish obviousness of the dependency chain from claim 1 to claim 9.
Consider what would happen if the Examiner followed the Applicant’s directive, and only rejected claim 1 over King and Huang. Later, in order to reject claim 9, the Examiner would need to reject claim 9 over King in view of Huang and further in view of Bikkula. Clearly, under these circumstances, a two-reference 
Likewise, in the case of claims 2–6, 10, and 11, the Examiner did not multiply the references; the Examiner mapped each of those claims to the same reference, King. Thus, in addition to the Office Action being compliant with the MPEP, it also appears that the Applicant’s allegation about creating an undue “burden” is unfounded, since any argument directed to King’s disclosure of any element would automatically apply to both grounds of rejection. (See Response 9 and 10) (addressing the King-Bikkula rejection with a single sentence).
Therefore, the Applicant’s request to “avoid such multiplication of references” will be disregarded. 
Objection and Rejection under 35 U.S.C. § 112 Withdrawn
The amendment cures the defects raised in the last Office Action with respect to the objection and the rejection under 35 U.S.C. § 112(b), and therefore, those grounds are hereby withdrawn.
Claims 1–11 Remain Obvious Over King & Huang and King & Bikkula 
Claims 1–8, 10, and 11 stand rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent Application Publication No. 2014/0337730 A1 (hereafter “King”) in view of U.S. Patent Application Publication No. 2010/0260426 A1 (“Huang”). Claims 1–6 and 9–11 are rejected under 35 U.S.C. § 103 as being unpatentable over King in view of U.S. Patent Application Publication No. 2016/0328270 A1 (hereafter “Bikkula”). The Applicant’s traversal of these rejections has been considered, but is not persuasive.
The Examiner is not persuaded by the Applicant’s argument that both references fail to teach “a microphone positioned at a second portion of the housing.” At the top of page 12 of the Non-Final Office Action, paragraph 53, the Examiner provided an explicit quote from Huang’s disclosure, where Huang says “[m]obile device 130 can further include a user input interface (e.g., a keypad, a microphone, and the like) that can receive user-entered text or auditory information.” Huang ¶ 32.
The Examiner is also unpersuaded by the Applicant’s argument that King’s paragraph 32 fails to disclose “receiving a first response from the first external server via the communication circuit, wherein the first response includes a first text associated with the at least one object, as recited in Claim 1” (Response 8–9) for two reasons. First, apart from reproducing a copy of paragraph 32, the Applicant never actually explains why it believes paragraph 32 is deficient. The Applicant merely added an underline to selected portions of the claim language. The Applicant is reminded that “[i]n order to be entitled to further examination, the applicant or patent owner must . . . distinctly and specifically point[] out the supposed errors in the examiner’s action.” 37 C.F.R. § 1.111(b). Statements that say nothing other than a mere change of font choice for certain portions of claim language “without specifically pointing out how the language of the claims patentably distinguishes them from the references” fail to comply with 37 C.F.R. § 1.111(b). Since it is unclear whether the Applicant’s failure to comply was deliberate, this requirement will be held in abeyance until the next reply. The Examiner is prohibited from allowing claims “where no attempt is made to point out the patentable novelty.” MPEP § 714.04.
But more importantly, King explicitly teaches this limitation. The claim requires an electronic device that receives a first response from a first external server. Likewise, King discloses that information presentation module 135—executing on user client 100—“receives context information from the context identification system 115.” King ¶ 32. User client 100 is an electronic device, and as shown in FIG. 1, context identification system 115 is external to user client 100. There is no deficiency here. 
At a minimum, “definition information” and/or “one or more links” (or “some combination thereof”)—which King discloses are explicitly associated with the portion of the media content item that the user selected in paragraph 31—falls within the scope of “first text associated with the at least one object,” because it is text, and it is associated with the object that the user selected earlier. Since the Applicant has not explained why it believes these disclosures are deficient, there is nothing else the Examiner can provide as a response. Accordingly, the Examiner is unpersuaded that King fails to teach this claim element.
The Examiner is also unpersuaded by the Applicant’s argument that King fails to disclose “receiving a second response from the second external server via the communication circuit, wherein the second response includes a second text associated with performing at least part of the task” (Response 9–10) for two reasons. First, apart from reproducing a copy of the cited paragraphs from King, the Applicant never actually explains why it believes those paragraphs are deficient. The Applicant merely added an underline to selected portions of the claim language. Once again, the Applicant’s response here is non-compliant. “In order to be entitled to further examination, the applicant or patent owner must . . . distinctly and specifically point[] out the supposed errors in the examiner’s action,” and statements that say nothing other than a mere change of font choice for certain portions of claim language “without specifically pointing out how the language of the claims patentably distinguishes them from the references fail to comply with this section.” 37 C.F.R. § 1.111(b). Since it is unclear whether the Applicant’s failure to comply was deliberate, this requirement will be held in abeyance until the next reply. The Examiner is prohibited from allowing claims “where no attempt is made to point out the patentable novelty.” MPEP § 714.04.
But more importantly, King explicitly teaches this limitation. Specifically, King teaches that it uses the link it received from the context identification system 115 to retrieve second information from another source (the claimed second external 
For these reasons, the rejections under 35 U.S.C. § 103 stand.
Claims 12, 13, and 15 Remain Anticipated by Huang
Claims 12, 13, and 15 stand rejected under 35 U.S.C. § 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2010/0260426 A1 (“Huang”). The Applicant’s argument for this rejection has been considered, but is not persuasive.
The Applicant argues that claim 12 is now amended to include subject matter that is “based on” claim 4, and that King fails to teach claim 12. (Response 11). However, the subject matter added to claim 12 by amendment is far broader than that which claim 4 adds to claim 1, and more importantly, King is not relied upon for showing anticipation of claim 12. Accordingly, this ground of rejection stands.
For the reasons explained above, none of the claims are in condition for allowance. The Applicant’s request for an allowance (Response 12) is therefore respectfully denied.
INFORMATION DISCLOSURE STATEMENT
The information disclosure statement(s) filed on January 4, 2021 complies with the provisions of 37 C.F.R. §§ 1.97 and 1.98 and MPEP § 609, and therefore has been placed in the application file. The information referred to therein has been considered as to the merits.
CLAIM REJECTIONS – 35 U.S.C. § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
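A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.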
Claims 12, 13, and 15 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2010/0260426 A1 (“Huang”).
Claim 12
Huang discloses:
A server processing an image, the server comprising:
“Reference will now be made to FIG. 3 to illustrate an exemplary configuration of a back-end 300 of image recognition system 120, including remote server 140 and wireless services provider 150.” Huang ¶ 47.
a network interface;
“In one implementation, back-end 300 can include wireless services provider 150 with a receiver 310 that receives one or more signals from one or more mobile devices (e.g., mobile device 130 as shown in FIG. 1, etc.) through receive antennas 306, and a transmitter 322 that transmits one or more signals modulated by modulator 320 to the mobile devices through transmit antennas 308.” Huang ¶ 47.
a processor operatively connected to the network interface; and
“A processor 314 can analyze demodulated symbols and information provided by demodulator 312.” Huang ¶ 47.
a memory operatively connected to the processor 
Back-end 300 includes three memories coupled to processor 314: memories 316, 342, and 362. See Huang FIG. 3 and ¶¶ 49–51.
and including at least one database in which information associated with an object is stored, wherein the memory stores instructions that, when executed, cause the processor to:

receive first data associated with an image including at least one object and a first text from an external electronic device via the network interface, 
“In 510, remote server 140 (as shown in FIGS. 1 and 3) in back-end 300 can receive a visual search query via wireless connection 132 and wireless services provider 150.” Huang ¶ 67. The visual search query can include “an image that contains at least one object of interest, and metadata and/or contextual data associated with the image.” Huang ¶ 67.
wherein the first text includes information identifying the at least one object;
For example, the “metadata and/or contextual data associated with the image” can include any of the following: “metadata indicating that the image is associated with BREW GAMING™, and contextual data indicating that the image was acquired at a particular GPS location.” Huang ¶ 67.
recognize the at least one object included in the image;
“Next, in 515, remote server 140 can recognize or otherwise identify the object of interest in the image based on the visual search query.” Huang ¶ 68.
obtain information about the recognized at least one object from the database;
“In 520, remote server 140 can generate a visual search result, including information content, based on the recognized object of interest in response the visual search query.” Huang ¶ 70. Specifically, “remote server 140 can execute search 
generate a second text, using the obtained information and the first text; and
Performing the semantic search causes the search engine 344 to “retrieve relevant information content, such as product information (e.g., product brand 650 and product type 660 as shown in FIG. 6D), directed links to the product information (e.g., information links 670), related products (e.g., related product 690), links to online retailers for comparison shopping, to save to a wish list, to share with a friend, or to purchase instantly (e.g., purchase link 680), etc., or any combination thereof.” Huang ¶ 71.
transmit the generated second text to the external electronic device.
“Next, in 525, remote server 140 can communicate or otherwise provide the visual search result, including the relevant information content, to mobile device 130 via wireless connection 132 and wireless services provider 150.” Huang ¶ 72.
Claim 13
Huang discloses the server of claim 12, wherein the instructions cause the processor to:
determine a category for the at least one object included in the image;
At step 515 mentioned above, the remote server 140 may “execute image recognition software 364 stored in image recognition server 146 to perform a one-to-one matching of the image with image data (e.g., image raster data, image coefficients, or the like) stored in image data and coefficient library 366,” Huang ¶ 69, but, importantly “[i]mage coefficient library 366 can index the trained images according to categories of the objects in the trained images,” which are “used by image recognition software 364 to recognize one or more objects within image 100.” Huang ¶ 50 (emphasis added).

“In 520, remote server 140 can generate a visual search result, including information content, based on the recognized object of interest in response the visual search query.” Huang ¶ 70. Specifically, “remote server 140 can execute search engine 344 stored in content server 144 to perform a semantic search for information content stored in information content database 346.” Huang ¶ 71.
wherein the second text is based on the obtained information, using the first text.
“Remote server 140 can utilize metadata and/or contextual data associated with the image”—remember, this is the same metadata and contextual data mapped to the claimed “first text” in the rejection of claim 12, above—“to assist in recognizing the object of interest, which enables remote server 140 to focus or otherwise limit the scope of the visual search.” Huang ¶ 68. 
Claim 15
Huang discloses the server of claim 12, 
wherein information associated with the object includes list information in which a text and an image are included.
Performing the semantic search causes the search engine 344 to “retrieve relevant information content, such as product information (e.g., product brand 650 and product type 660 as shown in FIG. 6D), directed links to the product information (e.g., information links 670), related products (e.g., related product 690), links to online retailers for comparison shopping, to save to a wish list, to share with a friend, or to purchase instantly (e.g., purchase link 680), etc., or any combination thereof.” Huang ¶ 71.
CLAIM REJECTIONS – 35 U.S.C. § 103
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was 
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
I.	KING AND HUANG TEACH CLAIMS 1–8, 10, AND 11.
Claims 1–8, 10, and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent Application Publication No. 2014/0337730 A1 (hereafter “King”) in view of Huang.
Claim 1
King teaches:
An electronic device comprising:
As shown in FIG. 1, a “user client 100” is provided as part of an overall system, King ¶ 20, and the user client 100 may be implemented in hardware as the computer 200 shown in FIG. 2. King ¶ 35.
a housing;
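“A user client 100 might be, for example, a personal computer, a tablet computer, a smart phone, a laptop computer, a dedicated e-reader, or other type of network-capable device such as a networked television or set-top box.” King ¶ 28. Each of the foregoing is understood to inherently self-house all of its respective electronic components within a single housing.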
a speaker positioned at a first portion of the housing;
The user client 100 includes a media player 130 “configured to present media items of different media formats,” including “videos, images, audio files, etc.” King ¶ 30. 
a touch screen display positioned at a third portion of the housing;
As shown in FIG. 2, the computer 200 that implements the user client 100 includes an input interface 214 such as “a touch-screen interface.” King ¶ 36.
a communication circuit positioned inside the housing or attached to the housing;
The hardware of computer 200 further includes a “network adapter 216 [that] couples the computer 200 to one or more computer networks.” King ¶ 36.
a processor positioned inside the housing and operatively connected to the speaker, the microphone, the display, and the communication circuit; and
“The computer 200 includes at least one processor 202 coupled to a chipset 204,” which in turn is coupled to the remaining components of the computer 200. King ¶ 35.
a memory positioned inside the housing and operatively connected to the processor, 
“The memory 206 holds instructions and data used by the processor 202.” King ¶ 36.
wherein the memory stores instructions that, when executed, cause the processor to:
“In one embodiment, program modules are stored on the storage device 208, loaded into the memory 206, and executed by the processor 202.” King ¶ 37. Therefore, the program modules will now be discussed. See King ¶ 39.
display an image including at least one object on the display;
“The media player 130 presents media items to a user of the user client 100. The media player 130 may be configured to present media items of different media formats. Media formats may include, for example, e-books, videos, images, audio files, etc.” King ¶ 30.
receive a first user input through at least one of the display or the microphone, wherein the first user input includes a request for performing a task associated with at least one object on the image;
“The media player 130 includes an information presentation module 135 in one embodiment. The information presentation module 135 receives a selection of a portion of a media item presented on the user client 100.” King ¶ 31.
transmit first data associated with the first user input to a first external server via the communication circuit;
“The information presentation module 135 generates and sends a context request to the context identification system 115 for context information associated with the selected portion of the media item.” King ¶ 31.
receive a first response from the first external server via the communication circuit, 
“The information presentation module 135 receives context information from the context identification system 115.” King ¶ 32.
wherein the first response includes a first text associated with the at least one object; 
“Context information is information that in some way describes and/or is associated with a portion of a media item. Context information may include definition information, image information, geographic information, one or more links to locations where the context information resides or may be determined, or some combination thereof.” King ¶ 24.
transmit second data associated with the image and the first text to a second external server via the communication circuit; 
“In some embodiments, the information presentation module 135 may retrieve context information from the media context source 110 and/or the local context source using the one or more links provided by the context identification system 115.” King ¶ 32.
receive a second response from the second external server via the communication circuit, wherein the second response includes a second text associated with performing at least part of the task; 
“In embodiments, where the received context information includes one or more links to one or more media context sources 110,” the information presentation module 135 executes its context identification module 315 to “retrieve context information from the sources using the received one or more links.” King ¶ 44.
and provide at least part of the second text via the display or the speaker.
“The information presentation module 135 generates one or more context presentation cards using the received context information. A context presentation card presents context information for a portion of a media item and responds to commands from a user of the user client 100.” King ¶ 34. “The context presentation card displays context information, such as textual or graphical information within its borders, as if the information were written on the card.” King ¶ 46.
King does not appear to explicitly disclose “a microphone positioned at a second portion of the housing.”
Huang, however, teaches:
An electronic device comprising: a housing; 
As shown in FIGS. 1–3, a mobile device 130 is provided, and all of its components are provided within the same housing, separate from other remote See Huang ¶ 25; see also Huang ¶ 30 (“Examples of mobile device 130 can include any mobile electronic device, such as, without any limitations, a cellular telephone (‘cell phone’), a personal digital assistant (PDA), a digital camera, or a wireless telephone”).
a speaker positioned at a first portion of the housing;
“For example, the information content can be transmitted to mobile device 130 to be presented to the user, such as . . . on audio speakers.” Huang ¶ 34.
a microphone positioned at a second portion of the housing; 
“Mobile device 130 can further include a user input interface” for “a microphone.” Huang ¶ 32.
a communication circuit positioned inside the housing or attached to the housing; 
As shown in FIGS. 2 and 3, the mobile device 130 further includes a circuit that starts with processor 208 connecting to a modulator 216, which sends signals to a transmitter 218 for transmission via antenna 202, and likewise, antenna 202 receives signals and passes them to demodulator 206, which demodulates the signals and sends them back to processor 208. Huang ¶ 35.
a processor positioned inside the housing and operatively connected 
“Processor 208 can analyze information received by antenna 202 and/or a user input interface (not depicted) of mobile device 130, and/or generate information for transmission by a transmitter 218 via a modulator 216.” Huang ¶ 36.
and a memory positioned inside the housing and operatively connected to the processor, wherein the memory stores instructions that, when executed, cause the processor to:
“In one implementation, mobile device 130 includes memory 210 to store computer-readable data (e.g., image 100 as shown in FIG. 1, an image coefficient library 262, and the like) and computer-executable software instructions (e.g., image detection/recognition software 270, runtime environment 212, set of applications 214, 
display an image including at least one object on the display; 
“In 410, mobile device 130 can initiate visual searching and image recognition by acquiring an image (e.g., image 100 as shown in FIG. 1, image 600 as shown in FIG. 6A, etc.).” Huang ¶ 55.
receive a first user input through at least one of the display or the microphone, wherein the first user input includes a request for performing a task associated with at least one object on the image; 
“Next, in 425, mobile device 130 can receive input from the user to select at least one of the highlighted objects, such as selected pattern 610 as shown in FIG. 6B.” Huang ¶ 60. Recall that the user input interface includes a microphone. Huang ¶ 32.
transmit first data associated with the first user input to a first external server via the communication circuit;
“Next, in 445, mobile device 130 can generate a visual search query based on the acquired image and communicate the visual search query to back-end 300 of image recognition system 120.” Huang ¶ 64.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add a microphone to King’s device as a means for input when searching an image, as taught by Huang. One would have been motivated to combine Huang with King because a microphone could provide additional information that might make the search more accurate. See Huang ¶ 63. Additionally, a microphone adds greater convenience and flexibility for using the device, e.g., when the user’s hand is wet or dirty and the user therefore cannot use the touch screen.
Claim 2
King, as combined with Huang, teaches the electronic device of claim 1, 

“The context selection module 310 receives a selection of a portion of a media item being presented to a user by the user client 100,” King ¶ 41, e.g., with “adjustable markers presented to the user that operate to bound a selected portion of a media item.” King ¶ 42. 
Claim 3
King, as combined with Huang, teaches the electronic device of claim 1, wherein the instructions cause the processor to: 
generate information about a region including the at least one object in the image by directly analyzing the image in the electronic device or by analyzing the image through the second external server; 
“In some embodiments, where the media item is a video or an image, the image analysis module 330 analyses a selected portion of the media item to identify context information. The image analysis module 330 may analyze images via, for example, optical character recognition, facial recognition, location recognition, or some other process.” King ¶ 43.
and separate a region including the at least one object in the image, using the generated information.
“For example, a user may select a portion of the media item displaying a road sign. The context selection module 310 then performs optical character recognition on the selected portion to identify the text of the sign.” King ¶ 43.
Claim 4
King, as combined with Huang, teaches the electronic device of claim 1, 
wherein the task further includes obtaining information associated with the at least one object included in the image.
“The information presentation module 135 generates and sends a context request to the context identification system 115 for context information associated with the selected portion of the media item.” King ¶ 31.
Claim 5
King, as combined with Huang, teaches the electronic device of claim 1,
wherein the first text further includes information indicating the at least one object.
“Context information is information that in some way describes and/or is associated with a portion of a media item.” King ¶ 24.
Claim 6
King, as combined with Huang, teaches the electronic device of claim 1, 
wherein the second text further includes at least one of model information, function information, price information, manufacturer information, or seller information of a corresponding product when the at least one object is a product.
When a claim “is directed towards conveying a message or meaning to a human reader independent of the supporting product,” the claimed message or meaning does not receive any patentable weight. MPEP § 2111.05. Accordingly, since none of the foregoing information has any functional relationship with the claimed computing system, it is sufficient that King teaches disclosing information in its second text.
Nevertheless, the Examiner observes that Huang teaches a method of searching an image at an external server and receiving search results comprising “information content associated with the selected object in the acquired image,” such as “product information (e.g., a product brand 650 and a product type 660 as shown in FIG. 6D), directed links to the product information (e.g., information links 670), related products (e.g., a related product 690 and an advertisement 695), links to online retailers for comparison shopping, to save to a wish list, to share with a friend, or to 
Claim 7
King teaches the electronic device of claim 1, but does not explicitly disclose whether or not its known device includes a camera.
Huang’s electronic device, however, further comprises:
a camera,
“Mobile device 130 comprises a portable image sensor (e.g., an image sensor 200 as shown in FIG. 2, etc.), which can be any electronic device capable of generating image 100. For example, the portable image sensor can comprise either a charge coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, and a set of optical lenses to convey a light pattern onto the sensor and thereby generate image 100.” Huang ¶ 31.
wherein the image is a preview image using the camera.
“In 410, mobile device 130 can initiate visual searching and image recognition by acquiring an image (e.g., image 100 as shown in FIG. 1, image 600 as shown in FIG. 6A, etc.). For example, a user of mobile device 130 can point image sensor 200 of mobile device 130 in a general direction of a target, and mobile device 130 can capture, generate, acquire, or otherwise replicate an image that is representative of the target.” Huang ¶ 55.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add a camera to King’s known mobile device, thereby allowing the user to search for images acquired via the camera. One would have been motivated to combine Huang with King because “many consumers now rely on their mobile communication devices, such as cellular phones, to take pictures and shoot videos, exchange messages in their social network, make purchase decisions, conduct financial transactions, and carry out other activities.” Huang ¶ 3.
Claim 8 
King, as combined with Huang, teaches the electronic device of claim 7, wherein the instructions cause the processor to:
when receiving the second response, capture a preview image displayed on the display to store the captured image as a still image; and
“In 410, mobile device 130 can initiate visual searching and image recognition by acquiring an image (e.g., image 100 as shown in FIG. 1, image 600 as shown in FIG. 6A, etc.). For example, a user of mobile device 130 can point image sensor 200 of mobile device 130 in a general direction of a target, and mobile device 130 can capture, generate, acquire, or otherwise replicate an image that is representative of the target.” Huang ¶ 55.
transmit the second data associated with the stored still image and the first text to the second external server.
“Next, in 445, mobile device 130 can generate a visual search query based on the acquired image and communicate the visual search query to back-end 300 of image recognition system 120.” Huang ¶ 64. “In one implementation, a visual search query can include the acquired image or a sub-image extracted from the acquired image based on the selected object, and the metadata and/or contextual data associated with the acquired image or extracted sub-image.” Huang ¶ 64.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add a microphone to King’s device as a means for input when searching an image, as taught by Huang. One would have been motivated to combine Huang with King because a microphone could provide additional information that might make the search more accurate. See Huang ¶ 63. Additionally, a microphone adds greater convenience and flexibility for using the device, e.g., when the user’s hand is wet or dirty and the user therefore cannot use the touch screen.
Claim 10
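King, as combined with Huang, teaches the electronic device of claim 1, 
wherein the first response further includes a third text associated with the at least one object, wherein the third text includes category information of an object included in the image, 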
“Context information is information that in some way describes and/or is associated with a portion of a media item. Context information may include definition information, image information, geographic information, one or more links to locations where the context information resides or may be determined, or some combination thereof.” King ¶ 24.
and wherein the instructions cause the processor to: transmit the second data associated with the third text to the second external server, as well as the image and the first text.
“In some embodiments, the information presentation module 135 may retrieve context information from the media context source 110 and/or the local context source using the one or more links provided by the context identification system 115.” King ¶ 32.
Claim 11
King teaches the electronic device of claim 1, wherein the instructions cause the processor to: 
transmit the second text to a display device via the communication circuit to provide at least part of the second text through a display included in the display device.
“In one embodiment, the user interface module 325 generates the user interface 400A, and similarly, user interfaces 400B-400D, and 500 described below. The user interface 400A includes . . . a portion of a context presentation card 420.” King ¶ 60.
II.	KING AND BIKKULA TEACH CLAIMS 1–6 AND 9–11.
Claims 1–6 and 9–11 are rejected under 35 U.S.C. § 103 as being unpatentable over King in view of U.S. Patent Application Publication No. 2016/0328270 A1 (hereafter “Bikkula”).
Claim 1
King teaches:
An electronic device comprising:
As shown in FIG. 1, a “user client 100” is provided as part of an overall system, King ¶ 20, and the user client 100 may be implemented in hardware as the computer 200 shown in FIG. 2. King ¶ 35.
a housing;
“A user client 100 might be, for example, a personal computer, a tablet computer, a smart phone, a laptop computer, a dedicated e-reader, or other type of network-capable device such as a networked television or set-top box.” King ¶ 28. Each of the foregoing is understood to inherently house all of its electronic components within a single housing. 
a speaker positioned at a first portion of the housing;
The user client 100 includes a media player 130 “configured to present media items of different media formats,” including “videos, images, audio files, etc.” King ¶ 30. 
a touch screen display positioned at a third portion of the housing;
As shown in FIG. 2, the computer 200 that implements the user client 100 includes an input interface 214 such as “a touch-screen interface.” King ¶ 36.
a communication circuit positioned inside the housing or attached to the housing;
The hardware of computer 200 further includes a “network adapter 216 [that] couples the computer 200 to one or more computer networks.” King ¶ 36.
a processor positioned inside the housing and operatively connected to the speaker, the microphone, the display, and the communication circuit; and
“The computer 200 includes at least one processor 202 coupled to a chipset 204,” which in turn is coupled to the remaining components of the computer 200. King ¶ 35.

“The memory 206 holds instructions and data used by the processor 202.” King ¶ 36.
wherein the memory stores instructions that, when executed, cause the processor to:
“In one embodiment, program modules are stored on the storage device 208, loaded into the memory 206, and executed by the processor 202.” King ¶ 37. Therefore, the program modules will now be discussed. The program modules include the media player 130 of FIG. 1, its information presentation module 135, and all of the submodules 305–330 of the information presentation module 135 shown in FIG. 3. See King ¶ 39.
display an image including at least one object on the display;
“The media player 130 presents media items to a user of the user client 100. The media player 130 may be configured to present media items of different media formats. Media formats may include, for example, e-books, videos, images, audio files, etc.” King ¶ 30.
receive a first user input through at least one of the display or the microphone, wherein the first user input includes a request for performing a task associated with at least one object on the image;
“The media player 130 includes an information presentation module 135 in one embodiment. The information presentation module 135 receives a selection of a portion of a media item presented on the user client 100.” King ¶ 31.
transmit first data associated with the first user input to a first external server via the communication circuit;
“The information presentation module 135 generates and sends a context request to the context identification system 115 for context information associated with the selected portion of the media item.” King ¶ 31.
receive a first response from the first external server via the communication circuit, 

wherein the first response includes a first text associated with the at least one object; 
“Context information is information that in some way describes and/or is associated with a portion of a media item. Context information may include definition information, image information, geographic information, one or more links to locations where the context information resides or may be determined, or some combination thereof.” King ¶ 24.
transmit second data associated with the image and the first text to a second external server via the communication circuit; 
“In some embodiments, the information presentation module 135 may retrieve context information from the media context source 110 and/or the local context source using the one or more links provided by the context identification system 115.” King ¶ 32.
receive a second response from the second external server via the communication circuit, wherein the second response includes a second text associated with performing at least part of the task; 
“In embodiments, where the received context information includes one or more links to one or more media context sources 110,” the information presentation module 135 executes its context identification module 315 to “retrieve context information from the sources using the received one or more links.” King ¶ 44.
and provide at least part of the second text via the display or the speaker.
“The information presentation module 135 generates one or more context presentation cards using the received context information. A context presentation card presents context information for a portion of a media item and responds to commands from a user of the user client 100.” King ¶ 34. “The context presentation card displays 
King does not appear to explicitly disclose “a microphone positioned at a second portion of the housing.”
Bikkula, however, teaches:
An electronic device comprising: a housing;
“FIGS. 8A and 8B illustrate a mobile computing device 800, for example, a mobile telephone, a smart phone, wearable computer (such as a smart watch), a tablet computer, a laptop computer, and the like, with which examples of the disclosure may be practiced.” Bikkula ¶ 72.
a speaker positioned at a first portion of the housing;
The computing device 800 includes “an audio transducer 825 (e.g., a speaker),” Bikkula ¶ 72, shown in FIG. 8 as positioned at the top left portion of the device’s housing.
a microphone positioned at a second portion of the housing; 
“For example, in addition to being coupled to the audio transducer 825, the audio interface 874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation or capture speech for speech recognition.” Bikkula ¶ 77.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add a microphone to King’s device as taught by Bikkula. One would have been motivated to combine Bikkula with King because a microphone adds greater convenience and flexibility for using the device, e.g., when the user’s hands are wet or dirty and the user therefore cannot use the touch screen.
Claim 2
King, as combined with Bikkula, teaches the electronic device of claim 1, 
wherein the image is an image in which a region including the at least one object is separated.

Claim 3
King, as combined with Bikkula, teaches the electronic device of claim 1, wherein the instructions cause the processor to: 
generate information about a region including the at least one object in the image by directly analyzing the image in the electronic device or by analyzing the image through the second external server; 
“In some embodiments, where the media item is a video or an image, the image analysis module 330 analyses a selected portion of the media item to identify context information. The image analysis module 330 may analyze images via, for example, optical character recognition, facial recognition, location recognition, or some other process.” King ¶ 43.
and separate a region including the at least one object in the image, using the generated information.
“For example, a user may select a portion of the media item displaying a road sign. The context selection module 310 then performs optical character recognition on the selected portion to identify the text of the sign.” King ¶ 43.
Claim 4
King, as combined with Bikkula, teaches the electronic device of claim 1, 
wherein the task further includes obtaining information associated with the at least one object included in the image.
“The information presentation module 135 generates and sends a context request to the context identification system 115 for context information associated with the selected portion of the media item.” King ¶ 31. “Context information is 
Claim 5
King, as combined with Bikkula, teaches the electronic device of claim 1, 
wherein the first text further includes information indicating the at least one object.
“Context information is information that in some way describes and/or is associated with a portion of a media item.” King ¶ 24.
Claim 6
King, as combined with Bikkula, teaches the electronic device of claim 1, 
wherein the second text further includes at least one of model information, function information, price information, manufacturer information, or seller information of a corresponding product when the at least one object is a product.
When a claim “is directed towards conveying a message or meaning to a human reader independent of the supporting product,” the claimed message or meaning does not receive any patentable weight. MPEP § 2111.05. Accordingly, since none of the foregoing information has any functional relationship with the claimed computing system, it is sufficient that King teaches disclosing information in its second text.
Claim 9
King teaches the device of claim 1, but does not appear to explicitly disclose the remaining elements of claim 9.
Bikkula, however, teaches a device wherein:
the first response further includes a sequence of states of the electronic device for performing the task, and
“At operation 404, the received input is sent to a server. The server may be one or more servers for performing services such as speech recognition. The server or 
wherein the instructions cause the processor to: after receiving the second response, cause the electronic device to have at least part of the sequence of states, using at least part of the second text.
“At operation 404, a task frame is received from the server. Where the input received at operation 402 was to request a new task to be initiated, the received task frame is specific to the requested task.” Bikkula ¶ 46.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to offload voice-command processing onto another server, thereby having the server provide at least part of the sequence of states, using at least part of the second text. One would have been motivated to combine Bikkula with King because “users are employing an increasing variety of devices to access digital assistant applications, making it desirable to provide an efficiently designed framework to communicate with each of these devices.” Bikkula ¶ 17.
Claim 10
King, as combined with Bikkula, teaches the electronic device of claim 1, 
wherein the first response further includes a third text associated with the at least one object, wherein the third text includes category information of an object included in the image, 
“Context information is information that in some way describes and/or is associated with a portion of a media item. Context information may include definition information, image information, geographic information, one or more links to locations where the context information resides or may be determined, or some combination thereof.” King ¶ 24.
and wherein the instructions cause the processor to: transmit the second data associated with the third text to the second external server, as well as the image and the first text.
“In some embodiments, the information presentation module 135 may retrieve context information from the media context source 110 and/or the local context source using the one or more links provided by the context identification system 115.” King ¶ 32.
Claim 11
King teaches the electronic device of claim 1, wherein the instructions cause the processor to: 
transmit the second text to a display device via the communication circuit to provide at least part of the second text through a display included in the display device.
“In one embodiment, the user interface module 325 generates the user interface 400A, and similarly, user interfaces 400B-400D, and 500 described below. The user interface 400A includes . . . a portion of a context presentation card 420.” King ¶ 60.
III.	HUANG AND FU TEACH CLAIM 14.
Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over Huang as applied to claim 13 above, and further in view of U.S. Patent Application Publication No. 2020/0160124 A1 (“Fu”).
Claim 14
Huang teaches the server of claim 13, 
wherein the memory includes at least one or more databases associated with the category, 
“Image coefficient library 366 can index the trained images according to categories of the objects in the trained images,” which are accordingly “used by 
Huang does not appear to explicitly disclose that “the category includes an upper category and a lower category included in the upper category.”
Fu, however, teaches:
wherein the category includes an upper category and a lower category included in the upper category,
“In some applications, the task of the image recognition is to determine a fine-grained category of the object in the image 170. For example, in the example of FIG. 1, the recognizing result 180 of the image recognition module 122 may include the text of ‘This is a red-bellied woodpecker’ to indicate a specific species of the bird included in the image 170,” in contrast to “image recognition of a general category (in which only a general category of ‘bird’ is recognized).” Fu ¶ 21.
wherein the memory includes at least one or more databases associated with the category, and
“The memory 120 can include an image recognition module 122 which is configured to perform functions of implementations of the subject matter described herein.” Fu ¶ 16. Specifically, as shown in FIG. 2, “the system includes a plurality of stacked learning networks 210 and 220,” which are “implemented as one or more functional sub-modules in the image recognition module 122.” Fu ¶ 27.
wherein the instructions cause the processor to: determine the upper category and the lower category sequentially.
“The image of the finer scale to be processed by the learning network 220 is dependent on the determination at the learning network 210.” Fu ¶ 27.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to improve Huang’s information content database with Fu’s stacked learning networks, such that the overall combined system would determine both an upper and lower category of an object in an image sequentially. One would have been motivated to combine Fu with Huang because the use of upper and lower categories leads to more accurate results when classifying an object. See Fu ¶ 3 (“Through this solution, it is possible to localize an image region at . 

CONCLUSION
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Justin R. Blaufeld, whose telephone number is (571) 272-4372.  The examiner can normally be reached on M-F, 9:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman, can be reached at (571) 272-3644.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  


Justin R. Blaufeld
Primary Examiner
Art Unit 2142



/Justin R. Blaufeld/Primary Examiner, Art Unit 2142