DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 22, 2021 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claims 1-21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1, 7-8, 10-11, 17-18, and 20-21 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over a combination of Lanfermann et al. (US 2008/0229363), Pratt et al. (US 2011/0131605), Moon et al. (US 2008/0320546), El-Saban et al. (US 2011/0295851), and McCoskey et al. (US 2003/0028889).

Regarding claim 1, Lanfermann teaches a system comprising:
a memory ([0026], [0043]); and
a hardware processor ([0043], Fig. 1) that, when executing computer-executable instructions stored in the memory, is configured to:
receive an image from a video content item ([0037], “When the user 105 is interested in a particular scene or objects in a scene, the object recognizer 101 receives a command from the user 105 to ‘freeze’ the TV scene. This can e.g. be done by pressing a ‘HyperInfo’ button on the remote control. The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105. The user can now choose and select one or more of these highlighted objects, where after a virtual channel or additional information relating to the selected objects are made available to the user e.g. by displaying the video content, music or music play-list, or the additional information via the virtual channel to the user on the TV screen.”);
determine that the image includes a plurality of items displayed within the image ([0037], “The object recognizer 101 analyzes the objects 
transmit a plurality of search queries to one or more sources that search for known items that each correspond to one of the plurality of items displayed in the image while concurrently providing an initial search indication, for each of the plurality of items displayed in the image, indicating a search status for that item ([0037], “When the user 105 is interested in a particular scene or objects in a scene, the object recognizer 101 receives a command from the user 105 to ‘freeze’ the TV scene. This can e.g. be done by pressing a ‘HyperInfo’ button on the remote control. The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105.” [0039], “By pressing a designated ‘HyperInfo’ button on the remote control, the objects 203, 204 which are recognized by the object recognizer 101 are highlighted as illustrated by the solid lines surrounding the cat and the dog in FIG. 2b.”);
determine, based on the search, that the plurality of items displayed in the image includes a first item having a known identity and a second item ([0037], “The object recognizer 101 analyzes the objects which are being displayed 104 for the user 105, in this example on a TV screen 106. … The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105.” [0042]), and


Lanfermann does not expressly teach that the second item has an unknown identity. Lanfermann also does not expressly teach that the second item is associated with a plurality of potential identities. Lanfermann also does not expressly teach that the identification indication for the unknown second item includes a query to receive additional information that is used to identify the unknown second item. Lanfermann also does not expressly teach that the identification indication for the unknown second item includes a prompt for user selection of an identity from the plurality of potential identities. Lanfermann also does not expressly teach causing the image from the video content item to be presented while providing an auditory feedback of the first item having the known identifier and the second item having the plurality of potential identities.
Pratt teaches an item having an unknown identity ([0017], “When an item depicted in the media content (e.g., the second item 120) does not have an associated metadata tag, the user may select the item using the remote control device 132 or the input device 130. The set top box device 102 may capture an image 151 of the second 
Pratt also teaches a query to receive additional information that is used to identify the unknown second item ([0023], “The set top box device 102 may receive a third selection 144 indicating a request for information about an item that does not have an associated metadata tag, such as the second item 120. … After identifying the image 151, the identification server 104 may retrieve the information 152 associated with the identified image 151. For example, the identification server 104 may retrieve the information 152 from a database (not shown). The identification server 104 may send the information 152 to the set top box device 102 for display at the display device 110.” Figs. 1-2).
In view of Pratt’s teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Lanfermann such that the second item may have an unknown identity, and such that a query is presented with the second item to receive additional information. The modification would enable a user to request and receive additional information regarding unidentified objects in video content. The modification would serve to improve the overall user experience.

Moon teaches an identification indication for an item includes a query to receive additional information that is used to identify the item ([0008], “During the commercial, a graphical user interface (GUI) component 102 may appear, such as an overlay, to notify the TV viewer that he/she may capture a snapshot of the commercial.” [0009], “The snapshot search system may identify an object within the snapshot.” [0041], [0045], “snapshot search system 242 may provide search results to the user. The search results may be displayed on video display 214 and may include interactive content. In one implementation, the search results may include an image of the object (e.g, a picture or a video clip) and/or text information relating to the object. The text information may include, for example, where to purchase the object, price information, or details relating to the object (e.g., available colors, models, or features of the object).” [0049], “Snapshot search system 242 may include indicators identifying the objects that the TV 
In view of Moon’s teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the identification indication for the unknown second item includes the query to receive additional information that is used to identify the unknown second item. The modification would serve to facilitate user identification and selection of unknown items. The modification would further serve to improve the user experience.
The combination teaches the limitations specified above; however, the combination does not expressly teach that the second item is associated with a plurality of potential identities. The combination also does not expressly teach that the identification indication for the unknown second item includes a prompt for user selection of an identity from the plurality of potential identities. The combination also does not expressly teach causing the image from the video content item to be presented while providing an auditory feedback of the first item having the known identifier and the second item having the plurality of potential identities.
El-Saban teaches:
an item is associated with a plurality of potential tags, and the item includes a prompt for user selection of a tag from the plurality of potential tags ([0057], [0058], “In 522, the display shows the captured media; here a still image of a tree. Tabs 524 allow a user to toggle between a view of suggested tags and 
Considering El-Saban with the references of the combination, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the second item may be associated with a plurality of potential identities, and such that the identification indication for the unknown second item includes a prompt for user selection of an identity from the plurality of potential identities. The modification would allow users to easily associate identities with objects, and would additionally serve to facilitate the search and retrieval of media objects (El-Saban: Abstract).
The combination teaches the limitations specified above; however, the combination does not expressly teach causing the image from the video content item to be presented while providing an auditory feedback of the first item having the known identifier and the second item having the plurality of potential identities.

In view of McCoskey’s teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination to cause the image from the video content item to be presented while providing an auditory feedback of the first item having the known identifier and the second item having the plurality of potential identities. The modification would allow a combined system to inform users regarding the status of a search via an audible notification. The modification would serve to improve the overall user experience.

Claim 11 is rejected on the same grounds of rejection presented with respect to claim 1.

Regarding claim 21, Lanfermann teaches a non-transitory computer-readable medium containing computer executable instructions ([0026], [0043], i.e., non-transitory computer readable media is inherent to a computer). The rejection of claim 1 is similarly applied to the remaining limitations of claim 21.



Regarding claims 8 and 18, the combination further teaches wherein the plurality of items displayed in the image comprise at least one of a human face, an object, and a scene (Lanfermann: Figs. 2a-d; Pratt: Figs. 1-2).

Regarding claims 10 and 20, the combination further teaches wherein the hardware processor is further configured to present a number of the plurality of items displayed within the image concurrently with the image (Lanfermann: [0037], “The object recognizer 101 analyzes the objects which are being displayed 104 for the user 105, in this example on a TV screen 106. … The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105.” [0042], Fig. 2B, that is, a number of highlighted objects are presented within the image concurrently with the image. Pratt: Figs. 1-2).

Claims 2-3 and 12-13 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over a combination of Lanfermann, Pratt, Moon, El-Saban, McCoskey, and Watanabe (US 2009/0262213).


Watanabe provides a teaching for a user interface wherein an initial visual indication is modified from an initial visual state indicating that a detection process is ongoing to a modified visual state indicating that the detection process has been completed ([0085], [0089], “in FIG. 6, the framed portions in solid lines 300a, 300b show successfully followed faces in the frames while the framed portion in dashed line 301 represents a framed region including an unsuccessfully followed face which is under the detection in Step S24 (detection is performed again).” [0088], “The display representing that the unsuccessfully followed face is being detected is kept on while the processing in Steps S21 to S28, that is, the face detection in the stored area including the unsuccessfully followed face is being performed.” [0092], [0085], “Unless the counter value R is zero (Step S24), the detection is performed in the region including the unsuccessfully followed face stored in Step S22 (Step S25).” Figs. 6-7). 
The examiner submits that Watanabe would have at least suggested to one having ordinary skill that, upon successful detection of the unsuccessfully followed face ([0085]-[0087], Figs. 6-7), the dashed line representing the unsuccessfully followed face would change to a solid line representing a successfully followed face.
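For illustration only, the frame-style change described above can be sketched as follows. This is a minimal sketch, not Watanabe's implementation: the names `DetectionState` and `frame_style` are hypothetical, and the only details taken from the cited passages are that a dashed frame marks a face still under detection and a solid frame marks a successfully followed face.

```python
from enum import Enum


class DetectionState(Enum):
    """Hypothetical states for a framed face region."""
    DETECTING = "detecting"   # face lost; detection is being performed again
    FOLLOWED = "followed"     # face successfully followed


def frame_style(state: DetectionState) -> str:
    """Return the line style of the frame drawn around a face region.

    Assumption: dashed while detection is ongoing, solid once it succeeds,
    mirroring frames 301 and 300a/300b of Watanabe's FIG. 6.
    """
    if state is DetectionState.DETECTING:
        return "dashed"
    return "solid"


# A region that is re-detected moves from a dashed frame to a solid frame.
before = frame_style(DetectionState.DETECTING)
after = frame_style(DetectionState.FOLLOWED)
```

Under these assumptions, the indication for a single region changes from its initial visual state to a modified visual state once detection completes.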
In view of Watanabe, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the initial search 

Regarding claims 3 and 13, the combination teaches the limitations specified above; however, the combination does not expressly teach that the identification indication associated with the first item is different than the identification indication associated with the second item.
Watanabe provides a teaching for an identification indication associated with a first item being different than an identification indication associated with a second item ([0085], [0089], “in FIG. 6, the framed portions in solid lines 300a, 300b show successfully followed faces in the frames while the framed portion in dashed line 301 represents a framed region including an unsuccessfully followed face which is under the detection in Step S24 (detection is performed again).” [0088], “The display representing that the unsuccessfully followed face is being detected is kept on while the processing in Steps S21 to S28, that is, the face detection in the stored area including the unsuccessfully followed face is being performed.” [0092], [0085], “Unless the counter value R is zero (Step S24), the detection is performed in the region including the unsuccessfully followed face stored in Step S22 (Step S25).” Fig. 6, framed portions 300a and 300b are solid line frames, and framed portion 301 is a dashed line).

Claims 4-5 and 14-15 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over a combination of Lanfermann, Pratt, Moon, El-Saban, McCoskey, and Bishop (US 2013/0054613).

Regarding claims 4 and 14, the combination further teaches:
wherein the identification indication associated with the first item includes a first highlight region positioned in connection with the first item; and the identification indication associated with the second item includes a second highlight region positioned in connection with the second item (Lanfermann: [0037], “The object recognizer 101 analyzes the objects which are being displayed 104 for the user 105, in this example on a TV screen 106. … The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105.”).
The combination teaches the limitations specified above; however, the combination does not expressly teach the first highlight region is associated with a first 
Bishop provides a teaching for analyzing electronic documents (abstract), including video content (abstract, [0002], [0007], [0080], [0100], [0136]-[0148] Fig. 13), and highlighting key video content ([0034]). Bishop additionally teaches that highlighting may comprise using different colors based on a level of confidence of a match ([0058], “Also, the highlighting (e.g., using color(s) with regard to the text or the portion of the UI screen on which the text is displayed) or emphasizing (e.g., bolding, italicizing, or changing the size) of the item(s) of key-content can be varied (e.g., using different colors, different types of highlighting or emphasis), based at least in part on the level of confidence (e.g., green indicates high level of confidence or exact match, yellow indicates a medium level of confidence, and red indicates a low level of confidence) there is that the identified item(s) of key-content is associated with a tag word or tag phrase in the data store 314…”).
In view of Bishop’s teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the first highlight region is associated with a first color to signify that the first item has the known identity, and the second highlight region is associated with a second color to signify that the second item has the unknown identity. The modification would serve to improve the system by providing an intuitive means for indicating to a user whether an object is known or unknown. The modification would thereby improve user convenience, and would additionally facilitate user operation.
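For illustration only, the confidence-based coloring quoted from Bishop at [0058] can be sketched as follows. This is a minimal sketch under stated assumptions: the numeric thresholds, the `Match` type, and the function name `highlight_color` are hypothetical, since the cited passage names only high, medium, and low levels of confidence and the colors green, yellow, and red.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the cited passage does not give numeric values.
HIGH_CONFIDENCE = 0.9
MEDIUM_CONFIDENCE = 0.5


@dataclass
class Match:
    """A candidate identification for a displayed item (hypothetical type)."""
    label: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0


def highlight_color(match: Match) -> str:
    """Map a level of confidence to a highlight color.

    Green for high confidence or an exact match, yellow for medium
    confidence, red for low confidence, following the quoted example.
    """
    if match.confidence >= HIGH_CONFIDENCE:
        return "green"
    if match.confidence >= MEDIUM_CONFIDENCE:
        return "yellow"
    return "red"


known_item = highlight_color(Match("cat", 0.97))
unknown_item = highlight_color(Match("dog?", 0.30))
```

Under these assumptions, an item with a known identity and an item with an unknown identity receive different highlight colors.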

the identification indication associated with the first item includes a first highlight region positioned in connection with the first item; the identification indication associated with the second item includes a second highlight region positioned in connection with the second item (Lanfermann: [0037], “The object recognizer 101 analyzes the objects which are being displayed 104 for the user 105, in this example on a TV screen 106. … The objects in the scene, which are recognized by the object recognizer 101, are then highlighted for the user 105.”).
However, the combination does not expressly teach that the second item having the unknown identity includes a plurality of known identities; the first highlight region is associated with a first color to signify that the first item has the known identity; and the second highlight region is associated with a second color to signify that the second item has the plurality of known identities.
Bishop provides a teaching for analyzing electronic documents (abstract), including video content (abstract, [0002], [0007], [0080], [0100], [0136]-[0148] Fig. 13), and highlighting key video content ([0034]). Bishop also teaches:
wherein a second item having an unknown identity includes a plurality of known identities ([0058], “Similarly, the EIMC 316 can identify potential keywords or keyphrases, even when misspelled, in the electronic document, and the potential keywords or keyphrases can be highlighted or emphasized to indicate that such potential keywords or keyphrases may be a match to a tag, but the level of confidence is lower because the potential keywords or keyphrases were not an exact match to a stored tag.”); 

the second highlight region is associated with a second color to signify that the second item has the plurality of known identities ([0058], “Also, the highlighting…can be varied (e.g., using different colors, different types of highlighting or emphasis), based at least in part on the level of confidence (e.g., green indicates high level of confidence or exact match, yellow indicates a medium level of confidence, and red indicates a low level of confidence) there is that the identified item(s) of key-content is associated with a tag word or tag phrase in the data store 314 or to differentiate one potential item(s) of key-content from another item(s) of key-content in the electronic document.” … “Similarly, the EIMC 316 can identify potential keywords or keyphrases, even when misspelled, in the electronic document, and the potential keywords or keyphrases can be highlighted or emphasized to indicate that such potential keywords or keyphrases may be a match to a tag, but the level of confidence is lower because the potential keywords or keyphrases were not an exact match to a stored tag.” That is, highlights indicating a lower level of confidence additionally indicate that there are multiple potential matches.).

The modification would serve to improve the system by providing an intuitive means for indicating to a user whether an object is known or unknown. The modification would thereby improve user convenience, and would additionally facilitate user operation.

Claims 6 and 16 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over a combination of Lanfermann, Pratt, Moon, El-Saban, McCoskey, and Kikinis (US 5929849).

Regarding claims 6 and 16, the combination further teaches wherein the hardware processor is further configured to:
receive a selection of the first item or the second item (Lanfermann: [0037], “The user can now choose and select one or more of these highlighted objects”. [0040], [0042]; Pratt: [0017], “the user may select the first item 118…” [0018]-[0023], [0036], Figs. 1-2);
retrieve information relating to the selected item (Lanfermann: [0037], “The user can now choose and select one or more of these highlighted objects, where after a virtual channel or additional information relating to the selected objects 
provide display data, on a display device, based on the retrieved information (Lanfermann: [0037], “The user can now choose and select one or more of these highlighted objects, where after a virtual channel or additional information relating to the selected objects are made available to the user e.g. by displaying the video content, music or music play-list, or the additional information via the virtual channel to the user on the TV screen.”).
The combination teaches the limitations specified above; however, the combination as combined does not expressly teach that the display data is provided in an overlay of the image of the video content item, and that the display data includes a link that directs the display device to supplemental information relating to the selected item.
Pratt teaches providing display data in an overlay of an image of a video content item ([0019], “For example, after the user selects an item, the user may be presented with a menu 126 that provides a set of options, such as purchasing the first item 118 or obtaining more information about the first item 118.” [0022], Fig. 1).
In view of Pratt, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the display data is provided in an overlay of the image of the video content item. The modification would improve the combination by facilitating access to information for unknown items.

Kikinis provides a teaching wherein display data includes a link that directs a display device to supplemental information relating to a selected item (abstract; Col. 5, lines 17-27, “In embodiments of the present invention, individual images in TV presentations, such as persons, objects, and the like, are linked with Universal Resource Locators (URLs) in a manner that a viewer may select such images, and by so doing, invoke a linked URL, which leads to a WEB location providing information related to the image.” Col. 7, lines 56-67, “If the viewer is interested in additional information, he/she may manipulate the cursor to touch the region of emblem 57 and then actuate a selection signal, such as pressing one of the buttons 69 on the remote. On receipt of the selection signal with the cursor touching the BMW emblem, the system executes browser routines, accessing the WWW, and dials up the WEB server (see server 54 and modem 35 or 39, FIG. 1)”. Figs. 2A, C).
In view of Kikinis’ teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination such that the display data includes a link that directs the display device to supplemental information relating to the selected item. The modification would enhance the combined system by providing a convenient means by which users may access a website related to a selected item.

Claims 9 and 19 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over a combination of Lanfermann, Pratt, Moon, El-Saban, McCoskey, and Hardacker et al. (US 2010/0037264).

Regarding claims 9 and 19, the combination further teaches wherein the hardware processor is further configured to:
divide the image into a plurality of regions (Pratt: [0037], “When the user is interested in one or more items depicted in the media content 260 displayed at the display device 210, the display request 254 may instruct the display device 210 to display a grid 211.” [0038]),
detect features in each region of the plurality of regions, analyze detected features in each region (Pratt: [0037], “in the grid 211, the user may use the remote control device 232 or the input device 230 to input the grid coordinates 248 to select the second item 220 located at the coordinates B2.” [0038], “When the set top box device 202 receives the grid coordinates 248, the set top box device 202 may acquire the image 251 of the item (e.g., using a screen capture) and send the request 250 including the image 251 to the identification server 204. The identification server 204 may receive the request 250 and retrieve the information 252 from the database 270 based on the metadata tag 224 or based on the image 251. The identification server 204 may send the information 252 to the set top box device 202 for display at the display device 210.”).
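For illustration only, the grid-based selection quoted from Pratt at [0037]-[0038] can be sketched as follows. This is a minimal sketch, not Pratt's implementation: the function name `grid_cell_bounds`, the frame dimensions, the grid size, and the convention that the letter names a column and the number names a row are all assumptions, since the cited passages state only that a grid is displayed and that the user enters coordinates such as B2.

```python
def grid_cell_bounds(coord: str, frame_w: int, frame_h: int,
                     cols: int, rows: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) pixel bounds of a grid cell.

    Assumption: the letter in a coordinate such as "B2" selects the column
    (A is the first column) and the number selects the row (1 is the top row).
    """
    col = ord(coord[0].upper()) - ord("A")
    row = int(coord[1:]) - 1
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"coordinate {coord!r} is outside the grid")
    cell_w = frame_w // cols
    cell_h = frame_h // rows
    left = col * cell_w
    top = row * cell_h
    return (left, top, left + cell_w, top + cell_h)


# Hypothetical 1920x1080 frame divided into a grid of 4 columns and 3 rows.
region = grid_cell_bounds("B2", 1920, 1080, cols=4, rows=3)
```

Under these assumptions, the returned bounds identify the region of the frame that would be captured and sent for identification.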

Hardacker provides a teaching for identifying a text region within an image, and determining whether detected features comprise textual information ([0239], “A control device consistent with certain embodiments invokes a command to an access device that causes the access device to produce a frame of video containing text for display on a video display. A user interface permits a user to select text from the frame of video displayed on the video display. A program running on a processor extracts the selected text from the video frame containing text by optical character recognition (OCR) processing of the selected text from the video frame.”).
In view of Hardacker’s teaching, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the combination to include identifying a text region within the image, and determining whether the detected features comprise textual information. The modification would provide a convenient means for users to search text in images, thereby enhancing users’ level of convenience.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Wong (US 2011/0282906) discloses a system for performing an image-based search using an image captured from media content ([0004]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL R TELAN whose telephone number is (571)270-5940. The examiner can normally be reached 9:30AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi can be reached on (571) 272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL R TELAN/           Primary Examiner, Art Unit 2426