Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the arguments do not apply to the new rejection made below. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-7, 9-10, 12-13, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Averbuch (US 20130212477) in view of Eide et al. (US 20080209480, hereinafter Eide) and Kim et al. (US 20140045484, hereinafter Kim).
Regarding claim 1, “A device comprising: a processing system including a processor; and a memory accessible to the processor, the memory storing instructions that are executable by the processor to cause the processor to perform operations, the operations comprising,” Averbuch teaches (¶0006) a system that integrates digital video content with object-oriented scripts (hotspots); (¶0017, ¶0023, and claim 11) a digital media reproduction device may be a computer (a computer implies a processor, memory, and software).
As to “receiving media content” Averbuch teaches (¶0017 and ¶0030) digital media reproduction device 10 is capable of playing streamed media.
As to “obtaining an overlay file that includes overlay data… wherein the overlay data comprises first overlay data for a first resolution associated with a first version …of the media content,” Averbuch teaches (¶0032) that markers are defined as a collection of data points that describe and facilitate the display of graphical markers on a display device in the context of 2D location within the dimensions (i.e., resolution) of the visualized on-screen video content; (¶0037) the system can retrieve hotspot information, which contains the display information regarding the item on the display device, such as the top-left correlating position and the width and height of the display window; (¶0028, ¶0046, ¶0032) hotspots/markers define the XY coordinates of the location of the object in the frame and the frame/timestamp coordinates at which the object first appears in the video; (¶0035) the metadata layer includes coordinates; (¶0037) the system retrieves the relevant FPID file and the metadata file from storage and compares the media reproduction device’s timecode with the timecode of the metadata; the system can also retrieve the hotspot (window display area/video embedded entity marker) information, which contains the display information regarding the item on the display device, such as the top-left correlating position and the width and height of the display window.
As to “receiving user-generated inputs relating to the first coordinate” Averbuch teaches (¶0038, ¶0028) when a user picks a video embedded entity (item), the authoring/playback software, based on the selection, searches the metadata layer and FPID files and retrieves the video embedded entity’s timestamp. If the timestamp matches, then the software gets the product ID and data-type.
Averbuch alone does not teach media content “wherein the media content is of a particular resolution associated with a particular version of multiple versions of the media content.” However, Eide teaches (¶0045) retrieving the appropriate pixel grid map relevant to the video’s file format and resolution, where the pixel grid map is a transparent overlay on the video screen that identifies the X, Y coordinates of any object in a given video scene for a particular format and resolution.
Averbuch does not teach that the overlay file includes “overlay data for the multiple versions of the media content,” or that the first resolution is associated with a first version “of the multiple versions of the media content and second overlay data associated with a second resolution of a second version of the multiple versions of the media content, and wherein the first overlay data comprises a first coordinate for an object in the first version of the media content and the second overlay data comprises a second coordinate for the object in the second version of the media content.” However, Eide teaches (¶0045-¶0046) a pixel grid map 340 that is an overlay on the video screen that identifies the X, Y coordinates of any object in a given video scene; those coordinates are referenced by the database 220 to verify and track user selections of objects 650 and to appropriately track groups of related pixels that constitute a single object (i.e., an entry associated with a particular object), such as a person or a vehicle; (¶0071) the system database 220 would maintain records of pixel grids (i.e., a subset of entries/records) of multiple resolutions for any given video 510; for instance, the system can apply a pixel grid for 1024x768 resolution as well as a pixel grid for 320x240 resolution. Since different pixel grid maps correspond to different resolutions and allow for the ‘same ability to interact with encoded objects,’ the corresponding coordinates for a first and a second resolution will inherently be different (i.e., scaled down).
As to “selecting the first overlay data based on a determination that the particular resolution corresponds to the first resolution,” Eide teaches (¶0071) that when a video is loaded in a media player, a process queries the database to identify whether an identical video has been registered in the database; if so, the system would apply a known pixel grid to that video by recognizing the video’s screen resolution.
As to “based on the selecting, providing the media content and first instructions to a display according to the first overlay data, wherein the display is communicatively coupled to the device, and wherein the display presents the first version of the media content on the display and presents the object according to the first coordinate of the first overlay data and the first instructions,” Eide teaches (¶0071) applying a pixel grid appropriate to the screen size, which is useful for technologies for portable video devices, such as iPod®, cellular phones, PDAs, and other hand-held media players. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch with the pixel grid coordinate information for multiple resolutions/formats as taught by Eide in order for overlay selections to be properly translated and supported across a plurality of devices, thus providing the service to a wider audience.
Averbuch and Eide do not teach “and based on the receiving the user-generated inputs relating to the first coordinate, causing a voice or video call connection to be established between the device and an external communication system and causing the display to present call identification information relating to the voice or video call connection.” However, Kim teaches that a user may select the area where the phone number 510 for the automated order is displayed; the controller 240 may detect the phone number 510 for the automated order that is displayed on the screen through the coordinate value of the area designated by the input selection command, and the controller 240 then attempts an automated call connection to the corresponding home shopping merchant using the detected phone number 510 for the automated order. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch and Eide with the phone number selection as taught by Kim for the benefit of allowing the user to contact a merchant without having to manually attempt the connection or input the numbers for the call.

Regarding claim 3, “The device of claim 1, wherein the operations further comprise providing second instructions to the display according to one of the user-generated inputs, and wherein the display highlights the object according to the second instructions,” Averbuch teaches (¶0033) that the product display layer contains hotspots that allow the user to highlight the product(s) to obtain information (i.e., a first user input performed regarding the object), to make a purchase selection (i.e., a second user input performed regarding the object), and to perform other actions with the product (i.e., third and further user inputs performed regarding the object).

Regarding claim 5, “The device of claim 1, wherein the operations further comprise providing third instructions to the display according to one of the user-generated inputs, and wherein the display presents a menu of user actions according to the third instructions,” Averbuch teaches (¶0033) that the product display layer contains hotspots that allow the user to highlight the product(s) to obtain information (i.e., a first user input performed regarding the object), to make a purchase selection (i.e., a second user input performed regarding the object), and to perform other actions with the product (i.e., third and further user inputs performed regarding the object); (¶0047) a compact gallery with additional information.

Regarding claim 6, “The device of claim 5, wherein the user actions comprise social network actions, sharing a portion of the media content, transmitting an advertisement regarding the media content, or a combination thereof,” Averbuch teaches (¶0047) that the displayed additional information can be shared using social media.

Regarding claim 7, “The device of claim 1, wherein the operations further comprise providing object information to the display according to one of the user-generated inputs, and wherein the display presents the object information,” Averbuch teaches (¶0047) displaying additional object information.

Regarding claim 9, “The device of claim 1, wherein the operations further comprise providing fourth instructions to the display according to one of the user-generated inputs, and wherein the display pauses playback of the media content according to the fourth instructions,” Averbuch teaches (¶0036) that users may click on the “info” key to pause the media reproduction device playback operation and another key to invoke the authoring/playback software, whereupon the program checks for an updated metadata file; (¶0033, ¶0035, and ¶0036) the product display layer contains hotspots that allow a user to pause the video playback and highlight the products to obtain additional information.

Regarding claim 10, “A non-transitory, machine-readable medium, comprising executable instructions that, when executed by a processing system including a processor, facilitate performance of operations, the operations comprising,” Averbuch teaches (¶0006) a system that integrates digital video content with object-oriented scripts (hotspots); (¶0017, ¶0023, and claim 11) the system includes a digital media reproduction device that may be a computer (a computer implies a processor, memory, and software).
As to “receiving media content,” Averbuch teaches (¶0017 and ¶0030) digital media reproduction device 10 is capable of playing streamed media.
As to “obtaining an overlay file that includes overlay data… wherein the overlay data comprises first overlay data for a first resolution associated with a first version …of the media content,” Averbuch teaches (¶0032) that markers are defined as a collection of data points that describe and facilitate the display of graphical markers on a display device in the context of 2D location within the dimensions (i.e., resolution) of the visualized on-screen video content; (¶0037) the system can retrieve hotspot information, which contains the display information regarding the item on the display device, such as the top-left correlating position and the width and height of the display window; (¶0028, ¶0046, ¶0032) hotspots/markers define the XY coordinates of the location of the object in the frame and the frame/timestamp coordinates at which the object first appears in the video; (¶0035) the metadata layer includes coordinates; (¶0037) the system retrieves the relevant FPID file and the metadata file from storage and compares the media reproduction device’s timecode with the timecode of the metadata; the system can also retrieve the hotspot (window display area/video embedded entity marker) information, which contains the display information regarding the item on the display device, such as the top-left correlating position and the width and height of the display window.
As to “obtaining user-generated inputs relating to the first coordinate” Averbuch teaches (¶0038, ¶0028) when a user picks a video embedded entity (item), the authoring/playback software, based on the selection, searches the metadata layer and FPID files and retrieves the video embedded entity’s timestamp. If the timestamp matches, then the software gets the product ID and data-type.
As to “providing second instructions to the display according to a first one of the user-generated inputs, wherein the display highlights the object according to the second instructions,” Averbuch teaches (¶0033) that the product display layer contains hotspots that allow the user to highlight the product(s) to obtain information (i.e., a first user input performed regarding the object), to make a purchase selection (i.e., a second user input performed regarding the object), and to perform other actions with the product (i.e., third and further user inputs performed regarding the object).
Averbuch alone does not teach media content “wherein the media content is of a particular resolution associated with a particular version of multiple versions of the media content.” However, Eide teaches (¶0045) retrieving the appropriate pixel grid map relevant to the video’s file format and resolution, where the pixel grid map is a transparent overlay on the video screen that identifies the X, Y coordinates of any object in a given video scene for a particular format and resolution.
Averbuch does not teach that the overlay file includes “overlay data for the multiple versions of the media content,” or that the first resolution is associated with a first version “of the multiple versions of the media content and second overlay data associated with a second resolution of a second version of the multiple versions of the media content, and wherein the first overlay data comprises a first coordinate for an object in the first version of the media content and the second overlay data comprises a second coordinate for the object in the second version of the media content.” However, Eide teaches (¶0045-¶0046) a pixel grid map 340 that is an overlay on the video screen that identifies the X, Y coordinates of any object in a given video scene; those coordinates are referenced by the database 220 to verify and track user selections of objects 650 and to appropriately track groups of related pixels that constitute a single object (i.e., an entry associated with a particular object), such as a person or a vehicle; (¶0071) the system database 220 would maintain records of pixel grids (i.e., a subset of entries/records) of multiple resolutions for any given video 510; for instance, the system can apply a pixel grid for 1024x768 resolution as well as a pixel grid for 320x240 resolution. Since different pixel grid maps correspond to different resolutions and allow for the ‘same ability to interact with encoded objects,’ the corresponding coordinates for a first and a second resolution will inherently be different (i.e., scaled down).
As to “selecting the first overlay data based on a determination that the particular resolution corresponds to the first resolution,” Eide teaches (¶0071) that when a video is loaded in a media player, a process queries the database to identify whether an identical video has been registered in the database; if so, the system would apply a known pixel grid to that video by recognizing the video’s screen resolution.
As to “based on the selecting, providing the media content and first instructions to a display according to the first overlay data, wherein the display is communicatively coupled to the processing system, and wherein the display presents the first version of the media content on the display and presents the object according to the first coordinate of the first overlay data and the first instructions,” Eide teaches (¶0071) applying a pixel grid appropriate to the screen size, which is useful for technologies for portable video devices, such as iPod®, cellular phones, PDAs, and other hand-held media players. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch with the pixel grid coordinate information for multiple resolutions/formats as taught by Eide in order for overlay selections to be properly translated and supported across a plurality of devices, thus providing the service to a wider audience.
Averbuch and Eide do not teach “and based on a second one of the user-generated inputs, causing a voice or video call connection to be established between the device and an external communication system and causing the display to present call identification information relating to the voice or video call connection.” However, Kim teaches that a user may select the area where the phone number 510 for the automated order is displayed; the controller 240 may detect the phone number 510 for the automated order that is displayed on the screen through the coordinate value of the area designated by the input selection command, and the controller 240 then attempts an automated call connection to the corresponding home shopping merchant using the detected phone number 510 for the automated order. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch and Eide with the phone number selection as taught by Kim for the benefit of allowing the user to contact a merchant without having to manually attempt the connection or input the numbers for the call.

Regarding claim 12, its rejection is similar to claim 5.

Regarding claim 13, its rejection is similar to claim 6.

Regarding claim 16, its rejection is similar to claim 9.

Regarding claim 17, its rejection is similar to claims 1 and 5.

Claims 2, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Averbuch, Eide, and Kim in view of Vehovsky et al. (US 20140325359, hereinafter Vehovsky).
Regarding claim 2, Averbuch, Eide, and Kim do not teach “The device of claim 1, wherein the operations further comprise based on the receiving the user-generated inputs relating to the first coordinate, causing the display to present social media information usable to effect transmission of at least a portion of the media content to a social network server; after the causing the display to present the social media information, receiving, at a particular playback time of the media content, a subsequent user-generated input that selects the social media information; and based on the receiving the subsequent user-generated input, generating a clip of the media content, wherein the clip includes only a portion of the media content that spans from a first playback time that is a first threshold time prior to the particular playback time to a second playback time that is a second threshold time after the particular playback time.” However, Vehovsky teaches (¶0047) that the user may select a start time segment (e.g., 2:10) and an end time segment (e.g., 2:25) of the playing video; the quote component 29 may grab a copy of the video between 2:10 and 2:25 and allow the user to share the grabbed video via the user's social media accounts. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch, Eide, and Kim with the sharing of a selected video clip via social media accounts as taught by Vehovsky for the benefit of allowing the user to enjoy their favorite media moments with friends.

Regarding claim 11, its rejection is similar to claim 2.

Regarding claim 18, its rejection is similar to claim 2.

Claims 4, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Averbuch, Eide, and Kim in view of Abecassis (US 20130163960).
Regarding claim 4, “The device of claim 1, wherein the object is associated with a certain playback time in the first version of the multiple versions of the media content,” Averbuch teaches (¶0032) that the product display layer is a specific interface format layered on top of and time-synched with the video content, which is also spatially and temporally synched with hotspots in the video content.
Averbuch, Eide, and Kim do not teach “wherein the object is associated with the certain playback time in the second version of the multiple versions of the media content, and wherein the object is associated with a playback time in a third version of the multiple versions of the media content that is different from the certain playback time.” However, Abecassis teaches (¶0004, ¶0053, ¶0161, ¶0188) one of a plurality of different content versions (e.g., a director’s cut or unrated version and an “R” rated version) and a “variable content video,” i.e., a video characterized by a nonlinear architecture (if the sequences are nonlinear, the objects appearing in them are also nonlinear); (¶0060-¶0061) identification of objects/items on screen; (¶0058) the teachings applied to a variable content video. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the system that integrates digital video content with hotspots as taught by Averbuch, Eide, and Kim with the multiple-version/variable content video as taught by Abecassis for the benefit of providing additional information both to children enjoying a PG-rated version and to adults enjoying an R-rated version.

Regarding claim 14, its rejection is similar to claim 4.

Regarding claim 19, its rejection is similar to claim 4.

Claims 8, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Averbuch, Eide, and Kim in view of Cai (US 20120304065).
Regarding claim 8, Averbuch, Eide, and Kim do not teach “The device of claim 1, wherein the receiving of the user-generated inputs comprises at least: detecting a pointing device hovering at a particular location on the display for a time period; and determining that the time period is above a time threshold.” However, Cai teaches (¶0024 and ¶0025) that performing a hovering action over a trigger point causes a popup layer to appear, and (¶0027) that the popup layer includes content associated with the trigger point. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine the method of Averbuch, Eide, and Kim with the hovering action as taught by Cai in order to allow users to browse the tagged information with ease by requiring less effort from the user, thus improving the user’s experience.

Regarding claim 15, its rejection is similar to claim 8.

Regarding claim 20, its rejection is similar to claim 8.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Rakib et al. (US 20090327894) – (¶0080) the media router may be configured to send an edited or redacted version of the video program to the user's remote control. For example, a user may indicate a high interest in an object and wish to view only video that contains the object. In this case, the media router may receive the request, search its memory for video scenes where the associated video metadata indicates that the object is present, and send just these scenes.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK J JOHNSON whose telephone number is (571)272-9629.  The examiner can normally be reached on 9:00AM-3:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian T. Pendleton can be reached on 571-272-7527.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Frank Johnson/Examiner, Art Unit 2425              

/Brian T Pendleton/Supervisory Patent Examiner, Art Unit 2425