DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 7, 11-13, 15-17, 19-21 and 24-26 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 11, 12, 15, 16, 17, 19, 20, 24-26 and 28-32 are rejected under 35 U.S.C. 103 as being unpatentable over Imamura et al (2016/0342388) (herein “Imamura”) in view of Visser et al (2018/0020312) (herein “Visser”) and further in view of Zalewski et al (9,911,290) (herein “Zalewski”).
	In regard to claim 1, Imamura teaches a method for providing sound to a user, the method comprising: capturing an image of one or more real-world objects in a field-of-view of the user using an image sensor (See; p[0028] for HMD 1 with image capture unit 10 including, for example, a camera); detecting a direction of an HMD using detection hardware (See; p[0028] where image capture unit 10 photographs real space inside a predetermined angular field of view); selecting an object from the one or more real-world objects in the image (See; p[0029] for locating AR markers included in an image); performing image recognition on the selected object in the image using a processor to recognize the selected object (See; p[0029] for image recognition unit 11); receiving sensor data from a hardware sensor other than the image sensor (See; p[0028]-p[0032] for the image capture unit capturing the AR marker for determining the outputted information. Further See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); determining information related to the selected object based on the sensor data (See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); using a user-customizable association between object information and a plurality of pre-stored sounds to select a sound from the plurality of pre-stored sounds based on the information related to the selected object, wherein the object information includes the information related to the selected object (See; p[0029]-p[0031] for an object data management table 2, which contains object data such as audio data / information associated with the AR marker); retrieving a digital representation of the sound; and rendering the digital representation of the sound to the user (See; p[0031], p[0038] where audio data may be output to the user via a speaker). Imamura fails to explicitly teach capturing an image of two or more real-world objects in a field-of-view of the user using an image sensor; detecting a gaze direction of an eye of the user using gaze detection hardware; and selecting an object from the two or more real-world objects in the image based on the gaze direction.
	However, Visser teaches capturing an image of two or more real-world objects in a field-of-view of the user using an image sensor; detecting a gaze direction of an eye of the user using gaze detection hardware; selecting an object from the two or more real-world objects in the image based on the gaze direction (See; Fig. 3, p[0056]-p[0058], p[0060]-p[0061] for a mixed reality scene from the viewpoint of a headset capturing at least two real-world objects, trees 340 and 342. Tree 342 is selected based on the user’s gaze direction, and a virtual sound (singing tree) is generated to be perceived by the user as originating at the real-world tree 342); obtaining information related to the selected object based on the recognizing (a singing tree is related to the recognized real-world tree); and using a user-customizable association between object information and a plurality of pre-stored sounds to select a sound from the plurality of pre-stored sounds based on the information related to the selected object, wherein the object information includes information related to the object (See; p[0060]-p[0061] where the sound characteristics of the virtual sound may be determined by a table lookup operation).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura with the gaze-based sound selection technique of Visser so as to give the user more control over which objects to select for audio output when multiple objects are present in the user’s field of view.
	The combination of Imamura and Visser fails to explicitly teach the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object.
	However, Zalewski teaches a method for providing sounds to a user (See; Column 11, lines 54-57 for providing sounds to a user); detecting a gaze direction of an eye of the user using gaze detection hardware; selecting an object from two or more real-world objects in the image based on the gaze direction (See; Fig. 56 and Column 13, lines 40-60 for selecting products based on a user’s gaze); receiving sensor data from a hardware sensor other than an image sensor, the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object (See; Column 12, lines 30-63 where heat sensors, weight sensors, motion sensors, proximity sensors, etc. can be integrated in the system); and determining information related to the selected object based on the sensor data and rendering a pre-stored sound to the user (See; Column 12, lines 30-63 where a sensor, such as a weight sensor, can detect that the product has been lifted from the shelf (determined information). Further, see Column 124, line 62 - Column 125, line 12 where, when the specific product is being interacted with or moved, real-time information, such as dietary information specific to the product held, can be sent to the user. Thus, in this example, a weight sensor detects the removal of the product, and dietary information audio is played to the user in response).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura’s additional sensor to be a heat, weight (mass) or velocity sensor, as taught by Zalewski, so as to collect further information about the selected object that cannot be identified via an image sensor, thus increasing the interactivity of the sensor network with the user.
	In regard to claim 11, Visser teaches wherein said rendering of the digital representation of the sound is performed to make the sound non-intrusive to the user (See; p[0037] where the volume and direction of arrival of virtual audio are set based on the distance of the object so as to be non-intrusive to the user).
	In regard to claim 12, Visser teaches wherein said non-intrusive rendering of the digital representation of the sound is mixed to be a background sound with other sounds (See; p[0037] where the volume and direction of arrival of virtual audio are set based on the distance of the object so as to be non-intrusive to the user. Thus, items that are in the background will have their volumes set accordingly to properly mix with sounds that are closer to the user).
	In regard to claim 15, Imamura teaches wherein the sound is selected to convey the information to the user (See; p[0030]-p[0031] for audio information conveyed to the user).
	In regard to claim 16, Imamura teaches wherein the information is related to something hidden from view of the user (See; p[0030]-p[0031] for audio information, where audio information is inherently hidden from view of the user. Further, the type of information output is not given patentable weight and is a mere design choice based on the given device).
	In regard to claim 17, Imamura teaches an article of manufacture comprising a tangible medium, that is not a transitory propagating signal, encoding computer-readable instructions that, when applied to a computer system, instruct the computer system to perform a method comprising: capturing an image of at least a portion of a field-of-view of the user using a sensor (See; p[0028] for HMD 1 with image capture unit 10); detecting a direction of an HMD using detection hardware (See; p[0028] where image capture unit 10 photographs real space inside a predetermined angular field of view); establishing respective positions of the one or more real-world objects in the image (See; p[0029] where the AR markers may have position data); selecting an object from the one or more real-world objects in the image based on the HMD direction and the respective positions of the one or more real-world objects (See; p[0029] for locating and selecting AR markers included in an image); performing image recognition on the selected object in the image to recognize the selected object (See; p[0029] for image recognition unit 11); receiving sensor data from a hardware sensor other than the image sensor (See; p[0028]-p[0032] for the image capture unit capturing the AR marker for determining the outputted information. Further See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); determining information related to the selected object based on the sensor data (See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); selecting a sound from a plurality of pre-stored sounds based on the information related to the selected object (See; p[0029]-p[0031] for an object data management table 2, which contains object data such as audio data / information associated with the AR marker); and playing the selected sound to the user (See; p[0031], p[0038] where audio data may be output to the user via a speaker). Imamura fails to explicitly teach detecting a gaze direction of an eye of the user using gaze detection hardware; and establishing respective positions of two or more real-world objects in the image.
	However, Visser teaches detecting a gaze direction of an eye of the user using gaze detection hardware; establishing respective positions of two or more real-world objects in the image; selecting an object from the two or more real-world objects in the image based on the gaze direction and the respective positions of the two or more real-world objects (See; Fig. 3, p[0056]-p[0058], p[0060]-p[0061] for a mixed reality scene from the viewpoint of a headset capturing at least two real-world objects, trees 340 and 342. Tree 342 is selected based on the user’s gaze direction, and a virtual sound (singing tree) is generated to be perceived by the user as originating at the real-world tree 342); obtaining information related to the selected object based on the recognizing (a singing tree is related to the recognized real-world tree); and selecting a sound from a plurality of pre-stored sounds based on the information related to the selected object and playing the selected sound (See; p[0060]-p[0061] where the sound characteristics of the virtual sound may be determined by a table lookup operation before being output to the user).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura with the gaze-based sound selection technique of Visser so as to give the user more control over which objects to select for audio output when multiple objects are present in the user’s field of view.
	The combination of Imamura and Visser fails to explicitly teach the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object.
	However, Zalewski teaches a method for providing sounds to a user (See; Column 11, lines 54-57 for providing sounds to a user); detecting a gaze direction of an eye of the user using gaze detection hardware; selecting an object from two or more real-world objects in the image based on the gaze direction (See; Fig. 56 and Column 13, lines 40-60 for selecting products based on a user’s gaze); receiving sensor data from a hardware sensor other than an image sensor, the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object (See; Column 12, lines 30-63 where heat sensors, weight sensors, motion sensors, proximity sensors, etc. can be integrated in the system); and determining information related to the selected object based on the sensor data and rendering a pre-stored sound to the user (See; Column 12, lines 30-63 where a sensor, such as a weight sensor, can detect that the product has been lifted from the shelf (determined information). Further, see Column 124, line 62 - Column 125, line 12 where, when the specific product is being interacted with or moved, real-time information, such as dietary information specific to the product held, can be sent to the user. Thus, in this example, a weight sensor detects the removal of the product, and dietary information audio is played to the user in response).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura’s additional sensor to be a heat, weight (mass) or velocity sensor, as taught by Zalewski, so as to collect further information about the selected object that cannot be identified via an image sensor, thus increasing the interactivity of the sensor network with the user.
	In regard to claim 19, Imamura teaches a head-mounted display (HMD) (See; p[0027] for HMD) comprising: a display (See; Fig. 1, display unit 14); a structure, coupled to the display and adapted to position the display in a field-of-view (FOV) of the user (See; p[0027] for HMD); an HMD direction detection subsystem, coupled to the structure (See; p[0028] where image capture unit 10 photographs real space inside a predetermined angular field of view); a first sensor coupled to the structure (See; p[0029] for image recognition unit 11, which detects an AR marker); at least one sound reproduction device, coupled to the structure (See; p[0038] for a speaker for outputting audio); and a processor, coupled to the display (See; Fig. 1, CPU 106), the HMD direction detection subsystem, the first sensor, and the at least one sound reproduction device, the processor configured to: receive first sensor data related to one or more real-world objects in a field-of-view of the user from the first sensor; establish respective positions of the one or more real-world objects using the first sensor data (See; p[0029] where image recognition unit 11 obtains position data of a detected AR marker); detect a direction of the HMD using detection hardware (See; p[0028] where image capture unit 10 photographs real space inside a predetermined angular field of view); select an object from the one or more real-world objects in the image based on the HMD direction (See; p[0029] for locating AR markers included in an image) and the respective positions of the one or more real-world objects (See; p[0046]); receive second sensor data from a second sensor different than the first sensor (See; p[0028]-p[0032] for the image capture unit capturing the AR marker for determining the outputted information. Further See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); determine information related to the selected object based on the second sensor data (See; p[0046] for sensor 109, which can obtain the current position using, for example, a GPS, where the display unit may display object data in accordance with the sensed location obtained from the GPS instead of in accordance with an AR marker); choose a sound from a plurality of pre-stored sounds based on the information related to the selected object (See; p[0029]-p[0031] for an object data management table 2, which contains object data such as audio data / information associated with the AR marker); retrieve a digital representation of the chosen sound; and render the digital representation of the chosen sound to the user through the at least one sound reproduction device (See; p[0031], p[0038] where audio data may be output to the user via a speaker). Imamura fails to explicitly teach an eye gaze detection subsystem, coupled to the structure; detect a gaze direction of an eye of a wearer of the HMD using the eye gaze detection subsystem; and select an object from the two or more real-world objects in the image based on the gaze direction.
	However, Visser teaches an eye gaze detection subsystem, coupled to the structure; detect a gaze direction of an eye of a wearer of the HMD using the eye gaze detection subsystem; select an object from the two or more real-world objects in the image based on the gaze direction (See; Fig. 3, p[0056]-p[0058], p[0060]-p[0061] for a mixed reality scene from the viewpoint of a headset capturing at least two real-world objects, trees 340 and 342. Tree 342 is selected based on the user’s gaze direction, and a virtual sound (singing tree) is generated to be perceived by the user as originating at the real-world tree 342); obtaining information related to the selected object (a singing tree is related to the recognized real-world tree); choose a sound from a plurality of pre-stored sounds based on the information related to the selected object (See; p[0060]-p[0061] where the sound characteristics of the virtual sound may be determined by a table lookup operation); retrieve a digital representation of the chosen sound (singing tree); and render the digital representation of the chosen sound to the user through the at least one sound reproduction device (See; Fig. 3, p[0056]-p[0058], p[0060]-p[0061] for overlaying the virtual sound of a singing tree on the real-world tree, which is played back to the user via the HMD).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura with the gaze-based sound selection technique of Visser so as to give the user more control over which objects to select for audio output when multiple objects are present in the user’s field of view.
	The combination of Imamura and Visser fails to explicitly teach the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object.
	However, Zalewski teaches a method for providing sounds to a user (See; Column 11, lines 54-57 for providing sounds to a user); detecting a gaze direction of an eye of the user using gaze detection hardware; selecting an object from two or more real-world objects in the image based on the gaze direction (See; Fig. 56 and Column 13, lines 40-60 for selecting products based on a user’s gaze); receiving sensor data from a hardware sensor other than an image sensor, the sensor data based on a temperature, pressure, state of matter, mass or velocity of the selected object (See; Column 12, lines 30-63 where heat sensors, weight sensors, motion sensors, proximity sensors, etc. can be integrated in the system); and determining information related to the selected object based on the sensor data and rendering a pre-stored sound to the user (See; Column 12, lines 30-63 where a sensor, such as a weight sensor, can detect that the product has been lifted from the shelf (determined information). Further, see Column 124, line 62 - Column 125, line 12 where, when the specific product is being interacted with or moved, real-time information, such as dietary information specific to the product held, can be sent to the user. Thus, in this example, a weight sensor detects the removal of the product, and dietary information audio is played to the user in response).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura’s additional sensor to be a heat, weight (mass) or velocity sensor, as taught by Zalewski, so as to collect further information about the selected object that cannot be identified via an image sensor, thus increasing the interactivity of the sensor network with the user.
	In regard to claims 20, 28 and 30, Zalewski teaches wherein the second sensor is located remote from the HMD / user (See; Column 12, lines 30-63 where heat sensors, weight sensors, motion sensors, proximity sensors, etc. can be integrated into the shelves or floors, remote from the smart glasses).

	In regard to claims 24 and 26, Imamura teaches the processor further configured to: use a user-customizable association between object information and the plurality of pre-stored sounds to choose the sound, wherein the object information includes the information related to the selected object (See; p[0029]-p[0031] for an object data management table 2, which contains object data such as audio data / information associated with the AR marker). Visser also teaches the processor further configured to: use a user-customizable association between object information and the plurality of pre-stored sounds to choose the sound, wherein the object information includes the information related to the selected object (See; p[0060]-p[0061] where the sound characteristics of the virtual sound may be determined by a table lookup operation).

	In regard to claim 25, Imamura teaches wherein the first sensor comprises an image sensor, the first sensor data comprises an image of at least a portion of the field of view of the user showing the one or more real-world objects (See; p[0028] for HMD 1 with image capture unit 10 including, for example, a camera), and the processor is further configured to: perform image recognition on the selected object in the image to recognize the selected object. Imamura fails to explicitly teach capturing two or more real-world objects.
	However, Visser teaches the first sensor data comprising an image of at least a portion of the field of view of the user showing the two or more real-world objects (See; Fig. 3, p[0056]-p[0058], p[0060]-p[0061] for a mixed reality scene from the viewpoint of a headset capturing at least two real-world objects, trees 340 and 342. Tree 342 is selected based on the user’s gaze direction, and a virtual sound (singing tree) is generated to be perceived by the user as originating at the real-world tree 342).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura with the gaze-based sound selection technique of Visser, which is able to choose between multiple real-world objects, so as to give the user more control over which objects to select for audio output when multiple objects are present in the user’s field of view.
	In regard to claims 29, 31 and 32, Zalewski teaches wherein the hardware sensor is physically coupled to the selected object (See; Column 12, lines 30-63 where heat sensors, weight sensors, motion sensors, proximity sensors, etc. can be integrated into the shelves and be in physical contact with the particular product so as to determine when the object is removed from the shelf).
	
Claims 7 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Imamura et al (2016/0342388) (herein “Imamura”) in view of Visser et al (2018/0020312) (herein “Visser”) in view of Zalewski et al (9,911,290) (herein “Zalewski”) and further in view of Vaziri (2017/0330042).
	In regard to claims 7 and 27, Visser fails to explicitly teach detecting a predetermined eye gesture performed by the user; and starting or stopping said rendering of the digital representation of the sound in response to said detection of the first predetermined eye gesture.
	However, Vaziri teaches an eye tracking system that detects a predetermined eye gesture performed by the user to start and stop a given action (See; p[0101], p[0107]).
Claims 13 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Imamura et al (2016/0342388) (herein “Imamura”) in view of Visser et al (2018/0020312) (herein “Visser”) in view of Zalewski et al (9,911,290) (herein “Zalewski”) and further in view of HUANG (2018/0246698).
	In regard to claim 13, the combination of Imamura and Visser fails to explicitly teach wherein said rendering of the digital representation of the sound is presented to the user as a directional sound originating at the object.
	However, HUANG teaches wherein said rendering of the digital representation of the sound is presented to the user as a directional sound originating at the object (See; p[0077] for an augmented reality system that employs a spatialized audio system that renders and presents spatialized audio corresponding to virtual objects with known virtual locations and orientations in real and physical three-dimensional (3D) space, making it appear to the end user 50 that the sounds are originating from the virtual locations of the real objects). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura to have the spatialized audio system of HUANG so as to increase the clarity and realism of the sounds as emanating from the marked objects.
	In regard to claim 21, the combination of Imamura and Visser fails to explicitly teach the at least one sound reproduction device comprising two or more sound reproduction devices, wherein the sound is rendered to have a position in a 3D audio landscape such that the sound emanates from the direction of the real-world object.
	However, HUANG teaches the at least one sound reproduction device comprising two or more sound reproduction devices (See; p[0011], p[0062] for a plurality of speakers), wherein the sound is rendered to have a position in a 3D audio landscape such that the sound emanates from the direction of the real-world object (See; p[0077] for an augmented reality system that employs a spatialized audio system that renders and presents spatialized audio corresponding to virtual objects with known virtual locations and orientations in real and physical three-dimensional (3D) space, making it appear to the end user 50 that the sounds are originating from the virtual locations of the real objects). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Imamura to have the spatialized audio system of HUANG so as to increase the clarity and realism of the sounds as emanating from the marked objects.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN A BOYD whose telephone number is (571)270-7503.  The examiner can normally be reached on Mon - Fri 8:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached on (571) 272-2976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/JONATHAN A BOYD/Primary Examiner, Art Unit 2627