DETAILED ACTION
	Claims 1-20 are pending in this application, with claims 1 and 16 being independent.
Notice of Pre-AIA or AIA Status
 	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Priority
 	Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.
Specification
 	The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed.
Drawings
	 The drawings were received on November 15, 2019.  These drawings are acceptable. 
 Claim Objections
 	Claim 3 is objected to because of the following informalities:  “the user image” (line 3 of claim 3) lacks proper antecedent basis (only “the image of the user” has proper antecedent basis in the claims; see “an image of the user” in line 4 of claim 2).  Appropriate correction is required.
  	Claim 12 is objected to because of the following informalities:  “the user image” (in both lines 3 and 4 of claim 12) lacks proper antecedent basis (only “the image of the user” has proper antecedent basis in the claims; see “an image of the user” in line 4 of claim 2).  Appropriate correction is required.
Claim Rejections - 35 USC § 102
 	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

 	Claims 1-3, 9 and 16-18 are rejected under 35 U.S.C. 102(a)(1) and/or 102(a)(2) as being anticipated by PEREZ et al. (US 2013/0050432, hereinafter “PEREZ”).
	Regarding claim 1, PEREZ discloses an extended reality (XR) device (¶ [0002]: “a see-through, near-eye, augmented reality display device”; ¶ [0052]: “a near eye display device such as a head mounted display device in the form of eyeglasses.”  ¶ [0054]: “a mixed reality display system.”) comprising: 
 	a transparent display (¶ [0003]: “a see-through, near-eye, augmented reality display system comprising a see-through, near-eye, augmented reality display device. For each eye, the device comprises a respective display optical system positioned to be seen through by the respective eye. At least one image generation unit is attached to the see-through display device for generating an image and having an optical alignment with at least one of the display optical systems. The at least one image generation unit has a variable focal length.”  ¶ [0054]: “System 10 includes a see-through display device as a near-eye, head mounted display device 2”  ¶ [0055]: “Head mounted display device 2, which in one embodiment is in the shape of eyeglasses in a frame 115, is worn on the head of a user so that the user can see through a display, embodied in this example as a display optical system 14 for each eye, and thereby have an actual direct view of the space in front of the user.”  ¶ [0063]: “a see-through, near-eye, mixed reality display system embodied in a set of eyeglasses 2. What appears as a lens for each eye represents a display optical system 14 for each eye, e.g. 14r and 14l. A display optical system includes a see-through lens, e.g. 118 and 116 in FIGS. 3A-3D, as in an ordinary pair of glasses, but also contains optical elements (e.g. mirrors, filters) for seamlessly fusing virtual content with the actual and direct real world view seen through the lenses 118, 116. A display optical system 14 has an optical axis which is generally in the center of the see-through lens 118, 116 in which light is generally collimated to provide a distortionless view. 
For example, when an eye care professional fits an ordinary pair of eyeglasses to a user's face, a goal is that the glasses sit on the user's nose at a position where each pupil is aligned with the center or optical axis of the respective lens resulting in generally collimated light reaching the user's eye for a clear or distortionless view.”  ¶ [0073]: “The display device 2 provides an image generation unit which can create one or more images including one or more virtual objects. In some embodiments, a microdisplay may be used as the image generation unit. A microdisplay assembly 173 comprises light processing elements and a variable focus adjuster 135. An example of a light processing element is a microdisplay unit 120. Other examples include one or more optical elements such as one or more lenses of a lens system 122 and one or more reflecting elements such as surfaces 124a and 124b in FIGS. 3A and 3B or 124 in FIGS. 3C and 3D. Lens system 122 may comprise a single lens or a plurality of lenses.”  ¶ [0097]: “Lightguide optical element 112 transmits light from microdisplay 120 to the eye of the user wearing head mounted display device 2. Lightguide optical element 112 also allows light from in front of the head mounted display device 2 to be transmitted through lightguide optical element 112 to the user's eye thereby allowing the user to have an actual direct view of the space in front of head mounted display device 2 in addition to receiving a virtual image from microdisplay 120.”); 
 	a sensing unit (¶ [0003]: “gaze detection elements”; ¶ [0062]: “gaze detection elements”;  ¶ [0065]: “a visible light camera” … “may be the sensor”;  ¶ [0072]: “inertial sensors”) configured to sense a relative position (¶ [0072]: “inertial sensors are for sensing position, orientation, and sudden accelerations of head mounted display device 2. From these movements, head position may also be determined.”  ¶ [0136]: “a relative position”;  ¶ [0136]: “a position of user from the one or more real objects based on the one or more relative positions” and/or  ¶ [0201]: “the position of the mole or freckle”) and gaze direction of a user with respect to the transparent display (¶ [0002]: “determining a current user focal region within a current user field of view based on a point of gaze determined from gaze vectors.”  ¶ [0003]: “For each display optical system, there is a respective arrangement of gaze detection elements including illuminators for generating glints and a detection area of at least one sensor for detecting glints and generating eye data.”  ¶ [0005]: “A gaze vector is determined for each user eye based on an arrangement of gaze detection elements in fixed positions with respect to each other on a respective display optical system for each eye of the display device. 
A current user focal region is determined based on the gaze vectors within the current user field of view.”   ¶ [0052]: “The user's field of view, which is a portion of the environment or space that the user may observe at a current head position, is determined”;  ¶ [0052]: “A current user focal region is determined based on a point of gaze which is determined based upon a gaze vector from each eye.”  ¶ [0062]: “As noted above, in some embodiments, gaze detection of each of a user's eyes is based on a three dimensional coordinate system of gaze detection elements on a near-eye, mixed reality display device like the eyeglasses 2 in relation to one or more human eye elements such as a cornea center, a center of eyeball rotation and a pupil center. Examples of gaze detection elements which may be part of the coordinate system including glint generating illuminators and at least one sensor for capturing data representing the generated glints. As discussed below (see FIG. 21 discussion, a center of the cornea can be determined based on two glints using planar geometry. The center of the cornea links the pupil center and the center of rotation of the eyeball, which may be treated as a fixed location for determining an optical axis of the user's eye at a certain gaze or viewing angle.” ¶ [0063]: “FIG. 1C illustrates an exemplary arrangement of positions of respective sets of gaze detection elements in a gaze detection system for each eye positioned facing each respective eye by a see-through, near-eye, mixed reality display system embodied in a set of eyeglasses 2.”   ¶ [0064]: “In the example of FIG. 1C, a detection area 139r, 139l of at least one sensor is aligned with the optical axis of its respective display optical system 14r, 14l so that the center of the detection area 139r, 139l is capturing light along the optical axis. 
If the display optical system 14 is aligned with the user's pupil, each detection area 139 of the respective sensor 134 is aligned with the user's pupil. Reflected light of the detection area 139 is transferred via one or more optical elements to the actual image sensor 134 of the camera, in this example illustrated by dashed line as being inside the frame 115.”  ¶ [0065]: “In one example, a visible light camera also commonly referred to as an RGB camera may be the sensor, and an example of an optical element or light directing element is a visible light reflecting mirror which is partially transmissive and partially reflective.”  ¶ [0066]: “The visible light camera provides image data of the pupil of the user's eye, while IR photodetectors 152 capture glints which are reflections in the IR portion of the spectrum. If a visible light camera is used, reflections of virtual images may appear in the eye data captured by the camera. An image filtering technique may be used to remove the virtual image reflections if desired. An IR camera is not sensitive to the virtual image reflections on the eye.”  ¶ [0067]: “In other examples, the at least one sensor 134 is an IR camera or a position sensitive detector (PSD) to which the IR radiation may be directed. For example, a hot reflecting surface may transmit visible light but reflect IR radiation. The IR radiation reflected from the eye may be from incident radiation of the illuminators 153, other IR illuminators (not shown) or from ambient IR radiation reflected off the eye. In some examples, sensor 134 may be a combination of an RGB and an IR camera, and the optical light directing elements may include a visible light reflecting or diverting element and an IR radiation reflecting or diverting element. In some examples, a camera may be small, e.g. 2 millimeters (mm) by 2 mm. An example of such a camera sensor is the Omnivision OV7727. In other examples, the camera may be small enough, e.g. the Omnivision OV7727, e.g. 
that the image sensor or camera 134 may be centered on the optical axis or other location of the display optical system 14. For example, the camera 134 may be embedded within a lens of the system 14. Additionally, an image filtering technique may be applied to blend the camera into a user field of view to lessen any distraction to the user.” ¶ [0072]: “Inside, or mounted to temple 102, are ear phones 130, inertial sensors 132 and temperature sensor 138. In one embodiment inertial sensors 132 include a three axis magnetometer 132A, three axis gyro 132B and three axis accelerometer 132C (See FIG. 4). The inertial sensors are for sensing position, orientation, and sudden accelerations of head mounted display device 2. From these movements, head position may also be determined.”  ¶ [0122]: “A GPS transceiver 965 utilizing satellite-based radio navigation to relay the position of the user applications is enabled for such service.”  ¶ [0126]: “FIG. 7 is a block diagram of a system embodiment for determining positions of objects within a user field of view of a see-through, near-eye display device. This embodiment illustrates how the various devices may leverage networked computers to map a three-dimensional model of a user field of view and the real and virtual objects within the model. An application 456 executing in a processing unit 4,5 communicatively coupled to a display device 2 can communicate over one or more communication networks 50 with a computing system 12 for processing of image data to determine and track a user field of view in three dimensions. The computing system 12 may be executing an application 452 remotely for the processing unit 4,5 for providing images of one or more virtual objects. Either or both of the applications 456 and 452 working together may map a 3D model of space around the user. A depth image processing application and skeletal tracking application 450 detects objects, identifies objects and their locations in the model. 
An object may be a person or a thing. Additionally, the depth image processing application performs skeletal tracking of at least humans. The application 450 may perform its processing based on depth image data from depth camera like 20A and 20B, two-dimensional or depth image data from one or more front facing cameras 113, and GPS metadata associated with objects in the image data obtained from a GPS image tracking application 458.”  ¶ [0136]: “FIG. 8B is a flowchart of a method embodiment for identifying one or more real objects in a user field of view. This embodiment may be used to implement step 512. Each of the implementing examples in FIGS. 8B, 8D and 8E may be used separately or in conjunction with one another to identify the location of objects in the user field of view. In step 520, a location of user wearing the display device 2 is identified. For example, GPS data via a GPS unit 965 in the mobile device 5 or GPS transceiver 144 on the display device 2 may identify the location of the user. In step 522, one or more processors, retrieve one or more images of the location from a database (e.g. 470), and uses pattern recognition in step 524 to select one or more images matching image data from the one or more front facing cameras. In some embodiments, steps 522 and 524 may be performed remotely by a more powerful computer, e.g. hub 12, having access to image databases. Based on GPS data, in step 526 the one or more processors determines a relative position of one or more objects in front facing image data to one or more GPS tracked objects 528 in the location, and determines in step 529 a position of user from the one or more real objects based on the one or more relative positions.”   ¶ [0201]: “In one embodiment, an eye camera may capture about 5 to 10 mm of area around the visible eyeball portion of the cornea bulge, eye white, iris and pupil so as to capture part of an eyelid and eyelashes. 
A positionally fixed facial feature like a mole or freckle on skin such as an eyelid or on the bottom rim of the skin encasing the lower eyeball may also be present in the image data of the eye. In image samples, the position of the mole or freckle may be monitored for a change in position. If the facial feature has moved up, down, right or left, a vertical or horizontal shift can be detected. If the facial feature appears larger or smaller, a depth change in the spatial relationship between eye and display device 2 can be determined. There may be a criteria range in the change of position to trigger recalibration of the training images due to things like camera resolution, etc.”  ¶ [0202]: “In another example, although lighting is a factor which changes the size of the pupil and the ratio of pupil area to visible iris area within the circumference or perimeter of the iris, the size of the perimeter or circumference of the iris does not change with gaze change or lighting change; hence, the perimeter or circumference is a fixed characteristic of the iris as a facial feature. Through ellipse fitting of the iris, processor 210 or a processor of the processing unit 4,5 of the display device 2 can determine whether the iris has become larger or smaller in image data in accordance with criteria. If larger, the display device 2 with its illuminators 153 and at least one sensor 134 has moved closer in depth to the user's eye; if smaller, the display device 2 has moved farther away. A change in a fixed characteristic can trigger an IPD alignment check.”    ¶ [0204]: “In some examples, comparisons between current sensed data and training images is to determine a closest match and interpolate where the current pupil position data fits between training data sets to estimate a gaze vector. Particularly when using training data for comparison, movement of the gaze detection coordinate system is a cause for recalibrating the training images. 
One may periodically redetermine the positions of the cornea center and fixed center of rotation to determine whether there has been a change in the spatial relationship between them and the illuminators and at least one sensor. A periodic check may also determine whether a lighting change in accordance with a criteria has occurred triggering generation of new training gaze data sets.”); and 
 	a processor (¶ [0004]: “One or more processors”; ¶ [0054]: “processing unit 4”) configured to recognize at least one real-world external object (¶ [0002]: “One or more real or virtual objects of interest are identified.”) that is located in a forward direction of the transparent display and is visible to the user through the transparent display (¶ [0055]: “Head mounted display device 2, which in one embodiment is in the shape of eyeglasses in a frame 115, is worn on the head of a user so that the user can see through a display, embodied in this example as a display optical system 14 for each eye, and thereby have an actual direct view of the space in front of the user. The use of the term "actual direct view" refers to the ability to see real world objects directly with the human eye, rather than seeing created image representations of the objects.”  ¶ [0071]: “The data from the sensors may be sent to the processing unit 4,5 which may process them but which may also send to a computer system over a network or hub computing system 12 for processing. The processing identifies and maps the user's real world field of view.”  ¶ [0125]: “References to front facing image data are referring to image data from one or more front facing camera like camera 113 in FIGS. 1A and 1B. In these embodiments, the field of view of the front facing cameras 113 approximates the user field of view as the camera is located at a relatively small offset from the optical axis 142 of each display optical system 14. The offset may be taken into account in the image data.”  ¶ [0138]: “FIG. 8D is a flowchart of a method embodiment for identifying one or more objects in a user field of view based on depth data transmitted to the see-through, mixed reality display device 2.”), 
based on the relative position (e.g., ¶ [0072]: “inertial sensors are for sensing position, orientation, and sudden accelerations of head mounted display device 2. From these movements, head position may also be determined.”  ¶ [0134]: “the user's current location,”  ¶ [0136]: “a relative position”;  ¶ [0136]: “a position of user from the one or more real objects based on the one or more relative positions” and/or ¶ [0201]: “the position of the mole or freckle”) and gaze direction of the user sensed by the sensing unit (¶ [0002]: “One or more real or virtual objects of interest are identified. For example, the user intent to interact with an object may be determined from a gaze duration with respect to the object.”   ¶ [0005]: “The method comprises determining a current user field of view of a user wearing the see-through, near-eye, mixed reality device. The field of view includes one or more real objects. A gaze vector is determined for each user eye based on an arrangement of gaze detection elements in fixed positions with respect to each other on a respective display optical system for each eye of the display device. A current user focal region is determined based on the gaze vectors within the current user field of view. One or more images is displayed including one or more virtual objects appearing at a respective focal region in the current user field of view for a natural sight view.”  ¶ [0006]: “An object of interest is identified by determining a user intent to interact with the object in the current user focal region. An optimized image is generated based on the object of interest. 
The optimized image is displayed to the user via the see-through display device.”  ¶ [0052]: “The user's field of view, which is a portion of the environment or space that the user may observe at a current head position, is determined and real objects in the user field of view are identified.”   ¶ [0087]: “When the system renders a scene for the augmented reality display, it takes note of which real-world objects are in front of which virtual objects. If a virtual object is in front of a real-world object, then the opacity should be on for the coverage area of the virtual object. If the virtual is (virtually) behind a real-world object, then the opacity should be off, as well as any color for that pixel, so the user will only see the real-world object for that corresponding area (a pixel or more in size) of real light. Coverage would be on a pixel-by-pixel basis, so the system could handle the case of part of a virtual object being in front of a real-world object, part of the virtual object being behind the real-world object, and part of the virtual object being coincident with the real-world object.”   ¶ [0124]: “For a see-through mixed reality display device, the gaze vectors are determined to identify a point of gaze in a three-dimensional (3D) user field of view which includes both real objects, typically not under computer control, and virtual objects generated by an application. The gaze vectors may intersect at an object 10 feet away or at a distance effectively at infinity. The following figures briefly discuss embodiments for determining a 3D user field of view.”   ¶ [0126]: “FIG. 7 is a block diagram of a system embodiment for determining positions of objects within a user field of view of a see-through, near-eye display device. This embodiment illustrates how the various devices may leverage networked computers to map a three-dimensional model of a user field of view and the real and virtual objects within the model. 
An application 456 executing in a processing unit 4,5 communicatively coupled to a display device 2 can communicate over one or more communication networks 50 with a computing system 12 for processing of image data to determine and track a user field of view in three dimensions. The computing system 12 may be executing an application 452 remotely for the processing unit 4,5 for providing images of one or more virtual objects. Either or both of the applications 456 and 452 working together may map a 3D model of space around the user. A depth image processing application and skeletal tracking application 450 detects objects, identifies objects and their locations in the model. An object may be a person or a thing. Additionally, the depth image processing application performs skeletal tracking of at least humans. The application 450 may perform its processing based on depth image data from depth camera like 20A and 20B, two-dimensional or depth image data from one or more front facing cameras 113, and GPS metadata associated with objects in the image data obtained from a GPS image tracking application 458.”  ¶ [0128]: “The GPS image tracking application 458 identifies images of the user's location in one or more image database(s) 470 based on GPS data received from the processing unit 4,5 or other GPS units identified as being within a vicinity of the user, or both. Additionally, the image database(s) may provide accessible images of a location with metadata like GPS data and identifying data uploaded by users who wish to share their images. The GPS image tracking application provides distances between objects in an image based on GPS data to the depth image processing application 450. Additionally, the application 456 may perform processing for mapping and locating objects in a 3D user space locally and may interact with the GPS image tracking application for receiving distances between objects. 
Many combinations of shared processing are possible between the applications by leveraging network connectivity.”  ¶ [0135]: “FIG. 8A is a flowchart of a method embodiment for determining a three-dimensional user field of view. In step 510, one or more processors of the control circuitry 136, the processing unit 4,5, the hub computing system 12 or a combination of these receive image data from one or more front facing cameras, and in step 512 identify one or more real objects in front facing image data. Data from the orientation sensor 132, e.g. the three axis accelerometer 132C and the three axis magnetometer 132A, can also be used with the front facing camera 113 image data for mapping what is around the user, the position of the user's face and head in order to determine which objects, real or virtual, he or she is likely focusing on at the time. Based on an executing application, the one or more processors in step 514 identify virtual object positions in a user field of view which may be determined to be the field of view captured in the front facing image data. In step 516, a three-dimensional position is determined for each object in the user field of view. In other words, where each object is located with respect to the display device 2, for example with respect to the optical axis 142 of each display optical system 14.”  ¶ [0136]: “FIG. 8B is a flowchart of a method embodiment for identifying one or more real objects in a user field of view. This embodiment may be used to implement step 512. Each of the implementing examples in FIGS. 8B, 8D and 8E may be used separately or in conjunction with one another to identify the location of objects in the user field of view. In step 520, a location of user wearing the display device 2 is identified. For example, GPS data via a GPS unit 965 in the mobile device 5 or GPS transceiver 144 on the display device 2 may identify the location of the user. 
In step 522, one or more processors, retrieve one or more images of the location from a database (e.g. 470), and uses pattern recognition in step 524 to select one or more images matching image data from the one or more front facing cameras. In some embodiments, steps 522 and 524 may be performed remotely by a more powerful computer, e.g. hub 12, having access to image databases. Based on GPS data, in step 526 the one or more processors determines a relative position of one or more objects in front facing image data to one or more GPS tracked objects 528 in the location, and determines in step 529 a position of user from the one or more real objects based on the one or more relative positions.” ¶ [0138]: “Data from the orientation sensor 132 may also be sent for identifying face or head position.”  ¶ [0138]: “In step 542, the display device 2 receives data identifying one or more objects in a field of view for the user and their positions in a 3D model of a space. The image data from the one or more front facing cameras 113 approximates the user field of view, so the hub system 12 identifies the object in the front facing image data, for example through image recognition or pattern recognition software. Orientation data may also be used with the front facing image data to refine the user field of view and identify objects tracked by the computer system 12 falling within the user field of view. (The hub system 12 also aligns the front facing image data when received from two or more cameras 113 for identifying the user field of view.) The processing unit 4,5 in step 544 receives a position of the user in the 3D model of the space, and in step 546 the processing unit 4,5, or the processor 210 of the control circuitry 136 or both determines a position of one or more objects in the user field of view based on the positions of the user and the one or more objects in the 3D model of the space.”  ¶ [0151]: “FIG. 
10 is a flowchart of an embodiment of a process for generating a natural sight display view of virtual objects with real objects. In step 552, the system determines the current user field of view. That is, the system determines a portion of the environment or space within the user's vision and identifies the real and virtual objects therein as discussed above. For determining at what the user is specifically looking during a time period, a current user focal region of the user's depth of focus in the field of view is determined. Gaze data indicates what is the focal point or point of gaze which defines the current user focal region within the field of view. The Panum's fusional area can be calculated based on a focal point on a focal curve, the Horopter, within the Panum's fusional area. The Panum's fusional area is the area of single vision for binocular stereopsis used by the human eyes.”  ¶ [0152]: “In step 554, a gaze vector is determined for each eye based on the geometry of one or more gaze detection elements. FIGS. 3A through 3D illustrate some embodiments of arrangements of gaze detection elements for each respective display optical system 14 of a display device 2. Determination of gaze is discussed in more detail with respect to FIGS. 16 through 24. In step 556, a current user focal region is determined based on the gaze vectors within the current user field of view.”   ¶ [0158]: “FIG. 12 is a flowchart of an embodiment of a method for enhancing the display view of the one or more objects of interest in a see-through, mixed reality display device. In step 568, one or more processors of the augmented reality system determine one or more objects of interest to a user in the current user focal region. By identifying the objects of interest to a user, more relevant information may be targeted to a user, and the display view decluttered. 
In step 570, the one or more processors controls the different display elements for enhancing the display view of the one or more objects of interest. Both real and virtual objects of interest may be enhanced. For example, FIGS. 26B and 26C provide examples of how a zoom function may be implemented for a virtual object and for a real object for enhancing the display view of the one or more objects.”  ¶ [0170]: “FIG. 17 is a flowchart of a method embodiment for determining gaze in a see-through, near-eye mixed reality display system and provides an overall view of how a near-eye display device can leverage its geometry of optical components to determine gaze and a depth change between the eyeball and a display optical system. One or more processors of the mixed reality system such as processor 210 of the control circuitry, that in processing unit 4, the mobile device 5, or the hub computing system 12, alone or in combination, determine in step 602 boundaries for a gaze detection coordinate system. In step 604, a gaze vector for each eye is determined based on reflected eye data including glints, and in step 606 a point of gaze, e.g. what the user is looking at, is determined for the two eyes in a three-dimensional (3D) user field of view. As the positions and identity of objects in the user field of view are tracked, for example, by embodiments like in FIGS. 8A-8F, in step 608, any object at the point of gaze in the 3D user field of view is identified. In many embodiments, the three-dimensional user field of view includes displayed virtual objects and an actual direct view of real objects. The term object includes a person.”  ¶ [0205]: “FIGS. 25A through 26D describe embodiments for identifying an object of interest based a user's intent to interact with the object and an optimizing an image or a display view for that interaction.”  ¶ [0206]: “FIG. 
25A is a flowchart describing one embodiment of a process for identifying an object of interest based on a user's intent to interact with the object in the user's focal region. For example, the process of FIG. 25A is one example implementation of step 568 of FIG. 12. FIG. 25A describes a process by which a user's intent to interact with one or more objects in the user's focal region is determined based on detecting the user's eye gaze patterns in the user's focal region and determining the duration of the user's gaze on one or more objects being viewed by the user in the user's focal region.”  ¶ [0208]: “In step 693, it is determined if the user is viewing one or more objects. For example, the locations of points of gaze along the scanpath may be used to detect if the user is viewing one or more objects.”  ¶ [0209]: “If it is determined that the user is viewing one or more objects, then the objects being viewed by the user are identified in step 694. For example, the objects may be identified as a wall clock, a round shiny table, John Doe, a green leather couch, etc. In step 696, the duration of the user's gaze on the one or more objects being viewed is determined. In one example, the duration of the user's gaze is determined based on determining the duration of the user's gaze (or fixation) on the objects within a time window.”   ¶ [0201]: “In one embodiment, an eye camera may capture about 5 to 10 mm of area around the visible eyeball portion of the cornea bulge, eye white, iris and pupil so as to capture part of an eyelid and eyelashes. A positionally fixed facial feature like a mole or freckle on skin such as an eyelid or on the bottom rim of the skin encasing the lower eyeball may also be present in the image data of the eye. In image samples, the position of the mole or freckle may be monitored for a change in position. If the facial feature has moved up, down, right or left, a vertical or horizontal shift can be detected. 
If the facial feature appears larger or smaller, a depth change in the spatial relationship between eye and display device 2 can be determined. There may be a criteria range in the change of position to trigger recalibration of the training images due to things like camera resolution, etc.”  ¶ [0202]: “In another example, although lighting is a factor which changes the size of the pupil and the ratio of pupil area to visible iris area within the circumference or perimeter of the iris, the size of the perimeter or circumference of the iris does not change with gaze change or lighting change; hence, the perimeter or circumference is a fixed characteristic of the iris as a facial feature. Through ellipse fitting of the iris, processor 210 or a processor of the processing unit 4,5 of the display device 2 can determine whether the iris has become larger or smaller in image data in accordance with criteria. If larger, the display device 2 with its illuminators 153 and at least one sensor 134 has moved closer in depth to the user's eye; if smaller, the display device 2 has moved farther away. A change in a fixed characteristic can trigger an IPD alignment check.”    ¶ [0204]: “In some examples, comparisons between current sensed data and training images is to determine a closest match and interpolate where the current pupil position data fits between training data sets to estimate a gaze vector. Particularly when using training data for comparison, movement of the gaze detection coordinate system is a cause for recalibrating the training images. One may periodically redetermine the positions of the cornea center and fixed center of rotation to determine whether there has been a change in the spatial relationship between them and the illuminators and at least one sensor. A periodic check may also determine whether a lighting change in accordance with a criteria has occurred triggering generation of new training gaze data sets.”     ¶ [0228]: “FIG. 
26D is a flowchart describing one embodiment of a process for displaying additional augmented content for an object, based on determining the user's intent to interact with the object. In step 734, augmented content related to the one or more objects is retrieved. In one example, the augmented content may include user-specific information retrieved from the user profile database 472. In another example, the augmented content may include user-specific information that is retrieved in real time from one or more data sources such as the user's social networking sites, address book, email data, Instant Messaging data, user profiles or other sources on the Internet.”  ¶ [0229]: “In step 736, audio content related to the identified objects is extracted. Step 736 is optional. For example, if the user is looking at a wall clock in the user's living room and it is determined that the user intends to interact with the wall clock object then audio information about the time may be heard by the user. In step 738, the augmented content is projected over or next to the one or more objects in the user's focal region. In one example, the augmented content is a virtual image including one or more virtual objects or virtual text that is displayed to the user. In another example, the augmented content may include a virtual object such as a menu with one or more choices. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2.”   ¶ [0230]: “FIGS. 27A-C depict one embodiment of a user's interaction with one or more objects in the user's environment and the generation of an optimized image based on the user's interaction. FIG. 27A depicts an environment in which a user views one or more objects in a room 1100 using a HMD device 2. The room 1100 includes a front wall 1102, side wall 1104 and floor 1108, and example furniture such as a lamp 1106, a chair 1107, a wall clock 1118 and a table 1120. 
A video display screen 1110 is mounted to the wall 1102, in this example, and the hub 1116 rests on the table 1120. In an exemplary situation, user 1112 looks at an object such as the wall clock 1118 placed on the front wall 1102, via HMD device 2. 1121 represents the field of view of the user and 1122 represents the user's focal region.”    ¶ [0231]: “FIG. 27B depicts an optimized image generated by the camera of the HMD device of FIG. 27A, upon determining the user's intent to interact with the wall clock object 1118. In one embodiment, the user's intent may be determined as discussed by the process described in FIG. 25A. As illustrated in FIG. 27B, the optimized image 1124 includes an enhanced appearance of the wall clock object 1118 in the user's focal region and a diminished appearance of the lamp 1106, the display screen 1110, the hub 1116 and the table 1120 which are outside the user's focal region, but within the user's field of view. In the exemplary illustration, the wall clock object 1118 has been highlighted to enhance its appearance. The dotted lines around the objects 1106, 1110, 1116 and 1120 indicate their diminished appearance. In addition, the optimized image displays augmented content 1126 that shows the time of day in digital format next to the wall clock object 1118 and a message indicating that "Chloe's plane left on time." The message may have been formulated by one or more processors based on user profile data identifying Chloe as a social networking site friend, the flight information being on the user's calendar, and a check to the airline website which indicates the flight has just left. In one example, audio information about the time of day may also be heard by the user.”   ¶ [0232]: “FIG. 27C depicts the optimized image of FIG. 27B as seen by a user via a HMD device. The optimized image is provided by each of the display optical systems 14l and 14r, of the see-through, near-eye display device 2. 
The open regions 1127 and 1128 indicate the locations where light from the display enters the user's eyes as the opacity filter has diminished the appearance of the other furniture.”). 
 	Regarding claim 2 (depends on claim 1), PEREZ discloses that the sensing unit includes: 
 	a first camera (e.g., “physical environment facing video camera 113”) configured to receive a forward-view image including the external objects (¶ [0071]: “At the front of frame 115 is physical environment facing video camera 113 that can capture video and still images. Particularly in embodiments where the display device 2 is not operating in conjunction with depth cameras like capture devices 20a and 20b of the hub system 12, the physical environment facing camera 113 is a depth camera as well as a visible light sensitive camera. For example, the depth camera may include an IR illuminator transmitter and a hot reflecting surface like a hot mirror in front of the visible image sensor which lets the visible light pass and directs reflected IR radiation within a wavelength range or about a predetermined wavelength transmitted by the illuminator to a CCD or other type of depth sensor. The data from the sensors may be sent to the processing unit 4,5 which may process them but which may also send to a computer system over a network or hub computing system 12 for processing. The processing identifies and maps the user's real world field of view.”  ¶ [0101]: “In the embodiments above, the specific number of lenses shown are just examples. Other numbers and configurations of lenses operating on the same principles may be used. Additionally, in the examples above, only the right side of the see-through, near-eye display 2 are shown. A full near-eye, mixed reality display device would include as examples another set of lenses 116 and/or 118, another opacity filter 114, another lightguide optical element 112 for the embodiments of FIGS. 3C and 3D, another microdisplay 120, another lens system 122, likely another environment facing camera 113, another eye tracking camera 134 for the embodiments of FIGS. 
3A to 3C, earphones 130, and a temperature sensor 138.”   ¶ [0109]: “Camera interface 216 provides an interface to the two physical environment facing cameras 113 and each eye camera 134 and stores respective images received from the cameras 113, 134 in camera buffer 218.”); and 
 	a second camera (¶ [0064]: “image sensor 134”; ¶ [0101]: “eye tracking camera 134”) configured to receive an image of the user (¶ [0063]: “FIG. 1C illustrates an exemplary arrangement of positions of respective sets of gaze detection elements in a gaze detection system for each eye positioned facing each respective eye by a see-through, near-eye, mixed reality display system embodied in a set of eyeglasses 2.”   ¶ [0064]: “In the example of FIG. 1C, a detection area 139r, 139l of at least one sensor is aligned with the optical axis of its respective display optical system 14r, 14l so that the center of the detection area 139r, 139l is capturing light along the optical axis. If the display optical system 14 is aligned with the user's pupil, each detection area 139 of the respective sensor 134 is aligned with the user's pupil. Reflected light of the detection area 139 is transferred via one or more optical elements to the actual image sensor 134 of the camera, in this example illustrated by dashed line as being inside the frame 115.”  ¶ [0065]: “In one example, a visible light camera also commonly referred to as an RGB camera may be the sensor, and an example of an optical element or light directing element is a visible light reflecting mirror which is partially transmissive and partially reflective.”  ¶ [0066]: “The visible light camera provides image data of the pupil of the user's eye, while IR photodetectors 152 capture glints which are reflections in the IR portion of the spectrum. If a visible light camera is used, reflections of virtual images may appear in the eye data captured by the camera. An image filtering technique may be used to remove the virtual image reflections if desired. An IR camera is not sensitive to the virtual image reflections on the eye.”  ¶ [0067]: “In other examples, the at least one sensor 134 is an IR camera or a position sensitive detector (PSD) to which the IR radiation may be directed. 
For example, a hot reflecting surface may transmit visible light but reflect IR radiation. The IR radiation reflected from the eye may be from incident radiation of the illuminators 153, other IR illuminators (not shown) or from ambient IR radiation reflected off the eye. In some examples, sensor 134 may be a combination of an RGB and an IR camera, and the optical light directing elements may include a visible light reflecting or diverting element and an IR radiation reflecting or diverting element. In some examples, a camera may be small, e.g. 2 millimeters (mm) by 2 mm. An example of such a camera sensor is the Omnivision OV7727. In other examples, the camera may be small enough, e.g. the Omnivision OV7727, e.g. that the image sensor or camera 134 may be centered on the optical axis or other location of the display optical system 14. For example, the camera 134 may be embedded within a lens of the system 14. Additionally, an image filtering technique may be applied to blend the camera into a user field of view to lessen any distraction to the user.”  ¶ [0088]: “A detection area 139r of a light sensor is also part of the display optical system 14r. An optical element 125 embodies the detection area 139r by capturing reflected light from the user's eye received along the optical axis 142 and directs the captured light to the sensor 134r, in this example positioned in the bridge 104. As shown, the arrangement allows the detection area 139 of the sensor 134r to have its center aligned with the center of the display optical system 14. For example, if sensor 134r is an image sensor, sensor 134r captures the detection area 139, so an image captured at the image sensor is centered on the optical axis because the detection area 139 is. 
In one example, sensor 134r is a visible light camera or a combination of RGB/IR camera, and the optical element 125 includes an optical element which reflects visible light reflected from the user's eye, for example a partially reflective mirror. In other embodiments, the sensor 134r is an IR sensitive device such as an IR camera, and the element 125 includes a hot reflecting surface which lets visible light pass through it and reflects IR radiation to the sensor 134r.”  ¶ [0091]: “In some embodiments, sensor 134r may be an IR camera which captures not only glints, but also an infra-red or near-infra-red image of the user's eye including the pupil.”  ¶ [0101]: “In the embodiments above, the specific number of lenses shown are just examples. Other numbers and configurations of lenses operating on the same principles may be used. Additionally, in the examples above, only the right side of the see-through, near-eye display 2 are shown. A full near-eye, mixed reality display device would include as examples another set of lenses 116 and/or 118, another opacity filter 114, another lightguide optical element 112 for the embodiments of FIGS. 3C and 3D, another microdisplay 120, another lens system 122, likely another environment facing camera 113, another eye tracking camera 134 for the embodiments of FIGS. 3A to 3C, earphones 130, and a temperature sensor 138.”  ¶ [0109]: “Camera interface 216 provides an interface to the two physical environment facing cameras 113 and each eye camera 134 and stores respective images received from the cameras 113, 134 in camera buffer 218.”  ¶ [0168]: “The respective image sensor in this example is a camera capable of capturing image data representing glints 174l and 176l generated respectively by illuminators 153a and 153b on the left side of the frame 115 and data representing glints 174r and 176r generated respectively by illuminators 153c and 153d.”). 
	Regarding claim 3 (depends on claim 2), PEREZ discloses: 
 	the processor senses the relative position (¶ [0201]: “the position of the mole or freckle may be monitored for a change in position”) and gaze direction (¶ [0204]: “to estimate a gaze vector”) of the user about the transparent display through the user image (¶ [0201]: “in the image data of the eye”) (¶ [0052]: “A current user focal region is determined based on a point of gaze which is determined based upon a gaze vector from each eye.”  ¶ [0062]: “As noted above, in some embodiments, gaze detection of each of a user's eyes is based on a three dimensional coordinate system of gaze detection elements on a near-eye, mixed reality display device like the eyeglasses 2 in relation to one or more human eye elements such as a cornea center, a center of eyeball rotation and a pupil center. Examples of gaze detection elements which may be part of the coordinate system including glint generating illuminators and at least one sensor for capturing data representing the generated glints. As discussed below (see FIG. 21 discussion, a center of the cornea can be determined based on two glints using planar geometry. The center of the cornea links the pupil center and the center of rotation of the eyeball, which may be treated as a fixed location for determining an optical axis of the user's eye at a certain gaze or viewing angle.” ¶ [0063]: “FIG. 1C illustrates an exemplary arrangement of positions of respective sets of gaze detection elements in a gaze detection system for each eye positioned facing each respective eye by a see-through, near-eye, mixed reality display system embodied in a set of eyeglasses 2.”   ¶ [0064]: “In the example of FIG. 1C, a detection area 139r, 139l of at least one sensor is aligned with the optical axis of its respective display optical system 14r, 14l so that the center of the detection area 139r, 139l is capturing light along the optical axis. 
If the display optical system 14 is aligned with the user's pupil, each detection area 139 of the respective sensor 134 is aligned with the user's pupil. Reflected light of the detection area 139 is transferred via one or more optical elements to the actual image sensor 134 of the camera, in this example illustrated by dashed line as being inside the frame 115.”  ¶ [0065]: “In one example, a visible light camera also commonly referred to as an RGB camera may be the sensor, and an example of an optical element or light directing element is a visible light reflecting mirror which is partially transmissive and partially reflective.”  ¶ [0066]: “The visible light camera provides image data of the pupil of the user's eye, while IR photodetectors 152 capture glints which are reflections in the IR portion of the spectrum. If a visible light camera is used, reflections of virtual images may appear in the eye data captured by the camera. An image filtering technique may be used to remove the virtual image reflections if desired. An IR camera is not sensitive to the virtual image reflections on the eye.”  ¶ [0067]: “In other examples, the at least one sensor 134 is an IR camera or a position sensitive detector (PSD) to which the IR radiation may be directed. For example, a hot reflecting surface may transmit visible light but reflect IR radiation. The IR radiation reflected from the eye may be from incident radiation of the illuminators 153, other IR illuminators (not shown) or from ambient IR radiation reflected off the eye. In some examples, sensor 134 may be a combination of an RGB and an IR camera, and the optical light directing elements may include a visible light reflecting or diverting element and an IR radiation reflecting or diverting element. In some examples, a camera may be small, e.g. 2 millimeters (mm) by 2 mm. An example of such a camera sensor is the Omnivision OV7727. In other examples, the camera may be small enough, e.g. the Omnivision OV7727, e.g. 
that the image sensor or camera 134 may be centered on the optical axis or other location of the display optical system 14. For example, the camera 134 may be embedded within a lens of the system 14. Additionally, an image filtering technique may be applied to blend the camera into a user field of view to lessen any distraction to the user.”   ¶ [0094]: “In the embodiment of FIG. 3B, light sensor 134r may be embodied as a visible light camera, sometimes referred to as an RGB camera, or it may be embodied as an IR camera or a camera capable of processing light in both the visible and IR ranges, e.g. a depth camera. In this example, the image sensor 134r is the detection area 139r, and the image sensor 134 of the camera is located vertically on the optical axis 142 of the display optical system. In some examples, the camera may be located on frame 115 either above or below see-through lens 118 or embedded in the lens 118. In some embodiments, the illuminators 153 provide light for the camera, and in other embodiments the camera captures images with ambient lighting or light from its own light source. Gaze determination techniques based on image data, glint data or both may be used based on the geometry of the gaze detection elements.”   ¶ [0201]: “In one embodiment, an eye camera may capture about 5 to 10 mm of area around the visible eyeball portion of the cornea bulge, eye white, iris and pupil so as to capture part of an eyelid and eyelashes. A positionally fixed facial feature like a mole or freckle on skin such as an eyelid or on the bottom rim of the skin encasing the lower eyeball may also be present in the image data of the eye. In image samples, the position of the mole or freckle may be monitored for a change in position. If the facial feature has moved up, down, right or left, a vertical or horizontal shift can be detected. 
If the facial feature appears larger or smaller, a depth change in the spatial relationship between eye and display device 2 can be determined. There may be a criteria range in the change of position to trigger recalibration of the training images due to things like camera resolution, etc.”  ¶ [0202]: “In another example, although lighting is a factor which changes the size of the pupil and the ratio of pupil area to visible iris area within the circumference or perimeter of the iris, the size of the perimeter or circumference of the iris does not change with gaze change or lighting change; hence, the perimeter or circumference is a fixed characteristic of the iris as a facial feature. Through ellipse fitting of the iris, processor 210 or a processor of the processing unit 4,5 of the display device 2 can determine whether the iris has become larger or smaller in image data in accordance with criteria. If larger, the display device 2 with its illuminators 153 and at least one sensor 134 has moved closer in depth to the user's eye; if smaller, the display device 2 has moved farther away. A change in a fixed characteristic can trigger an IPD alignment check.”    ¶ [0204]: “In some examples, comparisons between current sensed data and training images is to determine a closest match and interpolate where the current pupil position data fits between training data sets to estimate a gaze vector. Particularly when using training data for comparison, movement of the gaze detection coordinate system is a cause for recalibrating the training images. One may periodically redetermine the positions of the cornea center and fixed center of rotation to determine whether there has been a change in the spatial relationship between them and the illuminators and at least one sensor. A periodic check may also determine whether a lighting change in accordance with a criteria has occurred triggering generation of new training gaze data sets.”); and 
 	the processor recognizes the external object located in the gaze direction of the user at the relative position of the user from among a plurality of external objects included in the forward-view image (¶ [0002]: “One or more real or virtual objects of interest are identified. For example, the user intent to interact with an object may be determined from a gaze duration with respect to the object.”  ¶ [0004]: “One or more processors are communicatively coupled to the image generation unit and the at least one sensor and have access to a memory for storing software and data including the eye data. Under the control of software, the one or more processors determine a current user focal region based on the eye data in a current user field of view and identifies one or more virtual objects having a target location in the current user field of view. The one or more processors controls the image generation unit for creating one or more images in which each of the one or more virtual objects appear at a respective focal region in the current user field of view for a natural sight view.”   ¶ [0005]: “The method comprises determining a current user field of view of a user wearing the see-through, near-eye, mixed reality device. The field of view includes one or more real objects. A gaze vector is determined for each user eye based on an arrangement of gaze detection elements in fixed positions with respect to each other on a respective display optical system for each eye of the display device. A current user focal region is determined based on the gaze vectors within the current user field of view. One or more images is displayed including one or more virtual objects appearing at a respective focal region in the current user field of view for a natural sight view.”  ¶ [0006]: “An object of interest is identified by determining a user intent to interact with the object in the current user focal region. An optimized image is generated based on the object of interest. 
The optimized image is displayed to the user via the see-through display device.”  ¶ [0052]: “The user's field of view, which is a portion of the environment or space that the user may observe at a current head position, is determined and real objects in the user field of view are identified.”   ¶ [0087]: “When the system renders a scene for the augmented reality display, it takes note of which real-world objects are in front of which virtual objects. If a virtual object is in front of a real-world object, then the opacity should be on for the coverage area of the virtual object. If the virtual is (virtually) behind a real-world object, then the opacity should be off, as well as any color for that pixel, so the user will only see the real-world object for that corresponding area (a pixel or more in size) of real light. Coverage would be on a pixel-by-pixel basis, so the system could handle the case of part of a virtual object being in front of a real-world object, part of the virtual object being behind the real-world object, and part of the virtual object being coincident with the real-world object.”   ¶ [0124]: “For a see-through mixed reality display device, the gaze vectors are determined to identify a point of gaze in a three-dimensional (3D) user field of view which includes both real objects, typically not under computer control, and virtual objects generated by an application. The gaze vectors may intersect at an object 10 feet away or at a distance effectively at infinity. The following figures briefly discuss embodiments for determining a 3D user field of view.”   ¶ [0126]: “FIG. 7 is a block diagram of a system embodiment for determining positions of objects within a user field of view of a see-through, near-eye display device. This embodiment illustrates how the various devices may leverage networked computers to map a three-dimensional model of a user field of view and the real and virtual objects within the model. 
An application 456 executing in a processing unit 4,5 communicatively coupled to a display device 2 can communicate over one or more communication networks 50 with a computing system 12 for processing of image data to determine and track a user field of view in three dimensions. The computing system 12 may be executing an application 452 remotely for the processing unit 4,5 for providing images of one or more virtual objects. Either or both of the applications 456 and 452 working together may map a 3D model of space around the user. A depth image processing application and skeletal tracking application 450 detects objects, identifies objects and their locations in the model. An object may be a person or a thing. Additionally, the depth image processing application performs skeletal tracking of at least humans. The application 450 may perform its processing based on depth image data from depth camera like 20A and 20B, two-dimensional or depth image data from one or more front facing cameras 113, and GPS metadata associated with objects in the image data obtained from a GPS image tracking application 458.”  ¶ [0128]: “The GPS image tracking application 458 identifies images of the user's location in one or more image database(s) 470 based on GPS data received from the processing unit 4,5 or other GPS units identified as being within a vicinity of the user, or both. Additionally, the image database(s) may provide accessible images of a location with metadata like GPS data and identifying data uploaded by users who wish to share their images. The GPS image tracking application provides distances between objects in an image based on GPS data to the depth image processing application 450. Additionally, the application 456 may perform processing for mapping and locating objects in a 3D user space locally and may interact with the GPS image tracking application for receiving distances between objects. 
Many combinations of shared processing are possible between the applications by leveraging network connectivity.”  ¶ [0135]: “FIG. 8A is a flowchart of a method embodiment for determining a three-dimensional user field of view. In step 510, one or more processors of the control circuitry 136, the processing unit 4,5, the hub computing system 12 or a combination of these receive image data from one or more front facing cameras, and in step 512 identify one or more real objects in front facing image data. Data from the orientation sensor 132, e.g. the three axis accelerometer 132C and the three axis magnetometer 132A, can also be used with the front facing camera 113 image data for mapping what is around the user, the position of the user's face and head in order to determine which objects, real or virtual, he or she is likely focusing on at the time. Based on an executing application, the one or more processors in step 514 identify virtual object positions in a user field of view which may be determined to be the field of view captured in the front facing image data. In step 516, a three-dimensional position is determined for each object in the user field of view. In other words, where each object is located with respect to the display device 2, for example with respect to the optical axis 142 of each display optical system 14.”  ¶ [0136]: “FIG. 8B is a flowchart of a method embodiment for identifying one or more real objects in a user field of view. This embodiment may be used to implement step 512. Each of the implementing examples in FIGS. 8B, 8D and 8E may be used separately or in conjunction with one another to identify the location of objects in the user field of view. In step 520, a location of user wearing the display device 2 is identified. For example, GPS data via a GPS unit 965 in the mobile device 5 or GPS transceiver 144 on the display device 2 may identify the location of the user. 
In step 522, one or more processors, retrieve one or more images of the location from a database (e.g. 470), and uses pattern recognition in step 524 to select one or more images matching image data from the one or more front facing cameras. In some embodiments, steps 522 and 524 may be performed remotely by a more powerful computer, e.g. hub 12, having access to image databases. Based on GPS data, in step 526 the one or more processors determines a relative position of one or more objects in front facing image data to one or more GPS tracked objects 528 in the location, and determines in step 529 a position of user from the one or more real objects based on the one or more relative positions.” ¶ [0138]: “Data from the orientation sensor 132 may also be sent for identifying face or head position.”  ¶ [0138]: “In step 542, the display device 2 receives data identifying one or more objects in a field of view for the user and their positions in a 3D model of a space. The image data from the one or more front facing cameras 113 approximates the user field of view, so the hub system 12 identifies the object in the front facing image data, for example through image recognition or pattern recognition software. Orientation data may also be used with the front facing image data to refine the user field of view and identify objects tracked by the computer system 12 falling within the user field of view. (The hub system 12 also aligns the front facing image data when received from two or more cameras 113 for identifying the user field of view.) The processing unit 4,5 in step 544 receives a position of the user in the 3D model of the space, and in step 546 the processing unit 4,5, or the processor 210 of the control circuitry 136 or both determines a position of one or more objects in the user field of view based on the positions of the user and the one or more objects in the 3D model of the space.”  ¶ [0151]: “FIG. 
10 is a flowchart of an embodiment of a process for generating a natural sight display view of virtual objects with real objects. In step 552, the system determines the current user field of view. That is, the system determines a portion of the environment or space within the user's vision and identifies the real and virtual objects therein as discussed above. For determining at what the user is specifically looking during a time period, a current user focal region of the user's depth of focus in the field of view is determined. Gaze data indicates what is the focal point or point of gaze which defines the current user focal region within the field of view. The Panum's fusional area can be calculated based on a focal point on a focal curve, the Horopter, within the Panum's fusional area. The Panum's fusional area is the area of single vision for binocular stereopsis used by the human eyes.”  ¶ [0152]: “In step 554, a gaze vector is determined for each eye based on the geometry of one or more gaze detection elements. FIGS. 3A through 3D illustrate some embodiments of arrangements of gaze detection elements for each respective display optical system 14 of a display device 2. Determination of gaze is discussed in more detail with respect to FIGS. 16 through 24. In step 556, a current user focal region is determined based on the gaze vectors within the current user field of view.”   ¶ [0158]: “FIG. 12 is a flowchart of an embodiment of a method for enhancing the display view of the one or more objects of interest in a see-through, mixed reality display device. In step 568, one or more processors of the augmented reality system determine one or more objects of interest to a user in the current user focal region. By identifying the objects of interest to a user, more relevant information may be targeted to a user, and the display view decluttered. 
In step 570, the one or more processors controls the different display elements for enhancing the display view of the one or more objects of interest. Both real and virtual objects of interest may be enhanced. For example, FIGS. 26B and 26C provide examples of how a zoom function may be implemented for a virtual object and for a real object for enhancing the display view of the one or more objects.”  ¶ [0170]: “FIG. 17 is a flowchart of a method embodiment for determining gaze in a see-through, near-eye mixed reality display system and provides an overall view of how a near-eye display device can leverage its geometry of optical components to determine gaze and a depth change between the eyeball and a display optical system. One or more processors of the mixed reality system such as processor 210 of the control circuitry, that in processing unit 4, the mobile device 5, or the hub computing system 12, alone or in combination, determine in step 602 boundaries for a gaze detection coordinate system. In step 604, a gaze vector for each eye is determined based on reflected eye data including glints, and in step 606 a point of gaze, e.g. what the user is looking at, is determined for the two eyes in a three-dimensional (3D) user field of view. As the positions and identity of objects in the user field of view are tracked, for example, by embodiments like in FIGS. 8A-8F, in step 608, any object at the point of gaze in the 3D user field of view is identified. In many embodiments, the three-dimensional user field of view includes displayed virtual objects and an actual direct view of real objects. The term object includes a person.”  ¶ [0205]: “FIGS. 25A through 26D describe embodiments for identifying an object of interest based a user's intent to interact with the object and an optimizing an image or a display view for that interaction.”  ¶ [0206]: “FIG. 
25A is a flowchart describing one embodiment of a process for identifying an object of interest based on a user's intent to interact with the object in the user's focal region. For example, the process of FIG. 25A is one example implementation of step 568 of FIG. 12. FIG. 25A describes a process by which a user's intent to interact with one or more objects in the user's focal region is determined based on detecting the user's eye gaze patterns in the user's focal region and determining the duration of the user's gaze on one or more objects being viewed by the user in the user's focal region.”  ¶ [0208]: “In step 693, it is determined if the user is viewing one or more objects. For example, the locations of points of gaze along the scanpath may be used to detect if the user is viewing one or more objects.”  ¶ [0209]: “If it is determined that the user is viewing one or more objects, then the objects being viewed by the user are identified in step 694. For example, the objects may be identified as a wall clock, a round shiny table, John Doe, a green leather couch, etc. In step 696, the duration of the user's gaze on the one or more objects being viewed is determined. In one example, the duration of the user's gaze is determined based on determining the duration of the user's gaze (or fixation) on the objects within a time window.”   ¶ [0201]: “In one embodiment, an eye camera may capture about 5 to 10 mm of area around the visible eyeball portion of the cornea bulge, eye white, iris and pupil so as to capture part of an eyelid and eyelashes. A positionally fixed facial feature like a mole or freckle on skin such as an eyelid or on the bottom rim of the skin encasing the lower eyeball may also be present in the image data of the eye. In image samples, the position of the mole or freckle may be monitored for a change in position. If the facial feature has moved up, down, right or left, a vertical or horizontal shift can be detected. 
If the facial feature appears larger or smaller, a depth change in the spatial relationship between eye and display device 2 can be determined. There may be a criteria range in the change of position to trigger recalibration of the training images due to things like camera resolution, etc.”  ¶ [0202]: “In another example, although lighting is a factor which changes the size of the pupil and the ratio of pupil area to visible iris area within the circumference or perimeter of the iris, the size of the perimeter or circumference of the iris does not change with gaze change or lighting change; hence, the perimeter or circumference is a fixed characteristic of the iris as a facial feature. Through ellipse fitting of the iris, processor 210 or a processor of the processing unit 4,5 of the display device 2 can determine whether the iris has become larger or smaller in image data in accordance with criteria. If larger, the display device 2 with its illuminators 153 and at least one sensor 134 has moved closer in depth to the user's eye; if smaller, the display device 2 has moved farther away. A change in a fixed characteristic can trigger an IPD alignment check.”    ¶ [0204]: “In some examples, comparisons between current sensed data and training images is to determine a closest match and interpolate where the current pupil position data fits between training data sets to estimate a gaze vector. Particularly when using training data for comparison, movement of the gaze detection coordinate system is a cause for recalibrating the training images. One may periodically redetermine the positions of the cornea center and fixed center of rotation to determine whether there has been a change in the spatial relationship between them and the illuminators and at least one sensor. A periodic check may also determine whether a lighting change in accordance with a criteria has occurred triggering generation of new training gaze data sets.”   ¶ [0221]: “FIG. 
26A is a flowchart describing one embodiment of a process for generating an optimized image with one or more objects based on the user's intent to interact with them and displaying the optimized image to the user via the see-through, near-eye display device. In one embodiment, generating an optimized image comprises, in step 730, optionally, diminishing the appearance of objects that are outside the user's focal region but within the user's field of view that the user does not intend to interact with. In one embodiment, the opacity filter 114 in the near-eye display device 2 is utilized to block out or darken the objects that are outside the user's focal region to diminish the appearance of objects that are outside the user's focal region. Thus, a portion of the real-world scene which includes the objects that the user is not interested may be blocked out by the opacity filter 114 from reaching the user's eye, so that the objects that the user intends to interact with in the user's focal region may clearly be seen by the user.”  ¶ [0222]: “In step 732, the objects that the user intends to interact with in the user's focal region are visually enhanced. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2. In one embodiment, the micro display assembly 120 in the see-through, mixed reality device 2 is utilized to visually enhance the one or more objects for interaction in the user's focal region. The objects may be real or virtual. One or more enhancement techniques may be applied. In one approach, the objects are visually enhanced by highlighting the edges of the objects, displaying a visual indicator, for example a virtual box or a circle, in a region in which the objects are located. In another example, a real or virtual object which is accelerating may have its edges enhanced by highlighting which tracks the object as it increases in speed. 
In another example, a sharp virtual outline of the edges of an object may be tracked at a focal distance the user has better focusing ability at while the object is still out of focus. Additionally, color may be used to enhance an object. Furthermore, one or more objects that it is determined a user intends to interact with may also be enhanced by zooming the one or more objects in or out. The zooming may be implemented by adjusting a focal region of the one or more objects.”   ¶ [0228]: “FIG. 26D is a flowchart describing one embodiment of a process for displaying additional augmented content for an object, based on determining the user's intent to interact with the object. In step 734, augmented content related to the one or more objects is retrieved. In one example, the augmented content may include user-specific information retrieved from the user profile database 472. In another example, the augmented content may include user-specific information that is retrieved in real time from one or more data sources such as the user's social networking sites, address book, email data, Instant Messaging data, user profiles or other sources on the Internet.”  ¶ [0229]: “In step 736, audio content related to the identified objects is extracted. Step 736 is optional. For example, if the user is looking at a wall clock in the user's living room and it is determined that the user intends to interact with the wall clock object then audio information about the time may be heard by the user. In step 738, the augmented content is projected over or next to the one or more objects in the user's focal region. In one example, the augmented content is a virtual image including one or more virtual objects or virtual text that is displayed to the user. In another example, the augmented content may include a virtual object such as a menu with one or more choices. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2.”   ¶ [0230]: “FIGS. 
27A-C depict one embodiment of a user's interaction with one or more objects in the user's environment and the generation of an optimized image based on the user's interaction. FIG. 27A depicts an environment in which a user views one or more objects in a room 1100 using a HMD device 2. The room 1100 includes a front wall 1102, side wall 1104 and floor 1108, and example furniture such as a lamp 1106, a chair 1107, a wall clock 1118 and a table 1120. A video display screen 1110 is mounted to the wall 1102, in this example, and the hub 1116 rests on the table 1120. In an exemplary situation, user 1112 looks at an object such as the wall clock 1118 placed on the front wall 1102, via HMD device 2. 1121 represents the field of view of the user and 1122 represents the user's focal region.”    ¶ [0231]: “FIG. 27B depicts an optimized image generated by the camera of the HMD device of FIG. 27A, upon determining the user's intent to interact with the wall clock object 1118. In one embodiment, the user's intent may be determined as discussed by the process described in FIG. 25A. As illustrated in FIG. 27B, the optimized image 1124 includes an enhanced appearance of the wall clock object 1118 in the user's focal region and a diminished appearance of the lamp 1106, the display screen 1110, the hub 1116 and the table 1120 which are outside the user's focal region, but within the user's field of view. In the exemplary illustration, the wall clock object 1118 has been highlighted to enhance its appearance. The dotted lines around the objects 1106, 1110, 1116 and 1120 indicate their diminished appearance. In addition, the optimized image displays augmented content 1126 that shows the time of day in digital format next to the wall clock object 1118 and a message indicating that "Chloe's plane left on time." 
The message may have been formulated by one or more processors based on user profile data identifying Chloe as a social networking site friend, the flight information being on the user's calendar, and a check to the airline website which indicates the flight has just left. In one example, audio information about the time of day may also be heard by the user.”   ¶ [0232]: “FIG. 27C depicts the optimized image of FIG. 27B as seen by a user via a HMD device. The optimized image is provided by each of the display optical systems 14l and 14r, of the see-through, near-eye display device 2. The open regions 1127 and 1128 indicate the locations where light from the display enters the user's eyes as the opacity filter has diminished the appearance of the other furniture.”). 
	Regarding claim 9 (depends on claim 1), PEREZ discloses:
 	the processor acquires augmented reality (AR) information about the recognized external object (¶ [0002]: “One or more real or virtual objects of interest are identified. For example, the user intent to interact with an object may be determined from a gaze duration with respect to the object.  In some instances, content related to the object is retrieved and projected over or next to the object. In other examples, the object is visually enhanced in appearance by adjusting its focal region.”  ¶ [0052]: “The user's field of view, which is a portion of the environment or space that the user may observe at a current head position, is determined and real objects in the user field of view are identified. One or more virtual objects are identified for display based on determining where they are in the current user field of view in accordance with an executing application. A current user focal region is determined based on a point of gaze which is determined based upon a gaze vector from each eye. One or more processors control an image generation unit of the display device for generating one or more images including each of the identified one or more virtual objects at a respective focal region in the current user field of view for a natural sight view.”  ¶ [0053]: “An optimized image is generated based on the user's intent to interact with one or more objects. The optimized image is displayed to the user via the see-through, near-eye, augmented reality display device. Visual content, audio content or both may be projected over or next to the one or more objects with which the user wishes to interact in the current user focal region. In other examples, the optimized image may include one or more of an enhanced appearance of objects in the user's focal region, and a diminished appearance of objects outside the user's focal region but within the user's field of view.”  ¶ [0107]: “Processing unit 4, the components of which are depicted in FIG. 
5, will receive the sensory information from the display device 2 and may also receive sensory information from hub computing device 12 (See FIG. 1). Based on that information, processing unit 4 will determine where and when to provide a virtual image to the user and send instructions accordingly to the control circuitry 136 of the display device 2.”    ¶ [0115]: “CPU 320 and GPU 322 are the main workhorses for determining where, when and how to insert virtual images into the view of the user.”  ¶ [0221]: “FIG. 26A is a flowchart describing one embodiment of a process for generating an optimized image with one or more objects based on the user's intent to interact with them and displaying the optimized image to the user via the see-through, near-eye display device. In one embodiment, generating an optimized image comprises, in step 730, optionally, diminishing the appearance of objects that are outside the user's focal region but within the user's field of view that the user does not intend to interact with. In one embodiment, the opacity filter 114 in the near-eye display device 2 is utilized to block out or darken the objects that are outside the user's focal region to diminish the appearance of objects that are outside the user's focal region. Thus, a portion of the real-world scene which includes the objects that the user is not interested may be blocked out by the opacity filter 114 from reaching the user's eye, so that the objects that the user intends to interact with in the user's focal region may clearly be seen by the user.”  ¶ [0222]: “In step 732, the objects that the user intends to interact with in the user's focal region are visually enhanced. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2. In one embodiment, the micro display assembly 120 in the see-through, mixed reality device 2 is utilized to visually enhance the one or more objects for interaction in the user's focal region. 
The objects may be real or virtual. One or more enhancement techniques may be applied. In one approach, the objects are visually enhanced by highlighting the edges of the objects, displaying a visual indicator, for example a virtual box or a circle, in a region in which the objects are located. In another example, a real or virtual object which is accelerating may have its edges enhanced by highlighting which tracks the object as it increases in speed. In another example, a sharp virtual outline of the edges of an object may be tracked at a focal distance the user has better focusing ability at while the object is still out of focus. Additionally, color may be used to enhance an object. Furthermore, one or more objects that it is determined a user intends to interact with may also be enhanced by zooming the one or more objects in or out. The zooming may be implemented by adjusting a focal region of the one or more objects.”   ¶ [0228]: “FIG. 26D is a flowchart describing one embodiment of a process for displaying additional augmented content for an object, based on determining the user's intent to interact with the object. In step 734, augmented content related to the one or more objects is retrieved. In one example, the augmented content may include user-specific information retrieved from the user profile database 472. In another example, the augmented content may include user-specific information that is retrieved in real time from one or more data sources such as the user's social networking sites, address book, email data, Instant Messaging data, user profiles or other sources on the Internet.”  ¶ [0229]: “In step 736, audio content related to the identified objects is extracted. Step 736 is optional. For example, if the user is looking at a wall clock in the user's living room and it is determined that the user intends to interact with the wall clock object then audio information about the time may be heard by the user. 
In step 738, the augmented content is projected over or next to the one or more objects in the user's focal region. In one example, the augmented content is a virtual image including one or more virtual objects or virtual text that is displayed to the user. In another example, the augmented content may include a virtual object such as a menu with one or more choices. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2.”   ¶ [0230]: “FIGS. 27A-C depict one embodiment of a user's interaction with one or more objects in the user's environment and the generation of an optimized image based on the user's interaction. FIG. 27A depicts an environment in which a user views one or more objects in a room 1100 using a HMD device 2. The room 1100 includes a front wall 1102, side wall 1104 and floor 1108, and example furniture such as a lamp 1106, a chair 1107, a wall clock 1118 and a table 1120. A video display screen 1110 is mounted to the wall 1102, in this example, and the hub 1116 rests on the table 1120. In an exemplary situation, user 1112 looks at an object such as the wall clock 1118 placed on the front wall 1102, via HMD device 2. 1121 represents the field of view of the user and 1122 represents the user's focal region.”    ¶ [0231]: “FIG. 27B depicts an optimized image generated by the camera of the HMD device of FIG. 27A, upon determining the user's intent to interact with the wall clock object 1118. In one embodiment, the user's intent may be determined as discussed by the process described in FIG. 25A. As illustrated in FIG. 27B, the optimized image 1124 includes an enhanced appearance of the wall clock object 1118 in the user's focal region and a diminished appearance of the lamp 1106, the display screen 1110, the hub 1116 and the table 1120 which are outside the user's focal region, but within the user's field of view. 
In the exemplary illustration, the wall clock object 1118 has been highlighted to enhance its appearance. The dotted lines around the objects 1106, 1110, 1116 and 1120 indicate their diminished appearance. In addition, the optimized image displays augmented content 1126 that shows the time of day in digital format next to the wall clock object 1118 and a message indicating that "Chloe's plane left on time." The message may have been formulated by one or more processors based on user profile data identifying Chloe as a social networking site friend, the flight information being on the user's calendar, and a check to the airline website which indicates the flight has just left. In one example, audio information about the time of day may also be heard by the user.”   ¶ [0232]: “FIG. 27C depicts the optimized image of FIG. 27B as seen by a user via a HMD device. The optimized image is provided by each of the display optical systems 14l and 14r, of the see-through, near-eye display device 2. The open regions 1127 and 1128 indicate the locations where light from the display enters the user's eyes as the opacity filter has diminished the appearance of the other furniture.”), and 
 	displays the acquired AR information around the recognized external object (¶ [0002]: “content related to the object is retrieved and projected over or next to the object. In other examples, the object is visually enhanced in appearance by adjusting its focal region.”  ¶ [0052]: “The user's field of view, which is a portion of the environment or space that the user may observe at a current head position, is determined and real objects in the user field of view are identified. One or more virtual objects are identified for display based on determining where they are in the current user field of view in accordance with an executing application. A current user focal region is determined based on a point of gaze which is determined based upon a gaze vector from each eye. One or more processors control an image generation unit of the display device for generating one or more images including each of the identified one or more virtual objects at a respective focal region in the current user field of view for a natural sight view.” ¶ [0053]: “An optimized image is generated based on the user's intent to interact with one or more objects. The optimized image is displayed to the user via the see-through, near-eye, augmented reality display device. Visual content, audio content or both may be projected over or next to the one or more objects with which the user wishes to interact in the current user focal region. In other examples, the optimized image may include one or more of an enhanced appearance of objects in the user's focal region, and a diminished appearance of objects outside the user's focal region but within the user's field of view.”  ¶ [0153]: “Based on a software application executing in one or more computer systems such as the hub computing device 12 or the processing unit 4, 5, one or more virtual objects having a target location in the current user field of view are identified in step 558. 
For example, the processing unit 4, 5 or hub system 12 or both use the three-dimensional (3D) model of the environment and position and orientation data of the user's head to determine whether the target location of any virtual object is within the user's field of view. In step 560, the display optical systems 14 of the display device 2 display each identified virtual object to appear at a respective focal region for a natural sight view. FIGS. 14 and 15 provide more details of implementation examples for making each identified virtual object appear to be a respective focal region for a natural sight view.”   ¶ [0221]: “FIG. 26A is a flowchart describing one embodiment of a process for generating an optimized image with one or more objects based on the user's intent to interact with them and displaying the optimized image to the user via the see-through, near-eye display device. In one embodiment, generating an optimized image comprises, in step 730, optionally, diminishing the appearance of objects that are outside the user's focal region but within the user's field of view that the user does not intend to interact with. In one embodiment, the opacity filter 114 in the near-eye display device 2 is utilized to block out or darken the objects that are outside the user's focal region to diminish the appearance of objects that are outside the user's focal region. Thus, a portion of the real-world scene which includes the objects that the user is not interested may be blocked out by the opacity filter 114 from reaching the user's eye, so that the objects that the user intends to interact with in the user's focal region may clearly be seen by the user.”  ¶ [0222]: “In step 732, the objects that the user intends to interact with in the user's focal region are visually enhanced. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2. 
In one embodiment, the micro display assembly 120 in the see-through, mixed reality device 2 is utilized to visually enhance the one or more objects for interaction in the user's focal region. The objects may be real or virtual. One or more enhancement techniques may be applied. In one approach, the objects are visually enhanced by highlighting the edges of the objects, displaying a visual indicator, for example a virtual box or a circle, in a region in which the objects are located. In another example, a real or virtual object which is accelerating may have its edges enhanced by highlighting which tracks the object as it increases in speed. In another example, a sharp virtual outline of the edges of an object may be tracked at a focal distance the user has better focusing ability at while the object is still out of focus. Additionally, color may be used to enhance an object. Furthermore, one or more objects that it is determined a user intends to interact with may also be enhanced by zooming the one or more objects in or out. The zooming may be implemented by adjusting a focal region of the one or more objects.”   ¶ [0228]: “FIG. 26D is a flowchart describing one embodiment of a process for displaying additional augmented content for an object, based on determining the user's intent to interact with the object. In step 734, augmented content related to the one or more objects is retrieved. In one example, the augmented content may include user-specific information retrieved from the user profile database 472. In another example, the augmented content may include user-specific information that is retrieved in real time from one or more data sources such as the user's social networking sites, address book, email data, Instant Messaging data, user profiles or other sources on the Internet.”  ¶ [0229]: “In step 736, audio content related to the identified objects is extracted. Step 736 is optional. 
For example, if the user is looking at a wall clock in the user's living room and it is determined that the user intends to interact with the wall clock object then audio information about the time may be heard by the user. In step 738, the augmented content is projected over or next to the one or more objects in the user's focal region. In one example, the augmented content is a virtual image including one or more virtual objects or virtual text that is displayed to the user. In another example, the augmented content may include a virtual object such as a menu with one or more choices. In step 740, one or more optimized images are displayed to the user via the head mounted display device 2.”   ¶ [0230]: “FIGS. 27A-C depict one embodiment of a user's interaction with one or more objects in the user's environment and the generation of an optimized image based on the user's interaction. FIG. 27A depicts an environment in which a user views one or more objects in a room 1100 using a HMD device 2. The room 1100 includes a front wall 1102, side wall 1104 and floor 1108, and example furniture such as a lamp 1106, a chair 1107, a wall clock 1118 and a table 1120. A video display screen 1110 is mounted to the wall 1102, in this example, and the hub 1116 rests on the table 1120. In an exemplary situation, user 1112 looks at an object such as the wall clock 1118 placed on the front wall 1102, via HMD device 2. 1121 represents the field of view of the user and 1122 represents the user's focal region.”    ¶ [0231]: “FIG. 27B depicts an optimized image generated by the camera of the HMD device of FIG. 27A, upon determining the user's intent to interact with the wall clock object 1118. In one embodiment, the user's intent may be determined as discussed by the process described in FIG. 25A. As illustrated in FIG. 
27B, the optimized image 1124 includes an enhanced appearance of the wall clock object 1118 in the user's focal region and a diminished appearance of the lamp 1106, the display screen 1110, the hub 1116 and the table 1120 which are outside the user's focal region, but within the user's field of view. In the exemplary illustration, the wall clock object 1118 has been highlighted to enhance its appearance. The dotted lines around the objects 1106, 1110, 1116 and 1120 indicate their diminished appearance. In addition, the optimized image displays augmented content 1126 that shows the time of day in digital format next to the wall clock object 1118 and a message indicating that "Chloe's plane left on time." The message may have been formulated by one or more processors based on user profile data identifying Chloe as a social networking site friend, the flight information being on the user's calendar, and a check to the airline website which indicates the flight has just left. In one example, audio information about the time of day may also be heard by the user.”   ¶ [0232]: “FIG. 27C depicts the optimized image of FIG. 27B as seen by a user via a HMD device. The optimized image is provided by each of the display optical systems 14l and 14r, of the see-through, near-eye display device 2. The open regions 1127 and 1128 indicate the locations where light from the display enters the user's eyes as the opacity filter has diminished the appearance of the other furniture.”). 
	Regarding claims 16-18, claims 16-18 are directed, respectively, to the method(s) implemented by the device(s) of claims 1-3 and, as such, are rejected for the same reasons applied above in the rejections of claims 1-3, respectively.
Claim Rejections – 35 USC § 103
 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 

1. Determining the scope and contents of the prior art;
2. Ascertaining the differences between the prior art and the claims at issue;
3. Resolving the level of ordinary skill in the pertinent art; and
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

   	Claims 4-5 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of FORUTANPOUR et al. (US 2013/0335573, hereinafter “FORUTANPOUR”).
 	Regarding claim 4 (depends on claim 1), whereas PEREZ may not be entirely explicit as to the following limitations, FORUTANPOUR teaches:
  	the transparent display includes a touchscreen (¶ [0004]: “a transparent touch screen lens to select objects within a field of view”.  ¶ [0006]: “In some embodiments, a viewing apparatus (e.g. head mounted display, augmented reality goggles) may include at least one lens, wherein the lens can sense touches and output touch data indicative of a location of a touch on the lens by the user. A processor may be included in the viewing apparatus, wherein the processor may receive the touch data or other data indicative of the touch data, and may select an object within the field of view of the user corresponding to the touch data, wherein the object and the location of the touch on the lens by the user are on a common line of sight of the user.”); 
 	the processor, when a specific point on the touchscreen is touched (¶ [0006]: “the processor may receive the touch data or other data indicative of the touch data”), recognizes the external object (¶ [0006]: “select an object within the field of view of the user”; and/or ¶ [0057]: “determine that the user 802 has identified the United States Capitol 804”), based on the touched point (¶ [0006]: “location of the touch on the lens”), the relative position of the user (e.g., ¶ [0006]: “field of view of the user” and/or ¶ [0057]: “the user's 802 position”), and the gaze direction of the user (e.g., ¶ [0006]: “on a common line of sight of the user.”) (¶ [0006]: “In some embodiments, a viewing apparatus (e.g. head mounted display, augmented reality goggles) may include at least one lens, wherein the lens can sense touches and output touch data indicative of a location of a touch on the lens by the user. A processor may be included in the viewing apparatus, wherein the processor may receive the touch data or other data indicative of the touch data, and may select an object within the field of view of the user corresponding to the touch data, wherein the object and the location of the touch on the lens by the user are on a common line of sight of the user.”  ¶ [0009]: “In some embodiments, the processor of the viewing apparatus may be further configured to receive an image from the camera, identify a location on the image that is mapped to the location of the touch by the user on the at least one lens, and then select the object within the user's field of view corresponding to the location of the touch on the lens and also the location in the image mapped to the location of the touch on the lens. For example, the processor may receive a digital image from the camera, comprised of a rectangular matrix of digital pixels. The processor may calculate a coordinate position (e.g. 
(x, y)) of the touch by the user on the lens, and map that coordinate position to a coordinate position in the digital image from the camera that corresponds to a pre-calibrated mapped location of pixels of the image (e.g. (x', y')). The processor may then select the object within the image that corresponds to that location of pixels (x', y').”   ¶ [0012]: “In some embodiments, the viewing apparatus may further comprise at least two lenses, wherein the image projection module may be further configured to display visual information of a selected object based further on which lens is touched (e.g. (x, y, 1)). For example, if the user touches a location on the left lens that corresponds to a person within the user's line of sight and the touch, then the processor may attempt to identify the person, and the image projection module may display the selected person's Facebook profile. On the other hand, if the right lens was touched, only a circle may display over the selected person. In some embodiments, touching one of the lenses corresponds to an information display function, for example causing the pico-projector to display an identity of a selected individual, while touching the other lens corresponds to an action, for example calling the selected individual. Such action may include, for example, transmission of information to another device such as a phone.”  ¶ [0039]: “Referring to FIG. 4, the graphical illustration 400 represents various embodiments using touch lens features. To activate various functions, a user 402, able to perceive a field of view 404 through a viewing apparatus 420, may touch the front side of a lens 406 of the viewing apparatus 420. The lens 406 may be at least one transparent or semi-transparent pane, sheet or film, comprised of at least one type of material, and is not limited to it providing optical enhancements or the like. 
The lens 406, being touch sensitive and configured to output touch data, may then transmit the touch data derived from the user's touch to a processor, not shown. The processor may be built into the viewing apparatus or contained at a remote location. The processor may then receive the touch data or some substantial representation of said touch data and perform an operation based on the touch data or said substantial representation. Such operations may include selecting objects within the user's 402 field of view 404 and in accordance with the user's 402 line of sight. For example, the user 402 may touch the location on the lens 406 that corresponds with the user's 402 line of sight of the dog 422. Based on the touch, the viewing apparatus 420 may then perform the operation of selecting the location on the lens of the touch that corresponds with the line of sight of the dog 422.”  ¶ [0049]: “After turning on the cameras 710 and 712, the user 702 may touch the location on his right lens that corresponds in his line of sight with a man's face 714 that is within the user's field of view. By touching the right lens, that may signal to the viewing apparatus 720 to perform operations that interact with the user's surroundings, such as selecting and highlighting the object corresponding to the user's touch location. Thus, the camera 712 may record an image and/or start video recording at least the portion of view corresponding to the user's touch. In this case, the camera is instructed to focus on the man's face 714.”  ¶ [0050]: “Based on the touch data and image recording, the light projector 716 may display a highlight or circle around the selected object according to the location of the user's 702 touch. 
Thus, it can be seen that each camera and each light projector of embodiments can perform different operations.”    ¶ [0054]: “touching a lens at a particular location may signal to select an object corresponding to the line of sight and the touch,”   ¶ [0056]: “Referring to FIG. 8, graphical illustration 800 represents other various embodiments. Here, viewing apparatus 820 may be configured to provide information about buildings and notable objects within the user's 802 field of view. For example, the user 802 may be touring the United States Capitol building 804 and be able to see it within his field of view. The user 802 may touch the lens corresponding to the area aligned with his line of sight and with the Capitol 804. Camera 818 may take a picture of the Capitol 804. Then, the image recorded from camera 818 may be recognized not to be a person's face this time, but of a building.”  ¶ [0057]: “Subsequently, viewing apparatus 820 may utilize a satellite positioning system (SPS) operation 806 that may be one of the applications available in the viewing apparatus 820. For example, the SPS function 806 may comprise a global satellite positioning (GPS) operation and/or a GLONASS operation. The SPS function 806 may activate, after having recognized that a building is selected, in order to determine the user's 802 position. A compass application feature 808 may also be used to determine which direction the user 802 was facing when the image was identified, and based on such information, the viewing apparatus 820 or some remote application may determine that the user 802 has identified the United States Capitol 804.”  ¶ [0085]: “A third option starting from block 1002, is for the user to touch the right lens, at block 1026. This may result in actions that interface with the user's field of view and surroundings, at block 1028. Such an action may be depicted in FIGS. 7 and 8, above. 
For example, the user may touch a location on the right lens where the user's touch aligns with a line of sight that corresponds to a person's face within the user's field of view. This may cause a processor built into the goggles to display a highlight or circle around the face in the lens, and then access a database of images or look up information via wireless connection that corresponds to the selected face, at blocks 1028 and 1030. The processor may employ image recognition to match the face, or may upload a snapshot of the selected image to a cloud-computing server to post-process for more information. Alternatively, the user may touch a location on the right lens that corresponds to a building, or a sign. The processor may employ the same image selection and recognition functions, but for objects, to identify more information about the building or sign.”). 
 	Thus, in order to obtain a more versatile and user friendly XR device having the cumulative features and/or functionality taught by PEREZ and FORUTANPOUR, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ to include a touchscreen transparent display and the functionality of recognizing the external object, based on the touched point, the relative position of the user and the gaze direction of the user, as taught by FORUTANPOUR.
 	Regarding claim 5 (depends on claim 4), FORUTANPOUR also teaches:
 	the processor recognizes the external object (¶ [0006]: “select an object within the field of view of the user”; and/or ¶ [0057]: “identified the United States Capitol 804”) that is located at a specific point (i.e., the real world location of the object and/or the real world location of the United States Capitol appearing in the field of view of the user) where the touched point at the relative position (¶ [0006]: “within the field of view of the user corresponding to the touch data,”) meets the gaze direction (¶ [0006]: “wherein the object and the location of the touch on the lens by the user are on a common line of sight of the user”) (¶ [0006]: “In some embodiments, a viewing apparatus (e.g. head mounted display, augmented reality goggles) may include at least one lens, wherein the lens can sense touches and output touch data indicative of a location of a touch on the lens by the user. A processor may be included in the viewing apparatus, wherein the processor may receive the touch data or other data indicative of the touch data, and may select an object within the field of view of the user corresponding to the touch data, wherein the object and the location of the touch on the lens by the user are on a common line of sight of the user.”  ¶ [0012]: “In some embodiments, the viewing apparatus may further comprise at least two lenses, wherein the image projection module may be further configured to display visual information of a selected object based further on which lens is touched (e.g. (x, y, 1)). For example, if the user touches a location on the left lens that corresponds to a person within the user's line of sight and the touch, then the processor may attempt to identify the person, and the image projection module may display the selected person's Facebook profile. On the other hand, if the right lens was touched, only a circle may display over the selected person. 
In some embodiments, touching one of the lenses corresponds to an information display function, for example causing the pico-projector to display an identity of a selected individual, while touching the other lens corresponds to an action, for example calling the selected individual. Such action may include, for example, transmission of information to another device such as a phone.”  ¶ [0039]: “Referring to FIG. 4, the graphical illustration 400 represents various embodiments using touch lens features. To activate various functions, a user 402, able to perceive a field of view 404 through a viewing apparatus 420, may touch the front side of a lens 406 of the viewing apparatus 420. The lens 406 may be at least one transparent or semi-transparent pane, sheet or film, comprised of at least one type of material, and is not limited to it providing optical enhancements or the like. The lens 406, being touch sensitive and configured to output touch data, may then transmit the touch data derived from the user's touch to a processor, not shown. The processor may be built into the viewing apparatus or contained at a remote location. The processor may then receive the touch data or some substantial representation of said touch data and perform an operation based on the touch data or said substantial representation. Such operations may include selecting objects within the user's 402 field of view 404 and in accordance with the user's 402 line of sight. For example, the user 402 may touch the location on the lens 406 that corresponds with the user's 402 line of sight of the dog 422. Based on the touch, the viewing apparatus 420 may then perform the operation of selecting the location on the lens of the touch that corresponds with the line of sight of the dog 422.”  ¶ [0049]: “After turning on the cameras 710 and 712, the user 702 may touch the location on his right lens that corresponds in his line of sight with a man's face 714 that is within the user's field of view. 
By touching the right lens, that may signal to the viewing apparatus 720 to perform operations that interact with the user's surroundings, such as selecting and highlighting the object corresponding to the user's touch location. Thus, the camera 712 may record an image and/or start video recording at least the portion of view corresponding to the user's touch. In this case, the camera is instructed to focus on the man's face 714.”  ¶ [0050]: “Based on the touch data and image recording, the light projector 716 may display a highlight or circle around the selected object according to the location of the user's 702 touch. Thus, it can be seen that each camera and each light projector of embodiments can perform different operations.”    ¶ [0054]: “touching a lens at a particular location may signal to select an object corresponding to the line of sight and the touch,”   ¶ [0056]: “Referring to FIG. 8, graphical illustration 800 represents other various embodiments. Here, viewing apparatus 820 may be configured to provide information about buildings and notable objects within the user's 802 field of view. For example, the user 802 may be touring the United States Capitol building 804 and be able to see it within his field of view. The user 802 may touch the lens corresponding to the area aligned with his line of sight and with the Capitol 804. Camera 818 may take a picture of the Capitol 804. Then, the image recorded from camera 818 may be recognized not to be a person's face this time, but of a building.”  ¶ [0057]: “Subsequently, viewing apparatus 820 may utilize a satellite positioning system (SPS) operation 806 that may be one of the applications available in the viewing apparatus 820. For example, the SPS function 806 may comprise a global satellite positioning (GPS) operation and/or a GLONASS operation. The SPS function 806 may activate, after having recognized that a building is selected, in order to determine the user's 802 position. 
A compass application feature 808 may also be used to determine which direction the user 802 was facing when the image was identified, and based on such information, the viewing apparatus 820 or some remote application may determine that the user 802 has identified the United States Capitol 804.”  ¶ [0085]: “A third option starting from block 1002, is for the user to touch the right lens, at block 1026. This may result in actions that interface with the user's field of view and surroundings, at block 1028. Such an action may be depicted in FIGS. 7 and 8, above. For example, the user may touch a location on the right lens where the user's touch aligns with a line of sight that corresponds to a person's face within the user's field of view. This may cause a processor built into the goggles to display a highlight or circle around the face in the lens, and then access a database of images or look up information via wireless connection that corresponds to the selected face, at blocks 1028 and 1030. The processor may employ image recognition to match the face, or may upload a snapshot of the selected image to a cloud-computing server to post-process for more information. Alternatively, the user may touch a location on the right lens that corresponds to a building, or a sign. The processor may employ the same image selection and recognition functions, but for objects, to identify more information about the building or sign.”). 
	Thus, in order to obtain a more versatile and user friendly XR device having the cumulative features and/or functionality taught by PEREZ and FORUTANPOUR, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ to include a touchscreen transparent display and the functionality of recognizing the external object, based on the touched point, the relative position of the user and the gaze direction of the user, as taught by FORUTANPOUR.
	Regarding claims 19-20, claims 19-20 are directed, respectively, to the method(s) implemented by the device(s) of claims 4-5 and, as such, are rejected for the same reasons applied above in the rejections of claims 4-5, respectively.
     	Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of COSTA (US 2018/0365898).
 	Regarding claim 6 (depends on claim 1), whereas PEREZ may not be entirely explicit as to the following limitation, COSTA teaches:
 	the processor, when recognition failure of the external object has occurred, displays information notifying of the recognition failure on the transparent display (¶ [0045]: “In some examples, the object classifier 316 and/or object classification system 360 may be configured to automatically identify additional information that may facilitate an identification or a classification of a real-world object. As a first example, in response to a real-world object receiving a general classification, additional information may be identified that could allow a more specific classification to be obtained for the real-world object.”  ¶ [0046]: “The object classifier 316 and/or object classification system 360 may identify additional information useful for determining whether one of the more specific classifications may apply to the real-world object. As a second example, in response to multiple likely classification options being identified for a real-world object, additional information may be identified that could allow a more definitive classification. The object classification system 360 may indicate the identified additional information to object classifier 316, and in response to this indication, the MR device 250 may perform actions to obtain and provide the additional information to the object classification system 360. Examples of the additional information include, but are not limited to, higher resolution image data, image data for currently uncaptured areas of a physical space and/or real-world object, a user selection of an option from a list of options, and/or a user response to a query presented via the MR device 250 (for example, a spoken or text response provided by the user 240).”). 
 	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ and COSTA, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ so as to also include the functionality of displaying information notifying of the recognition failure on the transparent display when recognition failure of the external object has occurred, as taught by COSTA.
	Regarding claim 7 (depends on claim 6), COSTA further teaches:
 	the recognition failure notification information includes specific information that guides movement of the user's relative position to successfully recognize the external object (¶ [0045]: “The object classifier 316 and/or object classification system 360 may identify additional information useful for determining whether one of the more specific classifications may apply to the real-world object. As a second example, in response to multiple likely classification options being identified for a real-world object, additional information may be identified that could allow a more definitive classification. The object classification system 360 may indicate the identified additional information to object classifier 316, and in response to this indication, the MR device 250 may perform actions to obtain and provide the additional information to the object classification system 360. Examples of the additional information include, but are not limited to, higher resolution image data, image data for currently uncaptured areas of a physical space and/or real-world object, a user selection of an option from a list of options, and/or a user response to a query presented via the MR device 250 (for example, a spoken or text response provided by the user 240).”   ¶ [0069]: “In another implementation, a request for additional information may be displayed to the user 240, permitting the user 240 to collect additional image data using the MR device 250 (for example, previously uncaptured areas or surfaces of a real-world object and/or physical space) to be used for classification, choose from multiple classification options, and/or provide additional information regarding a real-world object. The additional information may be effective for allowing a real-world object to be classified.”).
     	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of MULLINS et al. (US 2016/0247324, hereinafter “MULLINS”).
 	Regarding claim 10 (depends on claim 9), whereas PEREZ may not be entirely explicit as to the following limitation, MULLINS teaches:
 	when size change of the external object visible through the transparent display occurs in response to a change of the user's relative position about the transparent display, the processor changes a display format of the AR information to another format according to the size change (¶ [0072]: “The shape, size, color, or appearance of the virtual content may change based on the position and orientation of the viewing device 101 relative to the physical object being viewed. For example, the virtual content may appear bigger as the viewing device 101 moves closer to the physical object associated with the virtual content. Similarly, the color of the virtual content may change based on a current state of the physical object (e.g., red for malfunction, green for normal operation).”). 
 	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ and MULLINS, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ so as to also include the functionality of changing a display format of the AR information to another format according to a size change of an external object visible through the transparent display due to a change in relative position of the user, as taught by MULLINS.
     	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of MULLINS et al. (US 2016/0247324), further in view of COSTA (US 2018/0365898).
	Regarding claim 11 (depends on claim 10), whereas neither PEREZ nor MULLINS is explicit as to the following limitation, COSTA teaches:
 	as the size of the external object is changed, the processor allows the display format of the AR information to be changed from a summarized display format to a gradually-detailed display format (¶ [0073]: “Furthermore, it should be understood that in some implementations, when first software application 810 is launched, graphical element(s) of the first software application 810 can be configured to remain connected, attached, linked, anchored, or otherwise tethered to the object. Thus, in different implementations, as the user 240 moves through the environment, the graphical elements of the first software application 810 remains in the proximity or vicinity of the first real-world object 210. In some implementations, the graphical elements--such as a graphical virtual interface--for the software application can be understood to be substantially fixed in space relative to the object to which it is associated. As the user 240 moves away, the graphical virtual interface of the software application remains keyed or tethered to the object. When the object is no longer in the field of vision displayed to the user 240 via the MR device 250, the graphical virtual interface of the software application may not be displayed to the user 240. However, when the user 240 returns to view the object, the graphical virtual interface of the software application will again be visible in proximity to or upon the object.”    ¶ [0074]: “In some cases, a graphical virtual interface of the first software application 810 can change in size as a position of the user 240 relative to the object changes. For example, a graphical virtual interface of the first software application 810 may decrease in size, perceptibility, discernibility, brightness, or other features as the user 240 moves further away from the first real-world object 210. 
Similarly, a graphical virtual interface of the first software application 810 may increase in size, perceptibility, discernibility, brightness, or other features as the user 240 moves closer to the first real-world object 210.”  ¶ [0086]: “In another example, the user 240 may select the object by providing the MR device 250 with a directional cue. In some implementations, the directional cue can comprise the direction of the user's gaze while wearing the MR device 250. This can provide the information necessary to the MR device 250 to determine which object the user 240 is referring to.” ). 
 	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ, MULLINS and COSTA, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by the combination of PEREZ and MULLINS so as to also include the functionality of allowing the display format of the AR information to be changed from a summarized display format to a gradually-detailed display format as the apparent size of the external object changes, as taught by COSTA.
      Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of DANIELS et al. (US 2019/0114061, hereinafter “DANIELS”).
	Regarding claim 13 (depends on claim 1), whereas PEREZ may not be entirely explicit as to the following limitations, DANIELS teaches:
 	 the external object includes an Internet of Things (IoT) device (e.g., ¶ [0286]: “the IoT device”) capable of communicating with the XR device (e.g., ¶ [0286]: “mobile device”;  ¶ [0306]: “wearable devices such as head mounted graphical displays (e.g., AR glasses)”) in a home (e.g., ¶ [0306]: “My House”) (¶ [0285]: “In general, association between the AR object data items and the IoT device data items may be established using any suitable method. A non-exhaustive list of example methods follows. In some examples, the IoT device may be GPS-enabled, and the user's mobile device can look up all IoT devices within a certain range, seeing the IoT device in question as part of a list of nearby available devices. The user may then choose to own/associate a shareable object with the IoT device.”  ¶ [0286]: “In some examples, the user's mobile device may scan an identifying QR code, or marker, possibly on the IoT device itself. In some examples, computer vision (CV) methods may be used to recognize individual IoT devices and associate with them. In general, computer vision (also referred to as machine vision) may include any suitable methods and systems for acquiring, processing, analyzing, and understanding images and dimensional data from the real world to produce numerical or symbolic information. For example, one form of computer vision can be used to determine a scene's geometry from various hardware inputs on a mobile device, such as cameras and sensors.”  ¶ [0287]: “Once an IoT device has been associated/owned, a position of the device in space may further be generated or determined by any suitable method, e.g., by Bluetooth triangulation, object recognition, registration to an object's point cloud, the scanning of a QR code, tracking to a marker placed on the IoT device.”   ¶ [0288]: “FIG. 
25 is a schematic representation of a computing system that includes AR platform 2300 and three IoT devices 2500, 2502, 2504 interacting with each other via a communications network 2506, such as the Internet. In this example, a client device 2508 represents a device (e.g., a smart phone) that is operated by a human user 2510. Client device 2508 implements the client-side program 2512 of the AR platform, as an example corresponding to child application 2306 being instantiated by client-side program 2310 (see FIG. 23).”  ¶ [0289]: “A server system 2514 (e.g., a server, server cluster, etc.) implements a server-side program 2516 of the AR platform, corresponding to server-side program 2312 described above (see FIG. 23). An AR application layer 2518 is established (similar to AR application layer 2302, above), and may be referred to in this example as a "Universal IoT Layer." AR application layer 2518 may include one or more shareable objects spanning the client-side and server-side programs as described above with reference to FIG. 23. For example, AR application layer 2518 may include three shareable objects, a first shareable object 2520 (called "my bulb"), a second shareable object 2522 (called "my TV"), and a third shareable object 2524 (called "my thermostat").”  ¶ [0306]: “The user has grouped all of these AR objects into a single collection, and labeled it "My House." AR view 2800 represents an example of a completed setup and association of IoT objects, and the resulting interface provided via a mobile device. It will be understood that a mobile device may include handheld mobile devices (e.g., smartphones, tablet computers, etc.) and/or wearable devices such as head mounted graphical displays (e.g., AR glasses), among others.”); and 
 	the processor displays a user interface (UI) for controlling the IoT device (e.g., FIG. 28) on the transparent display (¶ [0306]: “head mounted graphical displays (e.g., AR glasses)”) (¶ [0290]: “In this example, IoT devices 2500, 2502, and 2504 are physical devices in the real world, where device 2500 is an IoT light bulb, device 2502 is an IoT television, and device 2504 is an IoT thermostat. The shareable objects are each associated with a corresponding IoT device, such that shareable object 2520 is associated with device 2500, shareable object 2522 is associated with device 2502, and shareable object 2524 is associated with device 2504. Accordingly, each shareable object can be accessed by user 2510 to interact with the respective IoT device.”  ¶ [0291]: “This interaction may include viewing of status or state, viewing of settings, changing of settings, rule-based control of the IoT device, dynamic real-time control of the IoT device, and/or the like, or any combination of these. In some examples, the shareable AR object takes the form of a specific class of shareable AR object provided for each IoT device. This class may be defined (by first or third party developers) to use API's specific to the IoT device in question, to access the IoT device's data and/or controls. In some examples, these classes of shareable AR Object may "inherit" from a Universal IoT shareable AR Object class. This class includes a unified method of storing and displaying data "values" from the IoT devices. These values can be used as inputs and outputs from an If/Then AR object, which has support for handling booleans as well as numerical values and custom state-sets. A custom state set is a software object with an arbitrary number of states. In one implementation, each state is a number or a string. However, each state can be any software object for which a function exists that returns a hashable value. 
These states are predefined, and the "value" of the setting can be selected from one of these states.”  ¶ [0292]: “In some examples, a shareable AR object (e.g., object 2520) may be used to display one or more settings from an IoT device (e.g., IoT light bulb 2500). In a first step, an IoT interface script of AR object 2520 queries IoT device 2500 (over network 2506) regarding the state of the IoT device's variables or settings. In response, software or firmware of IoT device 2500 checks the state of each requested setting, and sends the state of the setting(s) back to the IoT interface script of AR object 2520 via network 2506. The IoT interface script is configured to then set a respective internal variable of AR object 2520 to reflect the state of each of the IoT device settings. A display script of AR object 2520 may then be called to determine what should be rendered on the display (e.g., "on" or "off" for a binary setting). The display script may make function calls to cause the renderer to display the appropriate text or imagery on user device 2508 when the scene is rendered as part of the display of the AR object.”  ¶ [0293]: “Continuing with this example, a shareable AR object may be used to change a setting of an IoT device, e.g., in response to a user action. First, mobile device 2508 of user 2510 registers a user input, and activates the AR platform program. The display script of AR object 2520 is then called, which determines how the setting should be changed based on the user input, and changes the internal state variable of the AR object to reflect the change. For example, the user may have indicated (e.g., with a tap of an icon) that the setting should be changed from "on" to "off." The internal state variable for that setting is then changed to "off." 
When the IoT interface script runs, possibly in response to the variable value change, it notes the change in the internal state variable and sends a command through the network to change the corresponding IoT device setting. The IoT device software receives this command, and interfaces with IoT device 2500 hardware to enact the intended change in the device.”   ¶  [0296]: “AR application layer 2612 contains IoT device libraries and code 2616 used to display a human-computer interface 2618 (as appropriate) to a human user of the client device 2610.”   ¶ [0297]: “The user may create IoT device settings 2620. At least some of these IoT device settings may take the form of shareable objects 2614. In some examples, a shareable object may be used to change the settings of an IoT device. This may be done by creating a shareable object and associating that object with the IoT device. The user then uses the interface of the shareable object to change the settings of the IoT device. As part of "Universal IoT" layer 2612, this process may include viewing the IoT device using user's device 2610, and then using an input method (e.g., a double tap, a button press, etc.) to indicate the user's desire to create a shareable AR object associated with this IoT device.”   ¶ [0298]: “Additionally or alternatively, a button or other GUI element may be assigned to ‘create new shareable AR settings object.’ As part of the process of creating the shareable AR settings object, the object is associated with one or more IoT devices. In some examples, the user may instead associate an existing shareable settings object not yet associated with any IoT devices.”   ¶ [0299]: “As an example, an IoT light bulb device may have a brightness setting (% brightness) with various states: "On" (100% brightness), "Off" (0% brightness), and a plurality of intermediate levels (1%-99% brightness). An AR object (e.g., in the shape of a dial or slider) may be created and associated with the IoT light bulb. 
The user can then interact with the AR object, e.g., by turning the dial or sliding the slider, to change the settings of the IoT light bulb, thereby affecting the level of brightness it produces. In some examples, a similar AR "dial" object could further be associated with several IoT light bulbs in a given room or area, and be interacted with to change all of their settings collectively. In some examples, this AR dial object could further be associated with another AR object--this time an AR object that appears as the end of a power cord plugged into a wall outlet. Interacting with the AR power cord (e.g., pulling the plug) changes the settings of all AR objects connected to it (including the AR dial) to "off." The associated IoT devices (including the aforementioned IoT light bulbs) would then respond by turning off.”   ¶ [0306]: “The user has grouped all of these AR objects into a single collection, and labeled it "My House." AR view 2800 represents an example of a completed setup and association of IoT objects, and the resulting interface provided via a mobile device. It will be understood that a mobile device may include handheld mobile devices (e.g., smartphones, tablet computers, etc.) and/or wearable devices such as head mounted graphical displays (e.g., AR glasses), among others.”). 
  	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ and DANIELS, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ so as to also include the functionality of recognizing an external object that is an IoT device in a home and displaying a user interface (UI) for controlling the IoT device on the transparent display, as is expressly taught by DANIELS.
	Regarding claim 14 (depends on claim 1), whereas PEREZ may not be entirely explicit as to the following limitations, DANIELS teaches:
 	the processor executes an application (e.g., ¶ [0268]: “a child application running inside the client-side AR platform program. The child application comprises an AR object.” … “The user may interact with this AR object to effectively interact with in the IoT device.”) related to the external object (¶ [0267]: “physical devices, such as Internet of Things (IoT) devices”) from among a plurality of applications of the XR device (e.g., ¶ [0290]: “shareable objects are each associated with a corresponding IoT device, such that shareable object 2520 is associated with device 2500, shareable object 2522 is associated with device 2502, and shareable object 2524 is associated with device 2504. Accordingly, each shareable object can be accessed by user 2510 to interact with the respective IoT device.”) (¶ [0267]: “As shown in FIGS. 25-34, this section describes an illustrative augmented reality (AR) platform for network-connected physical devices, such as Internet of Things (IoT) devices. The AR platforms described below are examples of AR platform 110. Any or all of the features and aspects of platform 110 described above may be combined with or present in the AR platforms described below.”   ¶ [0268]: “In general, and in accordance with the aspects of AR platform 110 described above (see, e.g., FIG. 3), an AR platform for IoT devices may include an AR platform program running on a server, an AR platform program running on a client device, and a child application running inside the client-side AR platform program. The child application comprises an AR object. Data from one or more IoT devices may be used as inputs for one or more scripts of this AR object (e.g., for displaying the data in an AR view). Similarly, one or more scripts of the AR object may send data as output to one or more IoT devices. This data may cause the IoT device(s) to alter their real-world operation states (e.g., an IoT light may turn on or off). 
The user may interact with this AR object to effectively interact with in the IoT device. The user may also view, edit, copy, delete, share, and/or reassign aspects of the AR object, and may grant or be granted permissions relating to any of these actions. Information relating to how users interact with shared AR objects may be tracked and/or used to filter the objects and/or collect object-related (e.g., popularity-based) data.”   ¶ [0269]: “In this context, an AR object script may include any suitable set of instructions contained in, and executable by, an AR object on an AR platform. Scripts may include features of an AR object and/or AR application layer that enable users, other AR content, or themselves, to manipulate, visualize, or interact with the AR object or layer in a variety of ways. For instance, instructions for rotating AR objects relative to a user's point of view, generating animations, and calling a certain phone number associated with an AR application layer are all examples of scripts. Scripts may be activated by events, assets, and/or by other scripts. Scripts can be used to define rules such (e.g., physics rules) and the behavior of objects based on those rules.”  ¶ [0285]: “In general, association between the AR object data items and the IoT device data items may be established using any suitable method. A non-exhaustive list of example methods follows. In some examples, the IoT device may be GPS-enabled, and the user's mobile device can look up all IoT devices within a certain range, seeing the IoT device in question as part of a list of nearby available devices. The user may then choose to own/associate a shareable object with the IoT device.”  ¶ [0286]: “In some examples, the user's mobile device may scan an identifying QR code, or marker, possibly on the IoT device itself. In some examples, computer vision (CV) methods may be used to recognize individual IoT devices and associate with them. 
In general, computer vision (also referred to as machine vision) may include any suitable methods and systems for acquiring, processing, analyzing, and understanding images and dimensional data from the real world to produce numerical or symbolic information. For example, one form of computer vision can be used to determine a scene's geometry from various hardware inputs on a mobile device, such as cameras and sensors.”  ¶ [0287]: “Once an IoT device has been associated/owned, a position of the device in space may further be generated or determined by any suitable method, e.g., by Bluetooth triangulation, object recognition, registration to an object's point cloud, the scanning of a QR code, tracking to a marker placed on the IoT device.”   ¶ [0288]: “FIG. 25 is a schematic representation of a computing system that includes AR platform 2300 and three IoT devices 2500, 2502, 2504 interacting with each other via a communications network 2506, such as the Internet. In this example, a client device 2508 represents a device (e.g., a smart phone) that is operated by a human user 2510. Client device 2508 implements the client-side program 2512 of the AR platform, as an example corresponding to child application 2306 being instantiated by client-side program 2310 (see FIG. 23).”  ¶ [0289]: “A server system 2514 (e.g., a server, server cluster, etc.) implements a server-side program 2516 of the AR platform, corresponding to server-side program 2312 described above (see FIG. 23). An AR application layer 2518 is established (similar to AR application layer 2302, above), and may be referred to in this example as a "Universal IoT Layer." AR application layer 2518 may include one or more shareable objects spanning the client-side and server-side programs as described above with reference to FIG. 23. 
For example, AR application layer 2518 may include three shareable objects, a first shareable object 2520 (called "my bulb"), a second shareable object 2522 (called "my TV"), and a third shareable object 2524 (called "my thermostat").”  ¶ [0290]: “In this example, IoT devices 2500, 2502, and 2504 are physical devices in the real world, where device 2500 is an IoT light bulb, device 2502 is an IoT television, and device 2504 is an IoT thermostat. The shareable objects are each associated with a corresponding IoT device, such that shareable object 2520 is associated with device 2500, shareable object 2522 is associated with device 2502, and shareable object 2524 is associated with device 2504. Accordingly, each shareable object can be accessed by user 2510 to interact with the respective IoT device.”). 
   	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ and DANIELS, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ so as to also include the functionality of recognizing an external object that is an IoT device and executing a child application for controlling the IoT device, as is expressly taught by DANIELS.
       Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over PEREZ et al. (US 2013/0050432) in view of WANG et al. (US 2016/0065903, hereinafter “WANG”).
	Regarding claim 15 (depends on claim 1), whereas PEREZ may not be entirely explicit as to the following limitations, WANG teaches:
 	the external object includes a product (¶ [0105]: “For example, a user may find an object of interest in a surrounding environment and may like to further find out whether the same object (also called item) and/or similar objects are available for purchasing and then may perform an order to purchase one or more objects and/or find a real store for hands-on checking and/or purchasing. It is possible to capture an image of the object of interest and identify the same or similar items available for purchasing based on image analysis of the captured image.”   ¶ [0112]: “Step 702 determines at least one target object among a plurality of objects according to the at least one scene image. The at least one target object may be contained or partially contained in the captured at least one scene image. In the example in FIG. 2, the determined scene camera 214 captures the person 226 in the scene image 231. The clothing (e.g. skirt) of the person 226 (which may be the object of interest or a part of the object of interest indicated by the attention direction 203) may be determined as a target object.”  ¶ [0115]: “One or more of a plurality of objects may be determined to be the at least one target object. The plurality of objects may be provided by one or more databases (e.g. the databases 711-713). In one example, the plurality of objects may include a plurality of shopping items available (e.g. online and/or in real stores) for purchasing. Each respective object of the plurality of objects may be associated with at least one reference image containing the respective object. Further, the respective object may have price information, manufacturer information, location information (e.g. a location for a real store), web link information, type or category information, etc. The plurality of objects are represented by their associated information in ant method or system disclosed herein.”   ); and 
 	the processor searches for shopping information related to the product on websites (¶ [0115]: “One or more of a plurality of objects may be determined to be the at least one target object. The plurality of objects may be provided by one or more databases (e.g. the databases 711-713). In one example, the plurality of objects may include a plurality of shopping items available (e.g. online and/or in real stores) for purchasing. Each respective object of the plurality of objects may be associated with at least one reference image containing the respective object. Further, the respective object may have price information, manufacturer information, location information (e.g. a location for a real store), web link information, type or category information, etc. The plurality of objects are represented by their associated information in ant method or system disclosed herein.”   ¶ [0116]: “The databases 711-713 may be located on a server computer side. For example, an online shop provides, on its online server computer, various clothing items with their reference images and prices, e.g. for skirts, jeans and shirts. The clothing items may be compared to the skirt of the person 226 in terms of their colors, shapes, and/or textures in order to determine at least one of the clothing items as the at least one target object. For this, image based matching or similarity measures could be employed for the comparison, e.g. match the image 231 or only the image region 233 with reference the images associated with the clothing items.”   ¶ [0117]: “In one embodiment, it is possible to automatically determine one or more target objects among the plurality of objects based on matching the at least one scene image with at least part of reference images associated with the plurality of objects. One or more reference images that are matched with the at least one scene image could be determined. 
Then respective objects related to the matched reference images can be determined as target objects. The image matching may be based on, e.g., image features (e.g. SIFT; SURF), template matching, histogram, texture model (e.g. co-occurrence matrices, wavelets), and/or machine learning (e.g. random forest).”  ¶ [0122]: “A vision based visual search method like that disclosed in Girod, Bernd, et al. "Mobile visual search." Signal Processing Magazine, IEEE 28.4 (2011): 61-76 or Philbin, James, et al. "Object retrieval with large vocabularies and fast spatial matching." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007 (e.g. based on image features, similarity measures, template matching, and/or machine learning) may be performed in order to search, among the plurality of clothing items, one or more clothing items that have visual information (e.g. texture, color, and/or shape) similar or relevant to at least part of the scene image 231 (e.g. the region of interest 233) or to an object contained in the scene image 231 (e.g. the skirt 232). For this, at least part of the image 231 could be matched with reference image features or reference images associated with the plurality of clothing items.”  ¶ [0124]: “When at least one object of interest in the scene image and/or its type is recognized, this information may be provided to search at least one target object. For example, among the plurality of clothing items, only skirts may be considered as potential target objects and other clothing items are excluded from subsequent searching. For example, a skirt among the plurality of clothing items having similar color or texture as the skirt 232 may be determined based on an image matching method.”  ¶ [0128]: “Step 703 creates target object information related to the at least one target object. Target object information related to the determined at least one target object may be created. 
In one example, one or more skirts among the plurality of clothing items may be determined as the at least one target object. The skirts may come from one or more clothing providers. The target object information includes at least one of images containing the determined at least one target object, sizes, materials, prices, brands, clothing providers, online information links, and/or online store links related to the determined at least one target object.”), and 
 	displays the searched shopping information on the transparent display (¶ [0129]: “Optional step 704 displays the target object information on a display device. The target object information may be displayed on a display device, e.g. a LCD screen.”   ¶ [0142]: “Augmented reality systems could present enhanced information of a real object by providing a visualization of overlaying computer-generated virtual information with visual impressions or an image of the real object. For this, a real object is detected or tracked in order to retrieve or generate the relevant virtual information. The overlay of the virtual and real information can be seen by a user using a well-known video see-through device comprising a camera and a display screen. In this case, the object of interest is captured in an image by the camera. The overlay of the virtual information and the captured image is shown on the display screen to the user. The user often looks at the object of interest captured in the image displayed on the screen, but not at other objects captured in the image. Thus, the gaze information of the user or a pose of the user's face relative to the screen or the camera can determine the object of interest.”  ¶ [0143]: “In another embodiment, the overlay of the virtual and real information can be seen by a user in a well-known optical see-through device having semi-transparent glasses. In this case, the user sees through the semi-transparent glasses real objects of the real environment augmented with the virtual information blended in in the semi-transparent glasses. At least one camera is often attached to the optical see-through device in order to identify, track or reconstruct the object of interest by using computer vision methods. In this case, a spatial relationship between the camera attached to the optical see-through device and the user attention direction could be used to determine or detect image features in images captured by the camera. 
The image locations of the user attention directions in one image captured the camera could be determined according to that spatial relationship.”  ¶ [0150]: “Many Augmented Reality (AR) applications can benefit from the present invention. For example, in AR shopping, AR maintenance, and AR touring applications, there are multiple real objects located in the real world (e.g. clothing for AR shopping, engine components for AR maintenance, and monuments for AR touring). The user is often interested in one object at a time. The object of interest to the user could be determined according to the user attention direction, e.g. the gaze of the user, the pose of the face, or a hand pointing direction at that time. Then, only the object of interest may be detected, tracked, or reconstructed. Further, digital information related only to the object of interest would be generated and visually displayed on the top of an image of the object in an AR view.”  ¶ [0183]: “Augmented reality systems could present enhanced information of a real object by providing a visualization of overlaying computer-generated virtual information with visual impressions or an image of a real object. For this, the real object is detected or tracked in order to retrieve or generate the relevant virtual information.”   ¶ [0184]: “The overlay of the virtual and real information can also be seen by a user by means of a well-known optical see-through device having semi-transparent glasses. In this case, the user then sees through the semi-transparent glasses objects of the real environment augmented with the virtual information blended in, in the semitransparent glasses. 
At least one camera is often attached to the optical see-through device in order to identify, track or reconstruct the object of interest by using computer vision methods.”   ¶ [0205]: “For example, the processing system according to the invention is comprised, at least in part, in a mobile device (such as a mobile phone, wearable computer, tablet computer, mobile computer, often called laptop, or a head mounted display, such as used for optical see-through augmented reality applications) and/or in a server computer adapted to communicate with the mobile device. The processing system may be comprised in only one of these devices, e.g. in the mobile device or in the server computer, or may be a distributed system in which one or more processing tasks are distributed and processed by one or more processing devices which are distributed and are communicating with each other, e.g. by point to point communication or via a network.”   ¶ [0307]: “A display screen could also be a semi-transparent screen, like google glasses. One example is to place an optical-see-though device between the user's eye and the real object. The real object can then be directly observed through this semi-transparent screen of the optical-see-though device, while the virtual object is computer-generated and shown on the semi-transparent screen. This configuration is referred to as optical-see-through AR.”). 
 	Thus, in order to obtain a more versatile extended reality (XR) device having the cumulative features and/or functionality taught by PEREZ and WANG, it would have been obvious to one of ordinary skill in the art to have modified the XR device taught by PEREZ so as to also include the functionality of searching for shopping information related to the external object and displaying the searched shopping information on the transparent display, as is expressly taught by WANG.
Allowable Subject Matter
 	Claims 8 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
 	At present, it is not apparent to the examiner which part of the application could serve as a basis for new and allowable claims.   However, should the applicant nevertheless regard some particular matter as patentable, the examiner encourages applicant to appropriately amend the claims to include such matter and to indicate in the REMARKS the difference(s) between the prior art and the claimed invention as well as the significance thereof.
 	Furthermore, should applicant decide to amend the claims, examiner respectfully requests that the applicant please indicate in the REMARKS from which page(s), line(s) or claim(s) of the originally filed application that any amendments are derived.   See MPEP § 2163(II)(A) (There is a strong presumption that an adequate written description of the claimed invention is present in the specification as filed, Wertheim, 541 F.2d at 262, 191 USPQ at 96; however, with respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims.).
 	A shortened statutory period for reply to this action is set to expire THREE MONTHS from the mailing date of this action.  Extensions of time may be available under the provisions of 37 CFR 1.136(a).   In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 USC § 133).  	
Contact Information
 		Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT PEREN, whose telephone number is (571) 270-7781.  The examiner can normally be reached Monday through Friday, 10am-6pm.
 		If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING POON, can be reached at (571) 272-7440.  The fax number for the organization to which this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, please contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VINCENT PEREN/
Examiner, Art Unit 2675

/KING Y POON/
Supervisory Patent Examiner, Art Unit 2675