DETAILED ACTION
This action is responsive to the Amendment filed on 04/12/2022 and the Supplemental Amendment filed on 06/03/2022. Claims 14-15, 18-25, and 28-35 are pending in the case. Claims 1-13, 16, 17, 26, and 27 are canceled. Claims 34 and 35 are new. Claims 14 and 24 are the independent claims.
This office action is FINAL.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s Response
In Applicant’s response dated 04/12/2022 (hereinafter Response), Applicant amended Claims 14, 18, 20, 24, 28, and 30; canceled Claims 16, 17, 26, and 27; added Claims 34 and 35; amended the disclosure (specification); and argued against all objections and rejections previously set forth in the Office Action dated 01/13/2022.
Applicant's amendment to the disclosure is acknowledged. Applicant’s amendments to claims 14, 18, 20, 24, 28, and 30 to further clarify the metes and bounds of the invention are acknowledged. Applicant made no statement of specific support for any claim amendment, other than asserting that “no new matter has been added” (see Response page 9).
It is noted that an applicant-initiated interview was held on 03/25/2022, during which no agreement was reached (see PTO-413 mailed 03/29/2022). After review of the amendment filed 04/12/2022, an examiner-initiated interview was held on 06/02/2022 (see PTO-413 mailed 06/07/2022).
In response to the interview, and consistent with the Examiner's suggestions for addressing potential rejections of all claims in the Response under 35 USC 112, Applicant filed a supplemental response on 06/03/2022 (hereinafter Supplemental Response). In the Supplemental Response, Applicant further amended claims 14, 18, 20, 24, 28, 30, and 34-35; reiterated Applicant’s position with respect to the specification and drawings; and reiterated Applicant’s position with respect to the rejection of at least the independent claims under 35 USC 103.
Applicant’s further amendments to the claims are acknowledged.
Response to Amendment/Arguments
In response to Applicant's amendment to the disclosure, the objections to the disclosure and drawings are respectfully withdrawn.
In response to Applicant's argument with respect to the rejection of independent claims 14 and 24 under 35 USC 103 as unpatentable over LACEY in view of RABINOVICH (see Response, starting page 10; see Supplemental Response starting page 8), Examiner respectfully disagrees.
In Applicant’s arguments against LACEY, Applicant’s position is that LACEY cannot be relied upon to teach: 
input a data set of virtual user interactions with a plurality of virtual objects in a virtual environment to a trained machine learning program to output a probability of selection for each of the plurality of virtual objects in the virtual environment… 
identify a first virtual object among the plurality of virtual objects in the virtual environment as being selected by a user based on the probability of selection for the first virtual object output by the trained machine learning program exceeding a threshold value; and 
identify a manipulation of the first virtual object by the user
Applicant’s first position is that “the disclosed machine learning in Lacey is to ‘learn the user’s behavior patterns and convergence/divergence tendencies’ and can be used to ‘adjust thresholds (e.g. increase variance threshold to determine convergence of hand input with other input(s)) or apply suitable filtering to compensate for the user's hand jitter’ and learn ‘the sequence and timing of sensor convergences (or divergences) that are particular to a user’” (relying on LACEY [0445]; see Supplemental Response page 8). However, this position does not take into account the fact that LACEY starts with an initial trained model for recognizing input, which is further trained over time so that recognition improves. The sequence for using the system is shown in FIG 11 and includes (1140) display virtual UI, (1150) wait for a user command, (1160) determine whether the gesture or other command is recognized and, if so, (1170) generate virtual content/perform an action based on the command. These steps are explained broadly at [0169]. The act of recognizing the command, which may be a gesture, a head or eye movement, input from a user device, etc., is performed by the system as a whole, where the system has been trained to recognize the input in order to determine which command must be performed.
Applicant’s second position acknowledges that “as disclosed in [0198] and [0203] of Lacey, a confidence score is calculated that is indicative of ‘a higher probability or likelihood that the system has identified the desired target object’” and then argues that this score cannot be a probability because scores can exceed 100, as shown in FIG 17C. However, (1) there is no requirement in the claim of a specific range of probability values, for example between 0% and 100% (or 0.0 to 1.0); and (2) FIG 17C shows an aggregated confidence score for when more than one type of input is received, whereas, when only one type of input is received, [0224] makes clear that
a head pose input 1614 produces a higher confidence score for application A (80% confidence) over application B (30% confidence), whereas gesture input 1612 produces a higher confidence score for application B (60% confidence) over application A (30% confidence).
LACEY teaches a number of different possible ways to aggregate the confidence scores, including [0225] simple addition and some calculated, weighted combination. Note that a “weighted combination” could result in a combined confidence score that is, for example, in the range between 0% and 100% (using simple mathematical operations). Further, [0227] makes clear that
The confidence scores in the FIGS. 18A and 18B may be based on a single input mode (such as e.g., the user's head pose). Multiple confidence scores can be calculated (for some or all of the various multimodal inputs) and then aggregated to determine a user interface operation or a target virtual object based on multimodal user inputs
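The arithmetic point made above, that simple addition of per-mode scores can exceed 100% while a weighted combination can stay within 0% to 100%, can be illustrated with the single-mode values quoted from LACEY [0224]. This is a hypothetical sketch, not LACEY's implementation; the function names and the equal weights (0.5, 0.5) are assumptions chosen only for illustration.

```python
"""Hypothetical sketch: two ways to aggregate per-input-mode confidence scores."""
from typing import Dict, List, Sequence

# Single-mode confidence scores from the LACEY [0224] example, written as fractions.
HEAD_POSE: Dict[str, float] = {"A": 0.80, "B": 0.30}
GESTURE: Dict[str, float] = {"A": 0.30, "B": 0.60}


def simple_sum(modes: List[Dict[str, float]]) -> Dict[str, float]:
    """Add each candidate's scores across input modes; the total may exceed 1.0."""
    return {k: sum(m[k] for m in modes) for k in modes[0]}


def weighted_combination(
    modes: List[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Convex combination: non-negative weights summing to 1 keep each result
    in [0, 1] whenever every input score is in [0, 1]."""
    if len(weights) != len(modes):
        raise ValueError("one weight per input mode is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return {k: sum(w * m[k] for w, m in zip(weights, modes)) for k in modes[0]}


if __name__ == "__main__":
    print(simple_sum([HEAD_POSE, GESTURE]))
    print(weighted_combination([HEAD_POSE, GESTURE], [0.5, 0.5]))
```

With equal weights, application A combines to 0.55 and application B to 0.45, both within 0.0 to 1.0, while simple addition gives approximately 1.1 and 0.9; only the additive form can exceed 100%.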
Applicant’s arguments against LACEY are not persuasive for at least these reasons.
Applicant makes no argument against RABINOVICH in either the Response or the Supplemental Response.
The rejections of independent claims 14 and 24 are respectfully maintained and are restated in response to Applicant’s amendments to the claims. New grounds of rejection are presented for claims 18 and 28, which recite subject matter not previously considered, and for new claims 34 and 35.
Claim Rejections – 35 USC 103
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art. 
2. Ascertaining the differences between the prior art and the claims at issue. 
3. Resolving the level of ordinary skill in the pertinent art. 
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 14, 15, 18-25, and 28-35 are rejected under 35 U.S.C. 103 as being unpatentable over LACEY et al. (US 20190362557 A1) in view of RABINOVICH et al. (US 20200334461 A1).
Regarding claim 14 {and similarly claim 24}, LACEY teaches the system (e.g. FIG 2A, additional detail in FIG 2B), comprising a computer including a processor and a memory (e.g. [0101] local processing and data module 260 may comprise a hardware processor, as well as digital memory, such as non-volatile memory (e.g., flash memory), both of which may be utilized to assist in the processing, caching, and storage of data; note e.g. processor 128; [0499] program in Appendix A can be performed by the local processing and data module 260, the remote processing module 270, the central runtime server 1650, the processor 128, or other processor associated with the wearable system 200), the memory storing instructions executable by the processor to {perform operations comprising} (broad method in FIG 11 [0168-0169]: (1140) display virtual UI, (1150) wait for a user command, (1160) if the gesture or other command is recognized, then (1170) generate virtual content/perform action based on the command; note FIG 14 [0192-0195] illustrating examples of selecting a virtual object from a plurality of virtual objects using only one input mode or a combination of input modes; note also FIG 23 (2320) receive multi-modal inputs for a user interaction, (2330) identify a subject, a command, and a parameter of the user interaction based on the multi-modal inputs, (2340) execute the user interaction):
input a data set of virtual user interactions with a plurality of virtual objects in a virtual environment to a trained machine learning program to output a probability of selection for each of the plurality of virtual objects in the virtual environment (LACEY [0198] calculate confidence score for various objects in the user’s environment and select, as target object, the object with highest confidence [0203] a higher confidence score indicates a higher probability or likelihood that the system has identified the desired target object; [0264] the wearable system can receive multimodal inputs for a user interaction. The multimodal inputs may be direct or indirect inputs; [0265] the wearable system can parse the multimodal inputs to identify a subject, a command, and a parameter of the user interaction. For example, the wearable system can assign confidence scores to candidate target virtual objects, target commands, and target parameters and select the subject, command, and parameters based on the highest confidence scores), 

identify a first virtual object (e.g. the target object) among the plurality of virtual objects in the virtual environment as being selected by a user based on the probability of selection for the first virtual object output by the trained machine learning program (LACEY [0198, 0203, 0264-0265], select based on highest confidence scores) exceeding a threshold value (LACEY [0265] select the subject, command, and parameters based on the highest confidence scores; interpreting “highest” as exceeding all others, thus the next lower confidence score {probability} is a threshold value; note also [0212] Each candidate object may be associated with a confidence score, and in some cases, the candidate object with the highest confidence score (e.g., higher than other object's confidence scores or higher than a threshold score) is selected by the system as the target object); and
identify a manipulation (e.g. target command) of the first virtual object by the user (LACEY FIG 23 [0265] (2330) identify a subject, a command, and a parameter of the user interaction based on the multi-modal inputs…assign confidence scores to candidate target virtual objects, target commands, and target parameters and select the subject, command, and parameters based on the highest confidence scores [0266] (2340) execute the user interaction).
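The selection logic mapped above (discard candidates whose confidence does not exceed a threshold, then choose the candidate with the highest remaining confidence, cf. LACEY [0212]) can be sketched as follows. This is an illustrative sketch only and is not LACEY's or Applicant's code; the function name, the object names, and the default threshold of 0.8 (chosen to echo the 80% value in LACEY [0217]) are assumptions. The scores stand in for the output of a trained model that is not shown.

```python
"""Hypothetical sketch: threshold-gated selection of a target virtual object.

The per-object probabilities are assumed to come from an upstream trained
model, which is not shown here.
"""
from typing import Mapping, Optional


def select_target(scores: Mapping[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the object with the highest score among those exceeding `threshold`.

    Returns None when no object's score exceeds the threshold.
    """
    # Discard candidates whose score does not exceed the threshold.
    eligible = {obj: p for obj, p in scores.items() if p > threshold}
    if not eligible:
        return None
    # Among the remaining candidates, pick the one with the highest score.
    return max(eligible, key=lambda obj: eligible[obj])


if __name__ == "__main__":
    # Hypothetical object names and scores, for illustration only.
    print(select_target({"cup": 0.92, "lamp": 0.85, "book": 0.40}))  # cup
    print(select_target({"cup": 0.50, "lamp": 0.30}))  # None
```

When two objects both exceed the threshold, as with "cup" and "lamp" above, the object with the higher score is returned; when none does, no object is treated as selected.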
While LACEY clearly uses at least one trained machine learning program to analyze the user’s input in order to identify target virtual objects so that the user may interact with the virtual objects, as explained above, LACEY does not appear to expressly disclose wherein the machine learning program is trained by determining a plurality of training data sets of user interactions with real objects in a real environment, and training the machine learning program with the training data sets, wherein the plurality of training data sets include trajectory data of at least one of head positions, hand positions, head orientations, or hand orientations.
LACEY is primarily directed to fusing the input from multiple different sensors (when such sensor data is available) in order to improve the user’s experience. The trained machine learning program of LACEY is capable of recognizing user input based on trajectory data of at least one of head positions, hand positions, head orientations, or hand orientations (see at least [0169] the UI may be a body centric ring around the user's body. The wearable system may then wait for the command (a gesture, a head or eye movement, input from a user input device, etc.), and if it is recognized (block 1160)… [0128] wearable system may determine head pose (position or orientation) [0158] describes hand gesture tracking [0159] describes eye tracking [0276] the wearable system can determine the user's hand position in view of the planar mesh to determine the link that the user is targeting and selecting).
In other words, LACEY describes the wearable system using several different machine-trained recognition models to determine which virtual object the user might want to interact with. LACEY is merely deficient in describing how one or more of these machine-trained recognition models are actually trained.
RABINOVICH is directed to (see abstract) a head-mounted augmented reality (AR) device that can include a hardware processor programmed to receive different types of sensor data from a plurality of sensors (e.g., an inertial measurement unit, an outward-facing camera, a depth sensing camera, an eye imaging camera, or a microphone) and to determine an event of a plurality of events using the different types of sensor data and a hydra neural network. RABINOVICH may be relied upon to teach that a machine learning program is trained by determining a plurality of training data sets of user interactions with real objects in a real environment, and training the machine learning program with the training data sets, wherein the plurality of training data sets include trajectory data of at least one of head positions, hand positions, head orientations, or hand orientations:
System trained to recognize [0059] head pose and orientation of user; recognize a physical location of a real object (e.g., user's head, totem, haptic device, wearable component, user's hand, etc.) and correlate the physical coordinates of the real object to virtual coordinates corresponding to one or more virtual objects being displayed to the user.
[0176] deep learning techniques used to determine the pose of a user’s head relative to the environment (head pose includes head position and/or head orientation); [0185] system is configured to recognize gestures by user’s hands (hand pose includes hand position and/or hand orientation) by utilizing images during training time.
[0180] With the pose transformation determined, one may then integrate associated IMU data (from accelerometers, gyros, etc.- as discussed above) into the pose transformation and continue tracking as the user moves away from the origin, around the room, and at whatever trajectory. Such a system may be termed a "relative pose net", which as noted above, is trained based upon pairs of frames wherein the known pose information is available (the transformation is determined from one frame to the other, and based upon the variation in the actual images, the system learns what the pose transformation is in terms of translation and rotation) {In other words, the system is trained using images to determine trajectory (translation, rotation) as pose (e.g. of hand, head) changes.}
[0182] deep network solutions, such as those described above using convolutional neural nets to estimate pose, the smoothing issue may be addressed using a recurrent neural networks (RNN), which is akin to a long short term memory network… The simple structure of the RNN with built in feedback loop that allows it to behave like a forecasting engine, and the result when combined with the convolutional neural net in this embodiment is that the system can take relatively noisy trajectory data from the convolutional neural net, push it through the RNN, and it will output a trajectory that is much smoother, much more like human motion, such as motion of a user's head which may be coupled to a head mounted component (58) of a wearable computing system
[0196] The process of training a neural network with a hydra architecture (272) involves presenting the network with both input data and corresponding target output data. This data, comprising both example inputs and target outputs, can be referred to as a training set.
[0227] under control of a hardware processor … receiving different types of training sensor data, wherein the training sensor data is associated with a plurality of different types of events; generating a training set comprising the different types of training sensor data as input data and the plurality of different types of events as corresponding target output data; and training a neural network, for determining a plurality of different types of events, using the training set (this is commensurate with the instant application [0031] training data sets TDS are obtained by acquiring data representative of user interactions of the user 4 in a real environment 8; [0043] An item of status information S is also associated with each training data set TDS, about which object 12a, 12b, 12c, 12d, 12e, 12f was selected and/or whether it was not selected by the user 4 in the real environment 8).
[0228] wherein the different types of training sensor data comprises inertial measurement unit data, image data, depth data, sound data, voice data, or any combination thereof (all examples of real-environment data, raw image data necessarily includes real objects which must be detected in order to recognize them);
[0229] wherein the plurality of different types of events comprises face recognition, visual search, gesture identification, semantic segmentation, object detection, lighting detection, simultaneous localization and mapping, relocalization, or any combination thereof (events therefore include at least the pose detection and object localization needed for LACEY when performing input fusion to determine the target object, where poses and intended objects may be learned from the user’s previous behavior).
Note also [0182] using a convolutional neural network to estimate pose; using recurrent neural network for smoothing.
Accordingly, it would have been obvious to one having ordinary skill in graphical user interfaces before the effective filing date of the claimed invention, having the teachings of LACEY and RABINOVICH before them, to have combined LACEY (selecting a target virtual object based on input fusion, including pose detection learned from previous user behavior) and RABINOVICH (a method of training a neural network to predict at least a pose based on previous real-environment sensor data), the combination being motivated by the suggestion at LACEY [0445] and by the teachings in RABINOVICH: [0056] it is advantageous if the accuracy of head-tracking is high and that the overall system latency is very low from the first detection of head motion to the updating of the light that is delivered by the display to the user's visual system; [0057] can benefit from accurate and low latency head pose detection; [0059] Detecting head pose and orientation of the user, and detecting a physical location of real objects in space enable the AR system to display virtual content in an effective and enjoyable manner.
Regarding dependent claim 15 (25), incorporating the rejection of claim 14 (24), LACEY in view of RABINOVICH, combined at least for the reasons discussed above, further teaches wherein the machine learning program is a recurrent neural network (see at least RABINOVICH [0182], noting further that the hydra neural network taught in RABINOVICH contains [0197] convolutional and recurrent layers that [0199] operate with respect to time points).
Claims 16-17 and 26-27 – canceled.
Regarding dependent claim 19 (29), incorporating the rejection of claim 14 (24), LACEY in view of RABINOVICH, combined at least for the reasons discussed above, further teaches wherein the instructions further include instructions to generate a plurality of sets of sensor data of user interactions, each set including data for a respective period of time different than the period of time for each other data set (suggested at LACEY [0445], learn how a particular user picks up or grasps objects by learning the sequence and timing of sensor convergences (or divergences) that are particular to that user (see, e.g., the time sequences of head-eye-controller vergences in FIG. 58B); RABINOVICH is relied upon to teach the actual training of the neural network, as explained in the rejection of claim 14, using different data (which may be obtained at different times), in particular the sequence of images used for training different hand or head poses).
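As a purely hypothetical illustration of the claim 19 (29) limitation, and not as a teaching of either reference, sets of sensor data that each span a different period of time can be produced by partitioning a time-stamped stream into consecutive, non-overlapping windows. The `Sample` fields, the `split_by_period` name, and the one-second period are assumptions made only for this sketch.

```python
"""Hypothetical sketch: partition time-stamped sensor samples into per-period sets."""
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    t: float         # timestamp in seconds (assumed field)
    head_yaw: float  # one hypothetical sensor channel, in degrees


def split_by_period(samples: Iterable[Sample], period: float) -> List[List[Sample]]:
    """Group samples into consecutive, non-overlapping windows `period` seconds long.

    Window i covers [i * period, (i + 1) * period), so each returned set
    covers a time interval that no other set covers.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    windows: Dict[int, List[Sample]] = {}
    for s in sorted(samples, key=lambda x: x.t):
        windows.setdefault(int(s.t // period), []).append(s)
    return [windows[i] for i in sorted(windows)]


if __name__ == "__main__":
    data = [Sample(0.0, 1.0), Sample(0.4, 2.0), Sample(1.2, 3.0), Sample(2.5, 4.0)]
    for window in split_by_period(data, period=1.0):
        print([s.t for s in window])
```

With the sample data above, the stream splits into three sets covering [0, 1), [1, 2), and [2, 3) seconds.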
Regarding claim 18 (28), incorporating the rejection of claim 14 (24), LACEY further teaches identifying a second virtual object among the plurality of virtual objects in the virtual environment based on the probability of selection for the second virtual object output by the trained machine learning program also exceeding a threshold value ([0212] objects with confidence scores below a threshold confidence score are eliminated from consideration to improve computational efficiency), and determining the first virtual object as being selected when the probability of selection of the first virtual object exceeds the probability of selection of the second virtual object (see e.g. [0198, 0203, 0212, 0264-0265], select based on highest confidence scores).
Regarding dependent claim 20 (30), incorporating the rejection of claim 14 (24), LACEY further teaches wherein the instructions further include instructions to actuate an input device based on the identified manipulation of the first virtual object ([0266] the wearable system can execute the user interaction based on the subject, command, and the parameter [0376] FIGS. 39A and 39B illustrate examples of user inputs received through controller buttons or input regions on a user input device).
Regarding dependent claim 21 (31), incorporating the rejection of claim 14 (24), LACEY in view of RABINOVICH, combined at least for the reasons discussed above, further teaches wherein the instructions further include instructions to determine the plurality of training data sets of user interactions based on data from a virtual reality headset (LACEY [0445] suggests training using previous behavior from the wearable system; RABINOVICH, which is relied upon for teaching the training, e.g. for recognizing poses or performing localization of recognized objects, uses the system components of FIG 8, which include at least head mounted component (58) as well as other components for obtaining sensor information).
Regarding dependent claim 22 (32), incorporating the rejection of claim 14 (24), LACEY in view of RABINOVICH, combined at least for the reasons discussed above, further teaches wherein the instructions further include instructions to determine the plurality of training data sets of user interactions based on data from an infrared tracking sensor (see e.g. LACEY [0106] such as an infrared LED beacon (used for determining position/orientation); [0108] head mounted wearable component (58) features similar components, as illustrated, in addition to lighting emitters (130) configured to assist the camera (124) detectors, such as infrared emitters (130) for an infrared camera (124); note identical recitations in RABINOVICH [0092] resources coupled to the wall (8) or having known positions and/or orientations relative to the global coordinate system (10) may include … a beacon or reflector (112) configured to emit or reflect a given type of radiation, such as an infrared LED beacon; [0093] head mounted wearable component (58) features similar components, as illustrated, in addition to lighting emitters (130) configured to assist the camera (124) detectors, such as infrared emitters (130) for an infrared camera (124); where LACEY makes clear that training is needed and RABINOVICH teaches how such training may be implemented using sensor information).
Regarding dependent claim 23 (33), incorporating the rejection of claim 14 (24), LACEY further teaches wherein the instructions further include instructions to determine the data set of virtual user interactions with the virtual environment based on data from a virtual reality headset (LACEY [0109] head mounted wearable component (58) … utilized to provide a very high level of connectivity, system component integration, and position/orientation tracking).
Regarding claim 34 (35), incorporating the rejection of claim 14 (24), LACEY further teaches wherein the threshold value is 0.8 or 0.9 ([0217] central runtime server 1650 can set a threshold confidence value to be equal to or above 80%).
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).


CONCLUSION
The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
20180096505 VALDIVIA controls and interface for user interface interactions in virtual spaces (note abstract)
20180046851 KIENZLE select candidate object from plural objects based on score
20200066047 KARALIS determine gesture target based on interactions (real paper)
20200142499 KATZ touch free gesture detection
20180307303 POWDERLY FIGs 8, 11, 23
20210166483 KOZLOSKI predictive virtual reconstruction of physical environments
20180285631 MURRISH using proxy physical objects to represent virtual objects

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is 571-270-3771.  The examiner can normally be reached on Mon-Fri 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU can be reached on 571-272-4057.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Amy M Levy/Primary Examiner, Art Unit 2173