Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Compact Prosecution 
Claim 21 recites “outputting the procedural workflow data in a knowledge transfer format.”  The “knowledge transfer format” is a very broad term.  If the claim recites a specific type of “knowledge transfer format” that is not clearly associated with augmented or mixed realities, the potential amendment may overcome the existing rejections on the record.  No determination regarding allowability has been made.

Response to Amendment 
This is in response to applicant’s amendment/response filed on 12/03/2020, which has been entered and made of record.  Claim 1 has been amended.  Claim 20 has been cancelled.  Claim 21 has been added.  Claims 1-19 and 21 are pending in the application.

Applicant’s arguments and amendments filed 12/03/2020 have been entered and considered.  The arguments are moot in view of the Examiner’s new grounds of rejection.  However, the Examiner addresses the following issues to the extent they relate to the new grounds of rejection.
       (1) Applicant states (Remarks 8):

    [Image: media_image1.png]

	First, Applicant amended the claim and the Examiner has a new ground of rejection to address the amendments. 
Second, Finding discloses the synchronization of data from different sources, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Synchronization is based on time.  The time information is stored, including the beginning of a recording.




    [Image: media_image2.png]

	First, Applicant amended the claim and the Examiner has a new ground of rejection to address the amendments.
	Second, the Examiner did not construe “times” to mean “instances.”

(3) Applicant states (Remarks 8):

    [Image: media_image3.png]

	Kohlhoff’s unrelated teachings are not an impediment to the Examiner’s combination, and Applicant did not explain why they would be.


    [Image: media_image4.png]

	The Examiner disagrees.
	Applicant amended the claim to recite “timestamp.”  The Examiner introduces new references to address the amendments.
	Claim 1 recites “store, in association with one another: the video data, the audio data, the identified relative times, and the augmentations to the video data.”  According to the previously presented rejection analysis, all the data from different sources are recorded in association with, for example, the time point at the beginning of the recording, identified according to Kohlhoff.
	

    [Image: media_image5.png]

	The Examiner disagrees. 
Finding discloses the synchronization of data from different sources, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Synchronization is based on time. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1–4, 6–16, 18–19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Finding et al. (US 20180253900 A1) in view of Hodge (US 20180276895 A1), Fallon (US 20180004481 A1), and Haligowski et al. (US 20170364843 A1).
Regarding Claim 1, Finding discloses A method performed by a computer system having at least one processor and a memory (Fig. 11), the method comprising, the computer system: 
receiving time-synchronized session data comprising video data captured by a camera, audio data captured by a microphone within audio proximity of the camera, and motion data captured by an inertial measurement unit physically fixed relative to the camera (
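Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.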
Finding discloses the use of microphone, stating “The sensors 202 may include, for example, and without limitation. a proximity or location sensor (e.g., near field communication, GPS, Bluetooth, Wi-Fi), an optical sensor (e.g., a camera), an orientation or inertia sensor (e.g., a gyroscope, accelerometer, inertial measurement unit (IMU)), an audio sensor (e.g., a microphone), depth sensors, such as, infrared (IR) camera and IR projector, thermal sensor or any suitable combination thereof.”  Finding ¶ 44.

    [Image: media_image6.png], showing that the sensors, which include a camera, an inertial measurement unit, and a microphone, may be integrated into an AR device 106.
Finding states “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding does not explicitly disclose that the inertial measurement unit is “physically fixed relative to the camera.”  It is expected to be so in the HMD, wearable display device, or mobile phone.  The Examiner takes Official Notice that an inertial measurement unit may be physically fixed relative to a camera in an HMD, wearable display device, or mobile phone.  The benefit of applying this well-known knowledge would have been a reliably constructed device.  Because Applicant either did not traverse the Examiner’s assertion of Official Notice or did not traverse it adequately, the common knowledge or well-known in the art statement is taken to be admitted prior art.
Finding teaches or suggests the captured image data and motion data are time-synchronized, stating “The hardware processor . . . generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  
Finding discloses the synchronization of the captured image data and motion data, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Finding discloses, together with image data, audio data is also recorded, stating “The recorded content dataset 616 includes, for example, media content, audio recording, recorded images of virtual objects, notes, and corresponding 3D coordinates.”  Finding ¶ 80.
Finding does not explicitly disclose that the recorded image data and the audio data are synchronized.  It is expected to be so.  The Examiner takes Official Notice that image data and their associated audio data may be synchronized.  The benefit of applying this well-known knowledge would have been that the image data and audio data provide context to each other.  Because Applicant either did not traverse the Examiner’s assertion of Official Notice or did not traverse it adequately, the common knowledge or well-known in the art statement is taken to be admitted prior art.), 
wherein the time-synchronized session data relate to a 

    [Image: media_image7.png], according to steps 707 and 708, the recorded content is used to generate AR instructions.
Finding discloses examples of AR instructions which may show a user physically performing a procedure, stating “In another example, a factory worker may install a new production line and may want to share shut down and/or start up information with other workers. The process may involve several instructions at different points along the line. . . . In other words, the relevant information may be displayed at a predefined location within the plant. In another example, an experienced boiler engineer may want to teach his less experienced team member how to service an old boiler because minimal documentation is currently available.”  Finding ¶ 24. 
Finding discloses further examples of AR instructions which may show a user physically performing a procedure, stating “[t]he present application describes an AR device than enables a user of the AR device to generate virtual content by recording a video of the user fixing a machine. For example, the video may show how to change a filter of a machine. Other virtual content may include, for example, and without limitation, video, images, thermal data, biometric data, user and application input, graphics, audio, annotations, AR manipulations, 3D objects, graphics animations, or substantially any other display render-able data.”  Finding ¶ 25.), and 
wherein the audio data comprise spoken words of the user (
Finding discloses recording audio comments, stating “In accordance with another embodiment, the server 112 receives media content from the AR device 106 and generates annotations (e.g., audio/video comments) on the media content. Each audio/video comment is associated with a particular location in space. The server 112 stores the audio/video comments and spatial location for the corresponding portions of the media content.”  Finding ¶ 36.  Audio comments comprise spoken words of a user.); 

processing a set of data comprising the video data and the motion data to, for each of one or more objects within a field of view of the camera  (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.
Finding discloses mechanism to identify objects, stating “In one example, the AR authoring template includes a form or a table identifying specific machines or tools, corresponding entries for media content to be recorded, corresponding entries for specific locations in space (e.g., three-dimensional coordinates), corresponding entries for work instructions or notes.”  Finding ¶ 31.
Finding discloses general technical implementation of the feature in question, stating “The physical object may include a visual reference (e.g., an identifiable visual feature) that the augmented reality application can identify. . . . Other AR applications allow a user to experience visualization of the additional information overlaid on top of a view or an image of any object in the real physical world.”  Finding ¶ 19.): 
identify the object (
Finding discloses mechanism to identify objects, stating “In one example, the AR authoring template includes a form or a table identifying specific machines or tools, corresponding entries for media content to be recorded, corresponding entries for specific locations in space (e.g., three-dimensional coordinates), corresponding entries for work instructions or notes.”  Finding ¶ 31.), 
track the object over time, using the motion data, to determine a time-based series of locations of the object within the field of view of and relative to the camera  (
Finding teaches or suggests tracking the object, e.g., a physical table, over time, stating “[w]hen the user moves, the inertial position of the AR device 106 is tracked and the display of the AR content is adjusted based on the new inertial position. For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.), and 
augment the video data by overlaying added visual content over video frames containing the object, such that the added visual content tracks motion of the object within the video frames over time (
Finding teaches or suggests adding visual content that tracks motion of the object within the video frames over time, e.g., a virtual object on a physical table, stating “[w]hen the user moves, the inertial position of the AR device 106 is tracked and the display of the AR content is adjusted based on the new inertial position. For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.  
Finding further states “. . . a three-dimensional virtual object overlaid on an image of a physical object captured by a camera of a display device (e.g., mobile computing device, wearable computing device such as a head mounted device). The physical object may include a visual reference (e.g., an identifiable visual feature) that the augmented reality application can identify. A visualization of the additional information, such as the three-dimensional virtual object overlaid or engaged with an image of the physical object, is generated in a display of the AR device. The three-dimensional virtual object may selected based on the recognized visual reference, a captured image of the physical object, or a location, position, or orientation of the display device. A rendering of the visualization of the three-dimensional virtual object may be based on a position of the display relative to the visual reference. Other AR applications allow a user to experience visualization of the additional information overlaid on top of a view or an image of any object in the real physical world. The virtual object may include a three-dimensional virtual object or a two-dimensional virtual object. For example, the three-dimensional virtual object may include a three-dimensional view of a machine. The two-dimensional virtual object may include a two-dimensional view of a dialog box, a menu, or written information such as statistics information for a factory tool.”  Finding ¶ 19.); and 
store, in association with one another: the video data, the audio data, 
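Finding discloses the integration of video data, audio data, and the augmentations to the video data, stating “The recorded content dataset 616 includes, for example, media content, audio recording, recorded images of virtual objects, notes, and corresponding 3D coordinates.”  Finding ¶ 80.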
Finding teaches or suggests the integrated video data, audio data, and the augmentations to the video data are organized according to instruction steps, stating “a plant operator may want to train employees to complete a 20-point daily inspection. The plant operator, using an AR device, may place inspection points throughout the plant to illustrate what the employees are to do. The plant operator can record steps of instructions at one or more locations which particular steps should be performed in the plant and then share the recorded steps of instructions as virtual content with the employees. The employees can use their own AR devices to view the recorded steps of instructions to potentially learn how to perform the same inspection.”  Finding ¶ 23.
Finding teaches or suggests the provided instructional assistance is organized according to “specific time” for a task, stating “AR devices can be used to provide enhanced assistance (e.g., technical support) to other users via human interaction, and customized data generated for the specific time and issue where assistance is needed.”  Finding ¶ 3.). 
Finding does not explicitly disclose
the session is continuously recorded,
processing the audio data to identify timestamps during the session at which one or more of the plurality of steps commences, or
store . . . the identified timestamps.  


Hodge discloses the session is continuously recorded (
Hodge discloses “Moreover, all augmented reality sessions can be recorded and stored by monitoring center 140.”  Hodge ¶ 68.
“Monitoring center 140 can then utilize this information by recording the augmented reality session for later review and/or monitor the actions of users within the augmented reality communication system 100.”  Hodge ¶ 35. ). 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding with Hodge.  The suggestion/motivation would have been in order to monitor for safety reasons, to review, and/or to edit.
However, Finding in view of Hodge does not explicitly disclose processing the audio data to identify timestamps during the session at which one or more of the plurality of steps commences;
store . . . the identified timestamps.    
Fallon discloses processing the audio data to identify timestamps during the session at which one or more tasks are completed (
“For example, a mixed reality device, such as a headset or other such device can perform various operations in response to a voice command or other such input.”  Fallon Abstract.
Fallon further states “. . . invoking a note taking application to dictate notes relative to a specific timestamp within the content duration of the mixed environment, communicating with users of the mixed environment, and controlling other aspects of the mixed environment using voice commands and/or gestures. In accordance with various embodiments, a user can invoke a note taking application using voice commands, air gesture-based commands, and/or using a handset or hand-held controller to invoke the note taking app.”  Fallon ¶ 23.);
store . . . the identified timestamps (Id.).    
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge with Fallon.  The suggestion/motivation would have been in order to synchronize data from multiple sources.
However, Finding in view of Hodge and Fallon does not explicitly disclose the one or more tasks completed including recording timestamps of one or more of the plurality of steps that commences. 
Haligowski discloses one or more tasks completed including recording timestamps of one or more of the plurality of steps that commences (
“As the exemplary workflow is invoked, the various invocations of the workflow can be referred to as instances of the workflow. Step data (e.g., the step data 108) can be collected for each step 1-5 of each instance of the workflow. That is, start times, stop times, status indications, identification information, tags, etc. can be received and associated with each step every time the exemplary workflow is performed.”  Haligowski ¶ 31. 
Fallon already teaches the use of timestamp.);
store . . . the identified timestamps (Id.).    
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge and Fallon with Haligowski.  The suggestion/motivation would have been in order to track the time needed to complete each step of a task and/or keep detailed records of events.

Regarding Claim 2, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, further comprising, 
substantially contemporaneously with capturing of the video data by the camera, for each of the one or more objects, displaying the added visual content as a virtual image in spatial relation to and tracking with the object in the user's visual field of view (
Finding states “[w]hen the user moves, the inertial position of the AR device 106 is tracked and the display of the AR content is adjusted based on the new inertial position. For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48). 

Regarding Claim 3, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 2, wherein the virtual image is projected into the user's field of view through a mixed reality headset (
Finding discloses the AR device may be an HMD, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding discloses displaying mixed reality, stating “For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.). 

Regarding Claim 4, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 3, wherein the virtual image is projected into the user's field of view through smart glasses (
Finding discloses the AR device may be an HMD, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding discloses the HMD may be glasses, stating “In another example, the display of the AR device 106 may be at least transparent, such as lenses of computing glasses.”  Finding ¶ 32.). 

Regarding Claim 6, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, further comprising displaying the augmented video data on a display to the user substantially contemporaneously with capturing of the video data by the camera (
Finding discloses displaying mixed reality, stating “For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.). 

Regarding Claim 7, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 6, wherein the display is of a virtual reality headset (
Finding discloses the AR device may be an HMD, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding discloses displaying virtual objects, stating “For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.  A reality that comprises virtual objects is a virtual reality. ). 

Regarding Claim 8, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 6, wherein the display and the camera are of a mobile phone (Finding discloses the AR device may be a mobile phone, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.). 

Regarding Claim 9, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, 
wherein the synchronized session data further comprises spatial data relating to locations of at least one of the plurality of objects within the field of view of the camera, and wherein the set of data comprises the spatial data (
Finding discloses the use of camera and inertial measurement unit, stating “The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  
Finding discloses the use of spatial data, stating “A sixth example provides the AR device 106 of the first embodiment, wherein the selected spatial coordinates are defined relative to a location 118, 120, 122 of a physical object 102, 114, 116 at the geographic  ¶ 132.  
    [Image: media_image8.png]

“The server 112 stores the audio/video comments and spatial location for the corresponding portions of the media content. In another example, the client device 108 generates ). 

Regarding Claim 10, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 9, wherein the spatial data is generated by one or more spatial sensors (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.
Finding discloses the use of other sensors, stating “[i]n some example embodiments, the AR device 106 may offload some processes (e.g., tracking and rendering of virtual objects to be displayed in the AR device 106) using the tracking sensors and computing resources of the server 112. The tracking sensors may be used to track the location and orientation of the AR device 106 externally without having to rely on the sensors internal to the AR device 106. The tracking sensors may be used additively or as a failsafe/redundancy or for fine tuning. The tracking sensors may include optical sensors (e.g., depth-enabled 3D IR cameras), wireless sensors (e.g., Bluetooth, WiFi), GPS sensors, biometric sensors, and audio sensors to determine the location of the user 105 with the AR device 106, distances between the user 105 and the tracking sensors in the physical environment (e.g., sensors placed in corners of a venue or a ). 

Regarding Claim 11, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 10, wherein the spatial sensors are selected from the group consisting of: 3D depth sensors, camera sensors, time-of-flight infrared sensors, structured infrared light sensors, stereoscopic cameras, and ultrasonic sensors (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.
Finding discloses the use of other sensors, stating “[i]n some example embodiments, the AR device 106 may offload some processes (e.g., tracking and rendering of virtual objects to be displayed in the AR device 106) using the tracking sensors and computing resources of the server 112. The tracking sensors may be used to track the location and orientation of the AR device 106 externally without having to rely on the sensors internal to the AR device 106. The tracking sensors may be used additively or as a failsafe/redundancy or for fine tuning. The tracking sensors may include optical sensors (e.g., depth-enabled 3D IR cameras), wireless sensors (e.g., Bluetooth, WiFi), GPS sensors, biometric sensors, and audio sensors to determine ). 

Regarding Claim 12, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, 
wherein the camera and the inertial measurement unit are incorporated into a device worn by the user (
Finding discloses the use of camera and inertial measurement unit, stating “The sensors 202 may include, for example, and without limitation. a proximity or location sensor (e.g., near field communication, GPS, Bluetooth, Wi-Fi), an optical sensor (e.g., a camera), an orientation or inertia sensor (e.g., a gyroscope, accelerometer, inertial measurement unit (IMU)), an audio sensor (e.g., a microphone), depth sensors, such as, infrared (IR) camera and IR projector, thermal sensor or any suitable combination thereof.”  Finding ¶ 44.

    [Image: media_image6.png], showing that the sensors, which include a camera, an inertial measurement unit, and a microphone, are integrated into an AR device 106.), 
the method further comprising: 
processing the set of data comprising the video data and the motion data to determine a time-based series of spatial dispositions of the device (
Finding discloses the use of camera and inertial measurement unit to determine spatial disposition of the device, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  
Finding further states “[w]hen the user moves, the inertial position of the AR device 106 is tracked and the display of the AR content is adjusted based on the new inertial position. For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.), and 
processing the time-based series of spatial dispositions of the device to identify relative times during the session at which one or more of the plurality of steps commences (
Finding teaches or suggests tracking the device’s positions over time, stating “[w]hen the user moves, the inertial position of the AR device 106 is tracked and the display of the AR content is adjusted based on the new inertial position. For example, the user may view a virtual object visually perceived to be on a physical table. The position, location, and display of the virtual object is updated in the display 106 as the user moves around the physical table (e.g., away from, closer to, around).”  Finding ¶ 48.
Finding discloses examples of AR instructions which may show a user physically performing a procedure, stating “In another example, a factory worker may install a new production line and may want to share shut down and/or start up information with other workers. The process may involve several instructions at different points along the line. . . . In other words, the relevant information may be displayed at a predefined location within the plant. In another example, an experienced boiler engineer may want to teach his less experienced team member how to service an old boiler because minimal documentation is currently available.”  Finding ¶ 24. 
See also Finding ¶ 25.). 

Regarding Claim 13, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 12, wherein the spatial dispositions comprise relative locations of the device within an environment (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  The optical sensor is mapped to the camera, and the inertial sensor is mapped to the inertial measurement unit. 
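Finding discloses the use of other sensors, stating “[i]n some example embodiments, the AR device 106 may offload some processes (e.g., tracking and rendering of virtual objects to be displayed in the AR device 106) using the tracking sensors and computing resources of the server 112. The tracking sensors may be used to track the location and orientation of the AR device 106 externally without having to rely on the sensors internal to the AR device 106. The tracking sensors may be used additively or as a failsafe/redundancy or for fine tuning. The tracking sensors may include optical sensors (e.g., depth-enabled 3D IR cameras), wireless sensors (e.g., Bluetooth, WiFi), GPS sensors, biometric sensors, and audio sensors to determine the location of the user 105 with the AR device 106, distances between the user 105 and the tracking sensors in the physical environment (e.g., sensors placed in corners of a venue or a room), or the orientation of the AR device 106 to track what the user 105 is looking at (e.g., direction at which the AR device 106 is pointed).”  Finding ¶ 38.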
Finding states “In many embodiments a location may be for example, and without limitation, a location of a real world physical object, a position or orientation within an environmental space, an angle relative to a user of an AR device, or any combination thereof.”    Finding ¶ 21.
The determined locations are relative locations within an environment.). 

Regarding Claim 14, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 12, wherein the spatial dispositions comprise geolocations of the device (
Finding discloses the device’s location may be determined by sensors including GPS, stating “In some example embodiments, the AR device 106 may offload some processes (e.g., tracking and rendering of virtual objects to be displayed in the AR device 106) using the tracking sensors and computing resources of the server 112. The tracking sensors may be used to track the location and orientation of the AR device 106 externally without having to rely on the sensors internal to the AR device 106. The tracking sensors may be used additively or as a failsafe/redundancy or for fine tuning. The tracking sensors may include optical sensors (e.g., depth-enabled 3D IR cameras), wireless sensors (e.g., Bluetooth, WiFi), GPS sensors, biometric sensors, and audio sensors to determine the location of the user 105 with the AR device 106, distances between the user 105 and the tracking sensors in the physical environment (e.g., sensors placed in corners of a venue or a room), or the orientation of the AR device 106 to track what the user 105 is looking at (e.g., direction at which the AR device 106 is pointed).”  Finding ¶ 38.
Such determined location is a geolocation.).
 
Regarding Claim 15, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 12, wherein the spatial dispositions comprise orientations of the device (
Finding discloses that spatial dispositions comprise orientations of the device, stating “The AR device 106 determines a location 118, orientation, and position of the AR device 106 within the predefined region 104 using a combination of inertia data, wireless data from fixed frame of references, and image data.”  Finding ¶ 38.). 

Regarding Claim 16, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, 
wherein processing the audio data to identify relative times during the session at which each of the plurality of steps commences comprises: 
causing a textual transcription of the spoken words to be generated based on the audio data; and processing the textual transcription to identify trigger words indicative of commencement of a step (
“The notes can be captured through speech, where the user dictates the notes they would like transcribed, or through gesture-based input, where the notes are inputted into the note application using air gesture-based approaches.”  Fallon ¶ 23. 
“The speech processing service 220 may include an automatic speech recognition (ASR) module 222 that performs automatic speech recognition on audio data regarding user utterances, a natural language understanding (NLU) module 228 that performs natural language understanding on transcriptions generated by the ASR module 222, a context interpreter 224 that applies contextual rules to current NLU results based on prior interpretations and dialog acts, a natural language generation ("NLG") module that converts certain dialog acts into user-understandable communications (e.g., text that can be "read" to the user by a text-to-speech 226 or "TTS" component), among other such modules.”  Fallon ¶ 29.
The Examiner takes Official Notice that it was well known in the art that a textual transcription of a user’s speech may be used to recognize voice commands.  The benefit of applying this well-known knowledge would have been a more flexible and intelligent system, because it would have been easier for a user or programmer to change or add trigger words.
).  
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge with Fallon.  The suggestion/motivation would have been to provide assistance to people with disabilities, to provide users with additional means of communication, and/or to keep a simplified record of the completed task. 

Regarding Claim 18, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, 
wherein the time-synchronized session data further comprises sensor data captured by a sensor associated with the each of the plurality of objects (
Finding states “AR devices allow a user to experience information, such as in the form of a three-dimensional virtual object overlaid on an image of a physical object captured by a camera of a display device (e.g., mobile computing device, wearable computing device such as a head mounted device). The physical object may include a visual reference (e.g., an identifiable visual feature) that the augmented reality application can identify. A visualization of the additional information, such as the three-dimensional virtual object overlaid or engaged with an image of the physical object, is generated in a display of the AR device. . . . A rendering of the visualization of the three-dimensional virtual object may be based on a position of the display relative to the visual reference. Other AR applications allow a user to experience visualization of the additional information overlaid on top of a view or an image of any object in the real physical world. The virtual object may include a three-dimensional virtual object or a two-dimensional virtual object. For example, the three-dimensional virtual object may include a three-dimensional view of a machine. The two-dimensional virtual object may include a two-dimensional view of a dialog box, a menu, or written information such as statistics information for a factory tool.”  Finding ¶ 19.
Finding discloses the use of spatial data, stating “A sixth example provides the AR device 106 of the first embodiment, wherein the selected spatial coordinates are defined relative to a location 118, 120, 122 of a physical object 102, 114, 116 at the geographic . . .”  Finding ¶ 132.  
[Image: media_image8.png]
), and 
wherein the overlaid video content for the each object is based on the sensor data (Finding states “Other AR applications allow a user to experience visualization of the additional information overlaid on top of a view or an image of any object in the real physical world. The virtual object may include a three-dimensional virtual object or a two-dimensional virtual object.”  Finding ¶ 19.). 

Regarding Claim 19, Finding in view of Hodge, Fallon, and Haligowski discloses A method comprising: 
capturing, by a hands-free head-mounted device worn by a user, time-synchronized session data (See Claim 1 rejection for detailed analysis.  Finding discloses the AR device may be an HMD, wearable display device, or mobile phone, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.) comprising: 
video data captured by a camera, audio data captured by a microphone within audio proximity of the camera (See Claim 1 rejection for detailed analysis.); 
spatial data capture by one or more spatial sensors (See Claim 1 rejection for detailed analysis.), and 
motion data captured by an inertial measurement unit physically fixed relative to the camera (See Claim 1 rejection for detailed analysis.), 
wherein the time-synchronized session data relate to a session during which the user physically performs a procedure having a plurality of steps (See Claim 1 rejection for detailed analysis.), and 
wherein the audio data comprise spoken words of the user (See Claim 1 rejection for detailed analysis.); 
substantially contemporaneously with the capturing of the time-synchronized session data, processing the time-synchronized session data by a processor to identify relative times during the session at which one or more of the plurality of steps commences (
See Claim 1 rejection for detailed analysis.); and 
for each of the one or more of the plurality of steps, displaying on a display visible to the user of the hands-free head-mounted device, an indication of the each step while the each step is being performed (
Finding discloses examples of AR instructions which may show a user physically performing a procedure, stating “In another example, a factory worker may install a new production line and may want to share shut down and/or start up information with other workers. The process may involve several instructions at different points along the line. . . . In other words, the relevant information may be displayed at a predefined location within the plant. In another example, an experienced boiler engineer may want to teach his less experienced team member how to service an old boiler because minimal documentation is currently available.”  Finding ¶ 24. 
Finding discloses further examples of AR instructions which may show a user physically performing a procedure, stating “[t]he present application describes an AR device than enables a user of the AR device to generate virtual content by recording a video of the user fixing a machine. For example, the video may show how to change a filter of a machine. Other virtual content may include, for example, and without limitation, video, images, thermal data, biometric data, user and application input, graphics, audio, annotations, AR manipulations, 3D objects, graphics animations, or substantially any other display render-able data.”  Finding ¶ 25.
Finding states “a plant operator may want to train employees to complete a 20-point daily inspection. The plant operator, using an AR device, may place inspection points throughout the plant to illustrate what the employees are to do. The plant operator can record steps of instructions at one or more locations which particular steps should be performed in the plant and then share the recorded steps of instructions as virtual content with the employees. The employees can use their own AR devices to view the recorded steps of instructions to potentially learn how to perform the same inspection.”  Finding ¶ 23.). 

Regarding Claim 21, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 1, further comprising: 
processing the time-synchronized session data to identify the plurality of steps (
Finding discloses the synchronization of the captured image data and motion, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Finding discloses, together with image data, audio data is also recorded, stating “The recorded content dataset 616 includes, for example, media content, audio recording, recorded images of virtual objects, notes, and corresponding 3D coordinates.”  Finding ¶ 80.

“As the exemplary workflow is invoked, the various invocations of the workflow can be referred to as instances of the workflow. Step data (e.g., the step data 108) can be collected for each step 1-5 of each instance of the workflow. That is, start times, stop times, status indications, identification information, tags, etc. can be received and associated with each step every time the exemplary workflow is performed.”  Haligowski ¶ 31. ); 
generating procedural workflow data including step data for the plurality of steps (
“In one example, a plant operator may want to train employees to complete a 20-point daily inspection. The plant operator, using an AR device, may place inspection points throughout the plant to illustrate what the employees are to do. The plant operator can record steps of instructions at one or more locations which particular steps should be performed in the plant and then share the recorded steps of instructions as virtual content with the employees. The employees can use their own AR devices to view the recorded steps of instructions to potentially learn how to perform the same inspection.”  Finding ¶ 23.

“As the exemplary workflow is invoked, the various invocations of the workflow can be referred to as instances of the workflow. Step data (e.g., the step data 108) can be collected for each step 1-5 of each instance of the workflow. That is, start times, stop times, status indications, identification information, tags, etc. can be received and associated with each step every time the exemplary workflow is performed.”  Haligowski ¶ 31.); and 
outputting the procedural workflow data in a knowledge transfer format (
The term “knowledge transfer format” is a broad one.  Its plain meaning is a format that allows knowledge to be transferred.  Because the workflow information taught by Finding in view of Hodge, Fallon, and Haligowski is conveyed to other users, the limitation is taught. 
Finding teaches communicating workflow information to other users, stating “a plant operator may want to train employees to complete a 20-point daily inspection. The plant operator, using an AR device, may place inspection points throughout the plant to illustrate what the employees are to do. The plant operator can record steps of instructions at one or more locations which particular steps should be performed in the plant and then share the recorded steps of instructions as virtual content with the employees. The employees can use their own AR devices to view the recorded steps of instructions to potentially learn how to perform the same inspection.”  Finding ¶ 23.
“As used herein, a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof.”  Finding ¶ 41.
“In an operation 835, an AR reenactment module of the AR device records the virtual content, type of display (e.g. dynamic perspective and/or static perspective), manipulations to the virtual content, the dynamic states during the display, and user identification information to generate an experience file.”  Finding ¶ 88.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding with Hodge.  The suggestion/motivation would have been in order to monitor for safety reasons, to review, and/or to edit.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge with Fallon.  The suggestion/motivation would have been in order to synchronize data from multiple sources.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge and Fallon with Haligowski.  The suggestion/motivation would have been in order to track the time needed to complete each step of a task and/or keep detailed records of events.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Finding et al. (US 20180253900 A1) in view of Hodge (US 20180276895 A1), Fallon (US 20180004481 A1), Haligowski et al. (US 20170364843 A1), and Ries (US 20170371163 A1).
Regarding Claim 5, Finding in view of Hodge, Fallon, and Haligowski discloses The method of claim 3, wherein the virtual image is displayed on an HMD (
Finding discloses the AR device may be an HMD, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding discloses the HMD may be glasses, stating “In another example, the display of the AR device 106 may be at least transparent, such as lenses of computing glasses.”  Finding ¶ 32.). 
However, Finding in view of Hodge, Fallon, and Haligowski does not explicitly disclose the virtual image is reflected from a display through a partially reflective lens disposed within the field of view of the user.
Ries discloses the virtual image is reflected from a display through a partially reflective lens disposed within the field of view of the user (
Ries states “An augmented reality display device (such as a head mounted device) includes a partially transparent and partially reflective lens, a laser light source, a radio frequency source, a display controller, an acousto-optical modulator, and a microelectromechanical (MEMS) device.”  Ries Abstract.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge, Fallon, and Haligowski with Ries.  The suggestion/motivation would have been in order to create mixed or augmented reality images.  It would also have been a simple substitution of one known element for another (KSR) that produces predictable results.  The HMD taught by Finding in view of Hodge, Fallon, and Haligowski is one kind of known HMD.  Ries teaches another known HMD.  All of these HMDs produce mixed or augmented reality images.  Therefore, the substitution produces predictable results. 

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Finding et al. (US 20180253900 A1) in view of Hodge (US 20180276895 A1), Fallon (US 20180004481 A1), Haligowski et al. (US 20170364843 A1), and Patrick et al. (US 8836222 B1).
Regarding Claim 17, Finding in view of Hodge, Fallon, and Haligowski  discloses The method of claim 1. 
However, Finding in view of Hodge, Fallon, and Haligowski does not explicitly disclose 
 wherein identify the object is performed by at least: 
causing a textual transcription of the spoken words to be generated based on the audio data; 
processing the textual transcription to identify names of objects; 
referencing physical description data for the identified names of objects using a database that relates object names to physical description data for named objects; and 
processing video frames to identify objects within the video frames based on the physical description data for objects named in the textual transcription within temporal proximity to the video frames.
Patrick discloses 
wherein identify the object is performed by at least: 
causing a textual transcription of the spoken words to be generated based on the audio data (
Patrick recites “Image data of the physical space surrounding a user and audio data including a user voice command is received, 452. The audio data is analyzed to identify a user voice command and to determine if the voice command describes a projection area, 454 (e.g., ‘project light around me’, thus describing a projection area around the user’s current location). If the voice command describes a projection area, then the projection area is determined based on the voice command and the image data, 460. Otherwise, the audio data is analyzed to determine if the audio command identifies an object, 456. A user voice command may simply identify the object (e.g., “project light onto the table”) or may comprise a request for an object to be located and have light projected onto it (e.g., “find my keys”); if the object can be identified from the image data, 458, the image data is analyzed to locate the object and determine the projection area, 460.”  Patrick col. 6 lines 5-20.
Patrick recites “Audio processing module 215 may process audio data to analyze user voice requests for light as it relates to the physical space around the user.”  Patrick col. 3 lines 57-59.
 ); 
processing the textual transcription to identify names of objects (
Patrick recites “Otherwise, the audio data is analyzed to determine if the audio command identifies an object, 456. A user voice command may simply identify the object (e.g., ‘project light onto the table’) or may comprise a request for an object to be located and have light projected onto it (e.g., ‘find my keys’).”  Patrick col. 6 lines 5-20.); 
referencing physical description data for the identified names of objects using a database that relates object names to physical description data for named objects (
Patrick recites “Search engine 212 may execute a search within database 220 (which may store object images or object information such as barcode information) or execute a web-based search for an object identified in a user command. For example, a user may issue the command: ‘Find my phone;’ search engine 212 may search for images of the user's phone previously stored . . .”  Patrick col. 4 lines 9–15.
Finding discloses general technical implementation of the feature in question, stating “The physical object may include a visual reference (e.g., an identifiable visual feature) that the augmented reality application can identify.”  Finding ¶ 19.); and 
processing video frames to identify objects within the video frames based on the physical description data for objects named in the textual transcription within temporal proximity to the video frames (Id.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding in view of Hodge, Fallon, and Haligowski with Patrick.  The suggestion/motivation would have been in order to make it more convenient for a user to interact with a computing system.  Some may prefer voice commands.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Finding et al. (US 20180253900 A1) in view of Kohlhoff (US 20190266804 A1).
Regarding Claim 19, Finding discloses A method comprising: 
capturing, by a hands-free head-mounted device worn by a user, time-synchronized session data (
Finding discloses the AR device may be an HMD, wearable display device, or mobile phone, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding teaches or suggests the captured image data and motion data are time-synchronized, stating “The hardware processor . . . generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  
Finding discloses the synchronization of the captured image data and motion, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Finding discloses, together with image data, audio data is also recorded, stating “The recorded content dataset 616 includes, for example, media content, audio recording, recorded images of virtual objects, notes, and corresponding 3D coordinates.”  Finding ¶ 80.
Finding does not explicitly disclose that the recorded image data and the audio data are synchronized.  It is expected to be so.  The Examiner takes an Official Notice that image data ) comprising: 
video data captured by a camera, audio data captured by a microphone within audio proximity of the camera (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  The optical sensor is mapped to the camera, and the inertial sensor is mapped to the inertial measurement unit. 
Finding discloses the use of microphone, stating “The sensors 202 may include, for example, and without limitation. a proximity or location sensor (e.g., near field communication, GPS, Bluetooth, Wi-Fi), an optical sensor (e.g., a camera), an orientation or inertia sensor (e.g., a gyroscope, accelerometer, inertial measurement unit (IMU)), an audio sensor (e.g., a microphone), depth sensors, such as, infrared (IR) camera and IR projector, thermal sensor or any suitable combination thereof.”  Finding ¶ 44.

[Image: media_image6.png], showing that the sensors, which include a camera, an inertial measurement unit, and a microphone, may be integrated into an AR device 106. 
Finding discloses the AR device may be an HMD, wearable display device, or mobile phone, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding does not explicitly disclose that the inertial measurement unit is “physically fixed relative to the camera.”  It is expected to be so in the HMD, wearable display device, or mobile phone.  The Examiner takes Official Notice that an inertial measurement unit may be physically fixed relative to a camera in an HMD, wearable display device, or mobile phone.  The benefit of applying this well-known knowledge would have been a reliably constructed device.); 
spatial data capture by one or more spatial sensors ( Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.
Finding discloses the use of other sensors, stating “[i]n some example embodiments, the AR device 106 may offload some processes (e.g., tracking and rendering of virtual objects to be displayed in the AR device 106) using the tracking sensors and computing resources of the server 112. The tracking sensors may be used to track the location and orientation of the AR device 106 externally without having to rely on the sensors internal to the AR device 106. The tracking sensors may be used additively or as a failsafe/redundancy or for fine tuning. The tracking sensors may include optical sensors (e.g., depth-enabled 3D IR cameras), wireless sensors (e.g., Bluetooth, WiFi), GPS sensors, biometric sensors, and audio sensors to determine the location of the user 105 with the AR device 106, distances between the user 105 and the tracking sensors in the physical environment (e.g., sensors placed in corners of a venue or a room), or the orientation of the AR device 106 to track what the user 105 is looking at (e.g., direction at which the AR device 106 is pointed).”  Finding ¶ 38.), and 
motion data captured by an inertial measurement unit physically fixed relative to the camera (
Finding discloses the use of camera and inertial measurement unit, stating “A device has an optical sensor, an inertial sensor, and a hardware processor. The optical sensor generates image data. The inertial sensor generates inertia data. The hardware processor receives an augmented reality (AR) authoring template authored at a client device, generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  The optical sensor is mapped to the camera, and the inertial sensor is mapped to the inertial measurement unit.), 
wherein the time-synchronized session data relate to a session during which the user physically performs a procedure having a plurality of steps (

[Image: media_image7.png]
According to steps 707 and 708, the recorded content is used to generate AR instructions.
Finding discloses examples of AR instructions which may show a user physically performing a procedure, stating “In another example, a factory worker may install a new production line and may want to share shut down and/or start up information with other workers. The process may involve several instructions at different points along the line. . . . In other words, the relevant information may be displayed at a predefined location within the plant. In another example, an experienced boiler engineer may want to teach his less experienced team member how to service an old boiler because minimal documentation is currently available.”  Finding ¶ 24. 
Finding discloses further examples of AR instructions which may show a user physically performing a procedure, stating “[t]he present application describes an AR device than enables a user of the AR device to generate virtual content by recording a video of the user fixing a machine. For example, the video may show how to change a filter of a machine. Other virtual content may include, for example, and without limitation, video, images, thermal data, biometric data, user and application input, graphics, audio, annotations, AR manipulations, 3D objects, graphics animations, or substantially any other display render-able data.”  Finding ¶ 25.), and 
wherein the audio data comprise spoken words of the user (
Finding discloses recording audio comments, stating “In accordance with another embodiment, the server 112 receives media content from the AR device 106 and generates annotations (e.g., audio/video comments) on the media content. Each audio/video comment is associated with a particular location in space. The server 112 stores the audio/video comments and spatial location for the corresponding portions of the media content.”  Finding ¶ 36.  Audio comments comprise spoken words of a user.); 
substantially contemporaneously with the capturing of the time-synchronized session data, processing the time-synchronized session data by a processor (
Finding discloses the AR device may be an HMD, wearable display device, or mobile phone, stating “The network environment 100 includes one or more AR devices 106 such as, and without limitation, a head mounted display, wearable display devices, or a mobile phone.”  Finding ¶ 30.
Finding teaches or suggests the captured image data and motion data are time-synchronized, stating “The hardware processor . . . generates media content using the image data, and receives a selection of spatial coordinates within a three-dimensional region using the inertia data and the image data.”  Finding Abstract.  
Finding discloses the synchronization of the captured image data and motion, stating “The visual inertial navigation (VIN) module 222 enables a wearer or user to view the virtual object layers on a view of a real world environment. An absolute position or relative position of the AR device in space may be tracked using the visual inertial navigation (VIN) module in the AR device. In some embodiments, the VIN module generates a plurality of video frames with at least one camera of the AR device and generates inertial measurement unit (IMU) data with at least one IMU sensor of the AR device. The VIN module tracks features in the plurality of video frames for each camera, synchronizes and aligns the plurality of video frames for each camera with the IMU data. The VIN module then computes a dynamic state of the AR device based on the synchronized plurality of video frames with the IMU data.”  Finding ¶ 49.
Finding discloses that audio data is recorded together with the image data, stating “The recorded content dataset 616 includes, for example, media content, audio recording, recorded images of virtual objects, notes, and corresponding 3D coordinates.”  Finding ¶ 80.
Finding does not explicitly disclose that the recorded image data and the audio data are synchronized, although such synchronization would be expected.  The Examiner takes Official Notice that image data ); and 
for each of the one or more of the plurality of steps, displaying on a display visible to the user of the hands-free head-mounted device, an indication of the each step while the each step is being performed (
Finding discloses examples of AR instructions which may show a user physically performing a procedure, stating “In another example, a factory worker may install a new production line and may want to share shut down and/or start up information with other workers. The process may involve several instructions at different points along the line. . . . In other words, the relevant information may be displayed at a predefined location within the plant. In another example, an experienced boiler engineer may want to teach his less experienced team member how to service an old boiler because minimal documentation is currently available.”  Finding ¶ 24. 
Finding discloses further examples of AR instructions which may show a user physically performing a procedure, stating “[t]he present application describes an AR device than enables a user of the AR device to generate virtual content by recording a video of the user fixing a machine. For example, the video may show how to change a filter of a machine. Other virtual content may include, for example, and without limitation, video, images, thermal data, biometric data, user and application input, graphics, audio, annotations, AR manipulations, 3D objects, graphics animations, or substantially any other display render-able data.”  Finding ¶ 25.
Finding states “a plant operator may want to train employees to complete a 20-point daily inspection. The plant operator, using an AR device, may place inspection points throughout the plant to illustrate what the employees are to do. The plant operator can record steps of instructions at one or more locations which particular steps should be performed in the plant and then share the recorded steps of instructions as virtual content with the employees. The employees can use their own AR devices to view the recorded steps of instructions to potentially learn how to perform the same inspection.”  Finding ¶ 23.). 
However, Finding does not explicitly disclose processing the time-synchronized session data by a processor to identify relative times during the session at which one or more of the plurality of steps commences. 
Kohlhoff discloses processing data by a processor to identify relative times during the session at which one or more of the plurality of steps commences (Kohlhoff discloses the use of a voice command, stating “[a]n operator wearing a AR/VR wearable device such as a smart glass may start recording assembling by a vocal/voice command like ‘start recording’. The voice command ‘start recording’ may be referred to as a first voice command.”  Kohlhoff ¶ 19.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Finding with Kohlhoff.  The suggestion/motivation .

Conclusion 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Guven et al. (US 10747300 B2), which recites “Dynamic Content Generation For Augmented Reality Assisted Technology Support.”  Title. 
Schmirler et al. (US 10735691 B2), which recites “Virtual Reality And Augmented Reality For Industrial Automation.”  Title.
Kritzler et al. (US 20200117268 A1), which recites “AUTHORING AUGMENTED REALITY EXPERIENCES USING AUGMENTED REALITY AND VIRTUAL REALITY.”  Title.
Wright et al. (US 20190370544 A1), which recites “Accordingly, the local user can move through the “AR experience” while being hands free, enabling the user to perform manual tasks while simultaneously receiving instructions from the “AR experience” of the HMD.”  ¶ 78.
Li et al. (US 20190114482 A1), which recites “METHODS FOR PROVIDING TASK RELATED INFORMATION TO A USER, USER ASSISTANCE SYSTEMS, AND COMPUTER-READABLE MEDIA.”  Title.
Dusik et al. (US 20150146007 A1), which recites “The maintenance assistance system may include, but is not limited to, a camera, a heads-up display, a memory configured to maintenance task data.”
Lehtiniemi et al. (US 20190369722 A1), see the reproduced figure (media_image9.png).

Petersen, Nils, and Didier Stricker. "Learning task structure from video examples for workflow tracking and authoring." 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2012.
Petersen, Nils, Alain Pagani, and Didier Stricker. "Real-time modeling and tracking manual workflows from first-person vision." 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2013.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509.  The examiner can normally be reached on M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on (571) 272-7794.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ZHENGXI LIU/Primary Examiner, Art Unit 2611