DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
2.	The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

3.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103(a) are summarized as follows:
1.	Determining the scope and contents of the prior art.
2.	Ascertaining the differences between the prior art and the claims at issue.
3.	Resolving the level of ordinary skill in the pertinent art.
4.	Considering objective evidence present in the application indicating obviousness or nonobviousness.

4.	Claims 1-4, 6-8 and 10-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Ostrover et al. (US 2019/0253691 A1) in view of Divine et al. (US 2018/0247024 A1).
5.	With reference to claim 1, Ostrover teaches a method for creating an augmented reality (AR) environment, (“the encoder is configured to generate AR (augmented reality) encoded content based on received content and content obtained the method including: obtaining audio and video from a field of view; (“wherein said video components define at least a first AR scene including virtual and real elements and said audio processor is configured to receive virtual and real audio components and acoustic characteristics of said first AR scene, said audio processor generating coefficients descriptive audio signals configured to match the acoustic characteristics of said first AR scene.” [0026]) Ostrover further teaches recognizing speech in the audio; (“as discussed in more detail below, the audio track generated herein includes not only standard audio elements such speech, sound effects, etc., but these elements are modified and new elements are added that conform to characteristics of the virtual environment of the scene being played, and, in some cases the physical environment of the viewer(s).” [0046]) Ostrover teaches determine an AR effect based on the detected event; (“The processor 200 analyzes the raw audio tracks and modifies them as necessary to compensate for the acoustics of the theater. For example, a water wall, a whispered conversation or a passing fire truck sound differently to a spectator seated in different theaters, or in different locations within a theater, in either case with reference to the different acoustic environmental characteristics. The environmental processor 200 compensates for this effect so that the spectator will hear a more realistic sound track--i.e. a sound track closer to what the content director originally intended. … In order to obtain a realistic effect, audio processor 21B includes an environmental processor 210, a virtual environment detector 212 and an acoustic memory 214.
The detector 212 detects the appropriate visual environment for a particular scene based on video information it receives. The video information may be render the AR effect with the video to create an AR environment. (“For this presentation, both the audio and the video information are combined with real audio and video signals from the actual environment of the viewer resulting in the augmented reality.” [0070])
Ostrover does not explicitly teach analyzing the recognized speech to detect an event; This is what Divine teaches (“the procedure identification component 112 can determine the procedure being performed based on analysis of the received input data associated with performance of the procedure. For example, based on the received input (e.g., visual, audio, motion, biometric, data entry, medical device generated, etc.), the procedure characterization component 114 can generate one or more descriptive parameters regarding characteristics of the procedure observed (e.g., identified objects, people, words, actions or event, characteristics of those actions or events, etc.). The procedure identification component 112 can further determine the type of procedure being performed by comparing the one or more descriptive parameters with predefined information (e.g., in memory 130, at one or more external information sources 138, or at another device) that associates the one or more descriptive parameters (e.g., the identified objects, people, words, actions or event, characteristics of those actions or 
6.	With reference to claim 2, Ostrover teaches the AR effect is rendered in real time with the audio and video. (“AR presentations are similar to VR presentations and consist of images of real time objects that a spectator is looking at and which images are combined with other 3D images that are superimposed or otherwise combined with the real time images.” [0045] “FIG. 6C shows the details of an audio processor 21C for an AR presentation. For this presentation, both the audio and the video information are combined with real audio and video signals from the actual environment of the viewer resulting in the augmented reality.” [0070])
7.	With reference to claim 3, Ostrover does not explicitly teach the obtaining comprises capturing video from a camera and simultaneously recording audio from a microphone, the camera and the microphone included in a computing device. This is what Divine teaches (“the remote assisting entity can be provided with a 
8.	With reference to claim 4, Ostrover does not explicitly teach the obtaining audio and video includes receiving the audio and video from a network. This is what Divine teaches (“The user device 102 can include any suitable computing device associated with a user and that can receive and/or render auxiliary information generated by the AR assistance module 110. For example, the one user device can include a desktop computer, a laptop computer, a television, an Internet enabled television, a mobile phone, a smartphone, a tablet user computer (PC), or a personal digital assistant (PDA).” [0057] “the user device 102 includes at least one camera 104 and a display 106. Speakers or other audio output may also be included in some embodiments. The camera 104 can include any suitable camera configured to capture image data, including still images and/or video. For example, in one or more embodiments, the camera 104 can capture live video of a healthcare professional's environment in association with performance of a healthcare procedure by the healthcare professional.” [0047] “For example, the one or more external information sources 138 can include systems, servers, devices, etc., that are internal to a particular healthcare organization providing and/or controlling the server device 102 and associated AR assistance module 110, as well as various remote systems, server, devices, etc., accessible via a network (e.g., the Internet).” [0049]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the 
9.	With reference to claim 6, Ostrover does not explicitly teach the event is a recognized word. This is what Divine teaches (“the procedure identification component 112 can determine the procedure being performed based on analysis of the received input data associated with performance of the procedure. For example, based on the received input (e.g., visual, audio, motion, biometric, data entry, medical device generated, etc.), the procedure characterization component 114 can generate one or more descriptive parameters regarding characteristics of the procedure observed (e.g., identified objects, people, words, actions or event, characteristics of those actions or events, etc.). The procedure identification component 112 can further determine the type of procedure being performed by comparing the one or more descriptive parameters with predefined information (e.g., in memory 130, at one or more external information sources 138, or at another device) that associates the one or more descriptive parameters (e.g., the identified objects, people, words, actions or event, characteristics of those actions or events, etc.) or patterns in the descriptive parameters with a defined type of procedure.” [0064] “the descriptive parameters identified and/or generated by the procedure characterization component 114 can describe various aspects of a procedure being performed, including respective actions and events of the procedure and characteristics of those actions or events. For example, descriptive information generated by the procedure characterization component over the course of procedure can include but is not limited to: objects associated with a procedures or 
10.	With reference to claim 7, Ostrover teaches the AR effect is a virtual element presented on a display. (“The immersive AR/VR effects may be provided or enhanced by motion sensors in a headset (or elsewhere) that detect motion of the user's head, and adjust the video display(s) accordingly. By turning his head to the side, the user can see the VR or AR scene off to the side; by turning his head up or down, the user can look up or down in the VR or AR scene. The headset (or other device) may also include tracking sensors that detect position of the user's head and/or body, and adjust the video display(s) accordingly. By leaning or turning, the user can see a VR or AR scene from a different point of view.” [0006] “As in FIG. 6B, the virtual video information is provided to virtual environmental detector 226. The environmental detector 226 detects the virtual environment from the video signals and provides this information to acoustic memory 228.” [0071] “The term `virtual actions or characters` is used to describe cartoon characters or other virtual objects or action (both visual and audio) generated animation or by a video game or other similar rendering device.” [0067]).
11.	With reference to claim 8, Ostrover teaches the virtual element is one or more of a text, an icon, an animation, an image, or a video. (“The immersive AR/VR effects may be provided or enhanced by motion sensors in a headset (or elsewhere) that detect motion of the user's head, and adjust the video display(s) accordingly. By turning his head to the side, the user can see the VR or AR scene off to the side; by turning his head up or down, the user can look up or down in the VR or AR scene. The headset (or other device) may also include tracking sensors that detect position of the user's head and/or body, and adjust the video display(s) accordingly. By leaning or turning, the user can see a VR or AR scene from a different point of view.” [0006] “As in FIG. 6B, the virtual video information is provided to virtual environmental detector 226. The environmental detector 226 detects the virtual environment from the video signals and provides this information to acoustic memory 228.” [0071] “The term `virtual actions or characters` is used to describe cartoon characters or other virtual objects or action (both visual and audio) generated animation or by a video game or other similar rendering device.” [0067]).
12.	Claim 10 is similar in scope to the combination of claims 1 and 3, and thus is rejected under similar rationale. Ostrover additionally teaches a processor communicatively coupled to the one or more cameras, the one or more microphones, and the memory, (“the raw audio tracks are fed to a respective summer 222, Summer 222 also receives real live audio signals from the actual environment of the viewer through a microphone 220. The combined audio tracks are provided to environmental processor 224. As in FIG. 6B, the virtual video information is provided to virtual environmental detector 226. The environmental detector 226 detects the virtual 
13.	With reference to claim 11, Ostrover does not explicitly teach a communication interface for communication with a network. This is what Divine teaches (“Remote computer(s) 2044 is logically connected to computer 2012 through a network interface 2048 and then physically connected via communication connection 2050. Network interface 2048 encompasses wire and/or wireless communication networks such as local-area networks (LAN), wide-area networks (WAN), cellular networks, etc. LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet, Token Ring and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL). Communication connection(s) 2050 refers to the hardware/software employed to connect the network interface 2048 to the system bus 2018.” [0187]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Divine into Ostrover, in order to facilitate adherence to defined practice standards and minimize errors associated with deviations from the defined practice standards.
14.	With reference to claim 12, Ostrover does not explicitly teach receive at least a portion of the AR environment from the network. This is what Divine teaches (“The user device 102 can include any suitable computing device associated with a user and 
15.	With reference to claim 13, Ostrover teaches the AR environment includes computer generated virtual elements combined with real elements imaged using the one or more cameras. (“"Virtual Reality" is a term that has been used for various types of content that simulates immersion in a partially or wholly computer-generated and/or live action three-dimensional world. Such content may include, for example, various video games and animated film content. A variation of these technologies is sometimes called "Augmented Reality." In an Augmented Reality presentation, an actual 3D presentation of the current surroundings of a user that is `augmented` by the addition of one or more virtual objects or overlays.” [0004] “the virtual video information is provided to virtual environmental detector 226. The environmental detector 226 detects the virtual environment from the video signals and provides this information to acoustic memory 228. In addition, a real environment detector 230 is used to detect the real environment of the viewer. For this purpose, the detector 230 is connected to a video camera 232.” [0071-0072])
16.	With reference to claim 14, Ostrover teaches the audio is speech. (“the audio track generated herein includes not only standard audio elements such speech, sound effects, etc., but these elements are modified and new elements are added that conform to characteristics of the virtual environment of the scene being played, and, in some cases the physical environment of the viewer(s).” [0046])
17.	With reference to claim 15, Ostrover teaches the processor (“As shown in FIG. 1A, the apparatus for implementing the invention includes a 3D video encoder 10, an audio processor 21, a video processor 23 and an authoring tool 22.” [0049])
Ostrover does not explicitly teach detect an event based on a word recognized in the speech, the AR environment based on the recognized word. This is what Divine teaches (“the descriptive parameters identified and/or generated by 
18.	With reference to claim 16, Ostrover teaches the processor (“As shown in FIG. 1A, the apparatus for implementing the invention includes a 3D video encoder 10, an audio processor 21, a video processor 23 and an authoring tool 22.” [0049]) 
Ostrover does not explicitly teach determine a meaning from the speech; and detect an event based on the determined meaning, the AR environment based on the determined meaning. This is what Divine teaches (“The audio capture device 804 can include a microphone or another type of audio capture device that can receive and record audio during a procedure, such as speech spoken by the healthcare professional performing the procedure, the patient, and/or one or more other healthcare professionals involved in the procedure. In some implementations, the audio capture device 804 can further process captured audio to convert detected speech to text for providing to the AR assistance module 110.” [0124] “the descriptive parameters identified and/or generated by the procedure characterization component 114 can describe various aspects of a procedure being performed, including respective actions and events of the procedure and characteristics of those actions or events. For example, descriptive information generated by the procedure characterization 
19.	Claim 17 is similar in scope to the combination of claims 10 and 11, and thus is rejected under similar rationale. Ostrover does not explicitly teach one or more augmented reality (AR) sources; This is what Divine teaches (“Information identifying the respective procedures and the guidelines/protocols for the respective procedures can be included in memory 130 (e.g., as procedure guideline/protocol information 132), at one or more external information sources 138, or another source accessible to the 
20.	With reference to claim 18, Ostrover does not explicitly teach the one or more AR sources provide at least a portion of the AR environment. This is what Divine teaches (“Information identifying the respective procedures and the guidelines/protocols for the respective procedures can be included in memory 130 (e.g., as procedure guideline/protocol information 132), at one or more external information sources 138, or another source accessible to the AR assistance module.” [0075] “The one or more external information sources 138 can include various systems and databases that can provide information to the AR assistance module 110 that facilitates evaluating performance of a healthcare related procedure and/or performance of various aspects of a healthcare organization. For example, the one or more external information sources 138 can include systems, servers, devices, etc., that are internal to a particular healthcare organization providing and/or controlling the server device 102 and associated AR assistance module 110, as well as various remote systems, server, devices, etc., accessible via a network (e.g., the Internet).” [0049] “the clinician is wearing an AR device 406 that includes a transparent display through which the clinician 402 can view his environment clearly (e.g., display 106). In various embodiments, the AR device 406 can be or include user device 102. Further, the AR device 406 can include or be communicatively coupled to the AR assistance module 
21.	With reference to claim 19, Ostrover teaches determine a virtual element for the AR environment based on the captured audio and the captured video; (“The immersive AR/VR effects may be provided or enhanced by motion sensors in a headset (or elsewhere) that detect motion of the user's head, and adjust the video display(s) accordingly. By turning his head to the side, the user can see the VR or AR scene off to the side; by turning his head up or down, the user can look up or down in the VR or AR scene. The headset (or other device) may also include tracking sensors that detect position of the user's head and/or body, and adjust the video display(s) accordingly. By leaning or turning, the user can see a VR or AR scene from a different point of view.” [0006] “As in FIG. 6B, the virtual video information is provided to virtual environmental detector 226. The environmental detector 226 detects the virtual environment from the video signals and provides this information to acoustic memory 228.” [0071] “The term `virtual actions or characters` is used to describe cartoon characters or other virtual objects or action (both visual and audio) generated animation or by a video game or other similar rendering device.” [0067]).
Ostrover does not explicitly teach the one or more AR sources are configured to: receive the captured audio and the captured video from the computing device; and transmit the virtual element to the computing device. These are what Divine teaches. Divine teaches the one or more AR sources (“Information identifying the respective procedures and the guidelines/protocols for the respective procedures can be included in memory 130 (e.g., as procedure guideline/protocol information 132), at one or more external information sources 138, or another source accessible to the AR assistance module.” [0075]) Divine also teaches receive the captured audio and the captured video from the computing device; and transmit the virtual element to the computing device. (“The user device 102 can further provide this feedback to the AR assistance module 110 for processing in real-time in association with evaluating performance of the procedure and/or the environment. For example, the one or more input device 802 can include camera 104 which is previously described. The one or more input devices 802 can also include but are not limited to, an audio capture device 804, one or more motion sensors 806 and/or one or more biometric sensors. The audio capture device 804 can include a microphone or another type of audio capture device that can receive and record audio during a procedure, such as speech spoken by the healthcare professional performing the procedure, the patient, and/or one or more other healthcare professionals involved in the procedure. In some implementations, the audio capture device 804 can further process captured audio to convert detected speech to text for providing to the AR assistance module 110.” [0124] “The user device 102 can include any suitable computing device associated with a user and that can receive and/or render auxiliary information generated by the AR assistance module 110.”
22.	With reference to claim 20, Ostrover does not explicitly teach the computing device is a smartphone, a tablet, or a virtual assistant appliance. This is what Divine teaches (“The user device 102 can include any suitable computing device associated with a user and that can receive and/or render auxiliary information generated by the AR assistance module 110. For example, the one user device can include a desktop computer, a laptop computer, a television, an Internet enabled television, a mobile phone, a smartphone, a tablet user computer (PC), or a personal digital assistant (PDA).” [0057]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Divine into Ostrover, in order to facilitate adherence to defined practice standards and minimize errors associated with deviations from the defined practice standards.
23.	Claims 5 and 9 are rejected under 35 U.S.C. 103(a) as being unpatentable over Ostrover et al. (US 2019/0253691 A1) and Divine et al. (US 2018/0247024 A1), as applied to claim 1 above, and further in view of Sommers et al. (US 2018/0047395 A1).
24.	With reference to claim 5, Ostrover does not explicitly teach the analyzing the recognized speech includes semantic analysis. This is what Divine teaches.  Divine teaches the analyzing the recognized speech (“the procedure identification component 112 can determine the procedure being performed based on analysis of the 
The combination of Ostrover and Divine does not explicitly teach semantic analysis. This is what Sommers teaches (“A word flow annotation implementation may perform conversion of speech to text locally or remotely, e.g., on a wearable device using the location processing & data module 260 or on a remote server (which for example includes the remote computing system 1220).” [0119] “The word flow annotation system may comprise a data repository (e.g., a database) of information including objects of interest and their associated auxiliary information. For example, the data repository may store common words, rare words, other contextual keywords, common objects in a user's environment (with which the user often interacts), etc. The auxiliary information can include semantic information.” [0135]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sommers into the combination of Ostrover and Divine, in order to facilitate a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements.
25.	With reference to claim 9, the combination of Ostrover and Divine does not explicitly teach the AR environment is part of a video chat application. This is what Sommers teaches (“the speaker and the listener may be conversing via a telephone or through an Internet audio or audio-video chat session. The speaker and the listener may be conversing using AR systems communicating through a network (such as, e.g., in a telepresence session), as illustrated in FIG. 12.” [0112]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sommers into the combination of Ostrover and Divine, in order to facilitate a comfortable, natural-feeling, rich 

Conclusion
26.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michelle Chin, whose telephone number is (571)270-3697. The examiner can normally be reached M-F, 9:00am-5:30pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mark Zimmerman, can be reached at (571)272-7653. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866)217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800)786-9199 (IN USA OR CANADA) or (571)272-1000.

/MICHELLE CHIN/

Primary Examiner, Art Unit 2619