Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-18, 21-27, 36-37, and 39-58 are pending. Claims 1, 6, 16, and 25 are independent.
Claims 19 and 38 are canceled. Claims 6, 10, 12, 16, and 43 are amended. 

Claim 1:   Dependents: 2-15:                 
Claim 25: Dependents: 49-58
Claim 6:    Dependents: 36-37, 39-48
Claim 16:  Dependents: 17-18, 21-24, 26-27
This Application was published as U.S. 2021/0407513.
Apparent priority 29 June 2020.

Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
This action is Final.
Response to Amendments
Rejection of Claims 10-13 under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite is withdrawn in view of the amendments to Claims 10 and 13.
Rejection of Claims 10 and 43 under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite is withdrawn in view of the amendments to these Claims.
Response to Arguments
Claim 1
Claim 1 has not been amended:
1. A head-wearable apparatus comprising: 
a microphone; 
a display panel visible to the wearer; 
a gaze tracker configured to determine a direction of a gaze of a wearer of the head-wearable apparatus; 
a further microphone configured to collect further sound from the sides and rear of the wearer's head; 
an auditory transducer; and 
a controller configured to: 
extract speech from sound collected by the microphone from the determined direction, 
present the extracted speech on the display panel, and 
provide, to the auditory transducer, audio representing the further sound collected by the further microphone responsive to the further sound representing a predetermined keyword, wherein the auditory transducer renders the further audio. 

Applicant argues that the cited references do not teach the highlighted limitations.  Response 12.
Applicant argues that the cited teachings of Wexler in [0506] and [0507] do not teach or suggest that the stored indication that the speech of the second individual is directed toward the first individual is used to play the audio to the first individual and all that Wexler teaches is storing the indication.  Response 13.

Supporting Specification:  Regarding the claimed “keyword,” the instant Application provides:  “[0058] Referring again to FIG. 7, the process 700 may include determining whether a keyword is present in the collected sound, at 704. The keywords may include, for example, the user's name….”  “[0059] When no keyword is detected, the process 700 may include continuing to collect sound, at 702. But when a keyword is detected, the process 700 may include alerting the user to the keyword, at 706. In some embodiments, the alerting may include providing audio representing the sound to an auditory transducer, which renders the audio….”

First, as the citations to the Specification of the instant Application show, in both the Specification and Wexler, the claimed “keyword” is the name of the user of the hearing device who is being addressed by another speaker.
Second, the only limitation missing from the primary reference, and the one for which Wexler was relied upon, is the provision of the “further sound collected … responsive to the further sound representing a predetermined keyword.”  Otherwise, the “further microphone” and the provision of the sound to the user/wearer of the device via an auditory transducer/speaker are taught by Conliffe and are not disputed by the Applicant.
Conliffe is directed to hearing aids and to displaying the text of an incoming speaker's speech to the person wearing the hearing aid.  Conliffe teaches that, in addition to collecting speech from the direction of the gaze of the user (wearer of the device), the apparatus also collects audio from other directions, including around and behind the user, and presents it to the user.  The other audio that Conliffe collects is disclosed as, for example, sirens or music:  “[0058] … In an Active Mode, a wide a beam as possible may be recorded. This may be particularly useful, for example, during times when it is important for the wearer to hear things around them, such as oncoming cars or police sirens. In a Music Mode, the wearer may be listening to music and the apparatus may operate so that the music is not canceled out via filtering….”
Conliffe does not teach that the audio collected from directions other than the gaze direction is provided to the user only if a keyword is detected in that audio.
Wexler, the secondary reference, is also directed to hearing aids and also includes an option of recognizing the speech of a particular speaker and presenting it to the wearer of the device on a display (Figure 38A and [0480], [0481], [0511]).  Accordingly, there is no question that the two references are compatible and the combination is proper.  (This is in addition to the fact that only compatibility with the claimed invention is required for a proper combination and not between the references themselves.)
Wexler is primarily directed to conditioning of the audio signal and providing it to the user (wearer of the device) as sound:  “… Based on the retrieved information, the processor may cause selective conditioning of at least one audio signal received by the wearable microphone from a region associated with the at least one sound-emanating object; and cause transmission of the at least one conditioned audio signal to a hearing interface device configured to provide sounds to an ear of the user.”  Abstract.  Applicant focuses on the “storing of an indication” of an audio event in Wexler as if the storing is the end goal.  Wexler is replete with references to “retrieving” the stored information for various purposes of audio conditioning and presenting to the user, be it in audio or as text.  See Figure 46B, “[0563] In step 4658, the processing device may retrieve from a database information associated with the at least one sound. …”   and “[0566] In step 4662, the processing device may cause transmission of the conditioned audio signals to hearing interface device 4406, which may be configured to provide sounds to an ear of user 100….”
The “INDICATORS” or “INDICATIONS” of Wexler are stored in order to be retrieved and used in the generation of the sound that is provided to the ear of the user:
Figure 31C and the corresponding Written Description include another mention of recognizing a “keyword”:
[0360] In some embodiments, the at least one indicator that the audio signal is related to a public announcement may include a recognized sound, word or phrase associated with the audio signal. For example, the processor may recognize one or more words or phrases that are associated with an airport announcement (e.g., a flight number), and determine that the audio signal is related to a public announcement based on the recognized word or phrase. As another example, the audio signal may include a word (or phrase) such as "help," "watch out," "attention," "announcement" (or similar words or phrases in other languages), or the like, or a combination thereof. The processor may analyze the audio signal and recognize such word (or phrase) and determine that the audio signal is related to a public announcement based on the recognized word (or phrase). Alternatively or additionally, the at least one indicator that the audio signal is related to a public announcement may include a volume level of the audio signal relative to an ambient noise level, which may indicate that the audio signal relates to a yell, scream, a public announcement over a loudspeaker, or the like, or a combination thereof. For example, the processor may determine that the volume level of the audio signal is greater than the ambient noise level by a threshold, and determine that the audio signal may be related to a public announcement or an event that needs attention. Alternatively or additionally, the at least one indicator that the audio signal is related to a public announcement includes at least one signal component associated with the audio signal indicative of production of the audio signal by a loudspeaker. For example, the audio signal may be related to a broadcast over one or more loudspeakers, which may include one or more signal characteristics indicating amplification of the voice or reproduction of the voice over a public address system.
…
[0364] At step 3163, the hearing aid system may cause transmission of the selectively amplified audio signal to a hearing interface device….

Further, with respect to the cited Figures 39B and 40B, Wexler continues:
[0510] In some embodiments, processes 39A-39B or 40A-40B may include additional steps. For example, processor 3803 may perform those additional steps after any step of 39A-39B or 40A-40B.


With respect to teaching a “second microphone” by Wexler in addition to the teachings of Conliffe see Wexler at:
[0141] Various views of apparatus 110 are illustrated in FIGS. 4E through 4K. For example, FIG. 4E shows a view of apparatus 110 with an electrical connection 441. Electrical connection 441 may be, for example, a USB port, that may be used to transfer data to/from apparatus 110 and provide electrical power to apparatus 110. In an example embodiment, connection 441 may be used to charge a battery 442 schematically shown in FIG. 4E. FIG. 4F shows F-view of apparatus 110, including sensor 220 and one or more microphones 443. In some embodiments, apparatus 110 may include several microphones 443 facing outwards, wherein microphones 443 are configured to obtain environmental sounds and sounds of various speakers communicating with user 100. FIG. 4G shows R-view of apparatus 110. In some embodiments, microphone 444 may be positioned at the rear side of apparatus 110, as shown in FIG. 4G. Microphone 444 may be used to detect an audio signal from user 100. It should be noted, that apparatus 110 may have microphones placed at any side (e.g., a front side, a rear side, a left side, a right side, a top side, or a bottom side) of apparatus 110. In various embodiments, some microphones may be at a first side (e.g., microphones 443 may be at the front of apparatus 110) and other microphones may be at a second side (e.g., microphone 444 may be at the back side of apparatus 110).

Accordingly, under at least three different rationales, a combination of Conliffe and Wexler teaches the Claim:
(1) all that is required from Wexler is a teaching that it responds to a predetermined keyword which Wexler teaches either by teaching that it responds to the name of the user being called or keywords indicating, e.g., a particular announcement.
(2) Wexler teaches the use of several microphones located on different positions on the device (Figures 4E through 4K) in addition to the presence of this teaching in Conliffe.
(3) Wexler teaches the use of “stored indications/indicators” for the generation of sound that is output to the ear of the user.  Thus, a teaching that the indicators of keyword are used for output of sound (and are not merely stored for sake of being stored) is either implied or made obvious by a combination of different embodiments that are taught by Wexler.

Patentability of the other independent Claim 25, which is similar to Claim 1, is argued based on its similarity to Claim 1.  Response 13.  Accordingly, the above provides a reply to those arguments as well.
Patentability of the dependent Claims is argued based on their dependence from their base independent Claims.  Response 13. Accordingly, the above provides a reply to those arguments as well.

Claim 6
Claim 6 has been amended to include that the extracted speech is presented “as single words presented in a temporal series”:
6. A head-wearable apparatus comprising: 
a microphone; 
a display panel visible to the wearer; 
a gaze tracker configured to determine a direction of a gaze of a wearer of the head-wearable apparatus; 
a controller configured to: 
extract speech from sound collected by the microphone from the determined direction, and 
present the extracted speech on the display panel as single words presented in a temporal series; and 
an off-axis projector configured to project the extracted words onto the display panel; 
wherein the display panel comprises a transflective diffuser.

	Applicant argues that the cited Basson does not teach presenting the text on the display panel as single words presented in a temporal series.  Response 14.  Applicant argues that Basson shows all of the words as appearing and being presented simultaneously.
	In Reply:
	First: what does it mean to “present the extracted speech on the display panel as single words presented in a temporal series”?
	Specification provides:  “[0054] … As another example, the text may be presented as single words in a temporal series. That is, one word is presented at a time….”  
	This could mean that the words are presented one by one as they are recognized, OR that single words appear in the time order in which they were recognized, OR that only one word is shown to the user at any time.  In sing-along applications (or karaoke), the words appear one by one as they are being output as sound, and this type of display teaches “presenting the text as single words in a temporal series.”   The older words disappear from the screen as the new words are added.  Most recognition results consist of single words appearing in a time-series format unless speech is arriving asynchronously.  An input may be a single word, and then only a single word would appear.  
	This could also mean that the text consists of a plurality of words and the text is presented on the display one word at a time, in order, such that all of the plurality of words appear on the display over time but no more than one word is being shown at any time. To convey this concept, the Claim needs more words.  For example, speech that consists of a single word will always appear as a single word.
	As is, the claimed language is open to interpretation and the Specification, too, is not determinative.  There is only a single mention of this feature in the Specification, and the goal of this feature is not explained, so a more determinative interpretation cannot be deduced from it.
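	For illustration only, the readings discussed above can be sketched in a few lines of Python. The function name `display_frames` and the parameter `max_visible` are hypothetical; they appear in neither the instant Application nor the cited references. The sketch merely assumes that recognized words arrive in time order and shows that the readings differ only in how many earlier words remain visible, a detail the claim language does not fix.

```python
from typing import List, Optional


def display_frames(words: List[str], max_visible: Optional[int] = None) -> List[str]:
    """Return the successive display contents as each recognized word arrives.

    max_visible=None : every earlier word stays on screen (words added one by one).
    max_visible=k    : only the k most recent words stay (karaoke-style scrolling).
    max_visible=1    : no more than one word is shown at any time.
    Both the function and the parameter are illustrative assumptions.
    """
    frames = []
    for i in range(1, len(words) + 1):
        start = 0 if max_visible is None else max(0, i - max_visible)
        frames.append(" ".join(words[start:i]))
    return frames


recognized = ["one", "word", "at", "a", "time"]  # hypothetical recognizer output
print(display_frames(recognized))                 # words accumulate in time order
print(display_frames(recognized, max_visible=2))  # older words drop off the screen
print(display_frames(recognized, max_visible=1))  # exactly one word per frame
```

	Under all three settings the words are presented singly and in temporal order. For an input consisting of a single word, the three settings produce the same single frame.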
	Basson teaches: “…An  image player coupled to the visual feature extractor presents an image segment with a corresponding decoded word. The image segment may be presented as an animation of successive images in time, whereby a user is provided multiple sources of information for comprehending the utterance and can more easily ascertain the relationship between the body movements and the corresponding decoded speech.”  Abstract.  
Considering that each animation corresponds to a word, Basson teaches that “extracted speech {is presented as} single words … in a temporal series.” 
In order to achieve the language that is commensurate with the argued concept, more words are required.  For example, state that “the text to be displayed includes a plurality of words and only a single word is displayed at any time.”
	Finally, note the references provided in the Conclusion section that teach showing only a single word at any one time.

Patentability of the other independent Claim 16, which is similar to Claim 6, is argued based on its similarity to Claim 6.  Response 16.  Accordingly, the above provides a reply to those arguments as well.
Patentability of the dependent Claims is argued based on their dependence from their base independent Claims.  Response 16.  Accordingly, the above provides a reply to those arguments as well.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 7-15 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe (U.S. 20150088500) in view of Wexler (U.S. 20220021985).
Regarding Claim 1, Conliffe teaches:
1. A head-wearable apparatus comprising: [Conliffe, Figures 1 and 2 showing a “wearable communication apparatus 100” as eyeglasses.]
a microphone; [Conliffe, Figures 1 and 2, “[0022] In some embodiments, frame 102 may include one or more microphones 106….”]
a display panel visible to the wearer; [Conliffe, Figure 6. “[0027] FIG. 6 depicts an embodiment of the wearable apparatus that shows a display as viewed by the wearer through lens 110A/B….”  “10….. wherein the at least one lens is configured to display visual feedback to the wearer ….”]
a gaze tracker configured to determine a direction of a gaze of a wearer of the head-wearable apparatus; [Conliffe, Figure 2, 202A and 202B, “[0021] … In some embodiments, infrared cameras 202A and 202B may be configured to track the eye of the wearer of the apparatus. The term "camera" as used herein may refer to its ordinary meaning as well as to any device that may be used to track the movement of an object and/or to provide video with regard to a particular target.”  Figure 8, 802: “[0076] …Process 10 may include tracking (802) an eye of a wearer using a camera associated with a frame, the frame having a processor and a memory associated therewith….”]
a further microphone configured to collect further sound from the sides and rear of the wearer's head; [Conliffe, Figure 1, shows two microphones 106 that are on the sides of the eyeglasses.  Figure 3, “Beamformer 302” can get the sound from any direction to which it is steered: “[0031] … A beamformer such as beamformer 302, may be configured to process signals emanating from a microphone array to obtain a combined signal in such a way that signal components coming from a direction different from a predetermined wanted signal direction are suppressed. Microphone arrays, unlike conventional directional microphones, may be electronically steerable which gives them the ability to acquire a high-quality signal or signals from a desired direction or directions while attenuating off-axis noise or interference….”]
an auditory transducer; and [Conliffe, Figures 1 and 2, “speakers 108.”  “[0018] … As shown in FIG. 1 some components may include, but are not limited to, front facing camera 104, at least one microphone 106, one or more speakers 108, transparent lenses 110a and 110b, and rearward facing camera (shown in FIG. 2). …” (auditory transducer is not a term of art; it could be a mic or a speaker; according to the Specification of the instant Application, a speaker is intended.  See [0038] of published Application).]
a controller configured to: [Conliffe, Figures 1 and 2, “[0018] …Frame 102 may include at least one memory and processor onboard that may be configured to communicate with some or all of the components associated with frame 102….”]
extract speech from sound collected by the microphone from the determined direction, [Conliffe,  “[0017] …Some embodiments may include an assistive wearable device that may utilize eye-tracking to direct acoustic beamforming for the purpose of directed speech recognition. ….”  “[0022] …Microphones 106 may be configured to receive speech input signals from one or more individuals and/or alternative input signal sources within the range of apparatus 100….”  Figure 8, 806 and 808, after adjusting the direction of the microphone based on the direction of the eye of the wearer of the glasses, “receiving an audio signal at the … microphone.”]
present the extracted speech on the display panel, and [Conliffe,  Figure 6 shows providing text of closed-captioned speech to the user.  Figure 8, 810 provides the speech to the wearer using the speakers, but the description indicates: “[0026] In some embodiments, the onboard processor may be configured to receive the input signals from microphones 106. The received input signals may be processed and transmitted to speakers 108 for the benefit of the wearer. Additionally and/or alternatively, the onboard processor may convert the received audio signal to text for the wearer to read, thus providing a closed-captioning functionality an example of which is depicted in FIG. 6 discussed below.”]
provide, to the auditory transducer, audio representing the further sound collected by the further microphone responsive to the further sound representing a predetermined keyword, wherein the auditory transducer renders the further audio. [Conliffe, Figure 6, lower right corner shows the “conversation modes”:  “[0058] Referring again to FIG. 6, in some embodiments apparatus 100 may include various conversation modes, which may be selected and/or automatically triggered depending upon the environment. In addition to Normal/Automatic Mode, which is the default setting and which behaves as described above. These different modes may be used to specify the width of the beam as well as the intensity of the filtering and enhancement that may be applied to the audio signal before it is sent to the headphones and speech recognition software. In an Active Mode, a wide a beam as possible may be recorded. This may be particularly useful, for example, during times when it is important for the wearer to hear things around them, such as oncoming cars or police sirens. In a Music Mode, the wearer may be listening to music and the apparatus may operate so that the music is not canceled out via filtering….”]

The last limitation of the Claim says: collect audio from the environment (not the gaze direction), keyword search this audio, and present it to the user if a keyword is found.  It appears intended to alert the user if someone is calling him from behind.  Published Application [0056]-[0059] discuss the “keyword” feature.
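For illustration only, the behavior recited by this limitation can be sketched as a keyword gate on the off-gaze audio. The names `Segment`, `contains_keyword`, and `segments_to_render` are hypothetical and are taken from neither the instant Application nor the cited references. The sketch assumes that each segment of further sound already carries a transcript from some speech recognizer and that keywords are single words such as the wearer's name.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Segment:
    transcript: str       # assumed recognizer output for the further (off-gaze) sound
    samples: List[float]  # the corresponding audio samples


def contains_keyword(transcript: str, keywords: Set[str]) -> bool:
    """True if any predetermined single-word keyword occurs in the transcript."""
    tokens = {tok.strip(".,!?;:\"'").lower() for tok in transcript.split()}
    return any(kw.lower() in tokens for kw in keywords)


def segments_to_render(segments: List[Segment], keywords: Set[str]) -> List[Segment]:
    """Keep only the further-sound segments that would be sent to the auditory transducer."""
    return [seg for seg in segments if contains_keyword(seg.transcript, keywords)]


# Hypothetical example: the wearer's name serves as the predetermined keyword.
further_sound = [
    Segment("the weather is nice today", [0.0, 0.1]),
    Segment("Hey Dana, wait up!", [0.2, 0.3]),
]
for seg in segments_to_render(further_sound, {"dana"}):
    print(seg.transcript)
```

In this sketch, further sound that does not contain the keyword is never rendered, whereas a mode-based system renders environmental audio regardless of its content.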
Conliffe steers the microphone array in the direction of the gaze of the user who is wearing the eyeglasses and provides the speech of a person whom the user is looking at to the user in audio or as speech-recognized text.  Conliffe also collects environmental audio and presents it to the user so the user does not miss hearing a siren, for example.
Conliffe provides/renders the environmental audio to the user according to the Mode selected by the user.
Conliffe does not teach keyword searching the audio and presenting it responsive to detecting the keyword.
Wexler teaches:
provide, to the auditory transducer, audio representing the further sound collected by the further microphone responsive to the further sound representing a predetermined keyword, [Wexler, Figure 1A where the device is a hearing aid in an eyeglass.  Figure 39B is directed to determining if an individual is speaking to the user.  Similarly, Figures 40A and 40B receive the voice of a person in the vicinity and determine if this voice is directed to the user of the device:  4012: “Is second individual’s speech directed to user?” or to a number of other individuals in the sound scene.  One way of determining if the speech that is detected is directed to the user is to see if the name of the addressee is detected in the speech.  This name teaches the “predetermined keyword” of the Claim.   “[0495] At step 3916, processor 3803 may determine whether the speech associated with the voice of the first individual is directed toward user 100. In some embodiments, processor 3803 may determine whether the speech associated with the voice of the first individual is directed toward user 100 based on at least one of a detected look direction of user 100 or a detected look direction of the first individual. For example, processor 3803 may determine the look direction of user 100 based on detection of a chin of user 100 in at least one of the images. For another example, processor 3803 may determine the look direction of the first individual based on detection of one or more eyes of the first individual in at least one of the images and based on at least one characteristic of the one or more detected eyes. For another example, processor 3803 may determine the look direction of the first individual based on gestures, gaits, or body movement features of the first individual detected from at least one of the images. 
For another example, processor 3803 may determine the look direction of the first individual based on the user's name included in the speech of the first individual.”  “[0506] At step 4016, processor 3803 may determine whether the speech associated with the voice of the second individual is directed toward the first individual. Step 4016 may be implemented in a manner similar to step 3916 or 4012. In some embodiments, processor 3803 may determine whether the speech associated with the voice of the second individual is directed toward the first individual based on a look direction of the second individual detected based on analysis of at least one of the images. In some embodiments, processor 3803 may determine whether the speech associated with the voice of the second individual is directed toward the first individual based on detection of a name associated with the first individual in the speech of the second individual.”]
wherein the auditory transducer renders the further audio. [Wexler, Figure 46A, 4614.  Wexler in Figure 40B, 4014 teaches storing an indication that the speech is directed to the user.  But it also adds in Figure 45 and 46A that an “electroacoustic transducer” can provide the conditioned audio signal to the user.   “[0557] After selective conditioning of audio signals associated with the sound-emanating object, the hearing aid system may provide the conditioned audio signals to user 100 (step 4614). The conditioned audio signals may be provided to user 100 using electroacoustic transducer 4502 of hearing interface device 4406….”]

Conliffe and Wexler both pertain to wearable devices that are worn over the eyes (eyeglasses) and used as hearing aids.  It would have been obvious to combine the name/keyword detection of Wexler, which is used as an added method of determining whether speech is directed at the user and should therefore be provided to him, with the system of Conliffe, which uses eye tracking/gaze detection (also taught by Wexler as an alternative to name detection) or the setting of modes, in order to account for sounds that come from directions other than that gazed at by the user, possibly from behind, where both references acknowledge that some background noises are important and should not be filtered out.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.  (Wexler:  “[0543] Users of hearing aids systems typically find it intrusive when irrelevant background noises are amplified. Some existing hearing aids systems filter out low-frequency sounds to reduce background noises. This solution eliminates some of the background noises, but it provides a partial solution as it may eliminate important parts of speech sounds or other sounds in the environment of user 100. Other existing hearing aids systems use directional microphones to reduce the sounds from beside and behind the user. This solution provides a better signal-to-noise ratio in certain specific scenarios, but it also provides a partial solution, as some background noises are important and should not be eliminated. 
The disclosed hearing aid system may include a wearable device (e.g., apparatus 110) that causes selective conditioning of audio signals generated by a sound-emanating object in the environment of the user, and a hearing interface device (e.g., hearing interface device 1710) to provide selectively modified sounds to an ear of user 100. The disclosed hearing aid system may use image data to determine if the background noises are important and cause selective conditioning accordingly. For example, the hearing aid system may amplify background noises determined to be important and attenuate background noises determined not to be important.”)



Regarding Claim 2, Conliffe teaches:
2. The head-wearable apparatus of claim 1, wherein the controller is further configured to present the extracted speech as text on the display panel. [Conliffe, Figure 6, “David: So, that all I was saying.”  Presenting the closed-captioned speech on the display of the eyewear.  See also:  “[0017] … embodiments may include an assistive wearable device that may utilize eye-tracking to direct acoustic beamforming for the purpose of directed speech recognition….”  “[0029] … Accordingly, apparatus 100 may use speech recognition to intelligently categorize elements of the conversation….”]

Regarding Claim 3, Conliffe teaches:
3. The head-wearable apparatus of claim 2, wherein the controller is further configured to present the text on the display panel as multiple words concurrently. [Conliffe, Figure 6, “David: So, that all I was saying.” ]

Regarding Claim 7, Conliffe teaches:
7. The head-wearable apparatus of claim 1, wherein the display panel is one of: transparent; and occluded. [Conliffe teaches the use of transparent or partially transparent lenses as its display.  “transparent lenses 110a and 110b,”  [0018].  “20…. displaying visual feedback to the wearer at the at least one lens,…”  “3…. at least one lens configured to receive the text results from the processor and to provide the text to the wearer.”  “[0027] FIG. 6 depicts an embodiment of the wearable apparatus that shows a display as viewed by the wearer through lens 110A/B. As discussed above, lenses 110A/B may be configured to receive the text results from the processor and to provide the text to the wearer via a display visible to the wearer. Lenses 110A/B may include transparent or partially transparent screens that allow the user to view their surroundings while also providing the closed-captioning feedback shown in FIG. 6. In some embodiments, lenses 110A/B may be configured to display various types of visual feedback to the wearer, for example, via the display shown in FIG. 6. The visual feedback may include, but is not limited to, beam shape, beam direction, and the identification of the non-wearer of the apparatus.”]

Regarding Claim 8, Conliffe teaches and therefore suggests:
8. The head-wearable apparatus of claim 1, wherein the microphone is a directional microphone. [Conliffe teaches that it uses the signal processing technique of “beamforming” on signals from a “microphone array” that it finds superior to the use of “directional microphones”:  “[0031] … Microphone arrays, unlike conventional directional microphones, may be electronically steerable which gives them the ability to acquire a high-quality signal or signals from a desired direction or directions while attenuating off-axis noise or interference….”  By teaching that a “directional microphone” is an inferior option, Conliffe suggests that directional microphones can be used in lieu of a microphone array.] [Wexler expressly teaches:  “[0204] Apparatus 110 may further comprise one or more microphones 1720 for capturing sounds from the environment of user 100. Microphone 1720 may also be configured to determine a directionality of sounds in the environment of user 100. For example, microphone 1720 may comprise one or more directional microphones, which may be more sensitive to picking up sounds in certain directions….”]

Regarding Claim 9, Conliffe teaches: 
9. The head-wearable apparatus of claim 1, wherein the microphone comprises an array of microphone elements. [Conliffe uses “mic arrays”:  “[0031] The term "Beamforming", as used herein, may generally refer to a signal processing technique used in sensor arrays for directional signal transmission or reception. Beamforming methods may be used for background noise reduction in a variety of different applications. A beamformer such as beamformer 302, may be configured to process signals emanating from a microphone array to obtain a combined signal in such a way that signal components coming from a direction different from a predetermined wanted signal direction are suppressed….”]

Regarding Claim 10, Conliffe teaches: 
10. The head-wearable apparatus of claim 1,
wherein the controller is further configured to provide, to the auditory transducer, audio representing the sound collected by the microphone from the determined direction, wherein the auditory transducer renders the audio representing the sound collected by the microphone from the determined direction. [Conliffe teaches that the sound “isolated” from the direction of the eye/gaze is provided to the wearer of the device.  Figure 8, 810.  “[0076] … Process 10 may further include adjusting (806) a direction of the microphone, based upon the directional instruction, receiving (808) an audio signal at the at least one microphone, and providing (810) the audio signal to the wearer using a speaker associated with the frame.”   (Note 112(b) rejection: Examiner is going to interpret this as the sound from the direction of the gaze for the purpose of applying art.)] [In view of the 112(b) rejection and if “isolated sound” is intended to refer to the background environmental sound such as the siren of Conliffe, Conliffe teaches that as well:  “[0058] … In an Active Mode, a wide a beam as possible may be recorded. This may be particularly useful, for example, during times when it is important for the wearer to hear things around them, such as oncoming cars or police sirens. In a Music Mode, the wearer may be listening to music and the apparatus may operate so that the music is not canceled out via filtering. In a Large Room Mode, additional filters for Acoustic Echo Cancellation may be added to reduce echo and reverb on speakers 108 as well as to further minimize background noise. In a Manual Mode, the directed beam may be matched precisely to eye direction. For example, using a dial or digital interface for specifying beam width. 
This may be used for situations where the described modes do not apply and the wearer needs to have finer control over the shape and direction of the beam….”  These teachings all indicate “isolating” a particular sound or sound from a particular direction.]
(Note also Wexler: [0141] Various views of apparatus 110 are illustrated in FIGS. 4E through 4K. For example, FIG. 4E shows a view of apparatus 110 with an electrical connection 441. Electrical connection 441 may be, for example, a USB port, that may be used to transfer data to/from apparatus 110 and provide electrical power to apparatus 110. In an example embodiment, connection 441 may be used to charge a battery 442 schematically shown in FIG. 4E. FIG. 4F shows F-view of apparatus 110, including sensor 220 and one or more microphones 443. In some embodiments, apparatus 110 may include several microphones 443 facing outwards, wherein microphones 443 are configured to obtain environmental sounds and sounds of various speakers communicating with user 100. FIG. 4G shows R-view of apparatus 110. In some embodiments, microphone 444 may be positioned at the rear side of apparatus 110, as shown in FIG. 4G. Microphone 444 may be used to detect an audio signal from user 100. It should be noted, that apparatus 110 may have microphones placed at any side (e.g., a front side, a rear side, a left side, a right side, a top side, or a bottom side) of apparatus 110. In various embodiments, some microphones may be at a first side (e.g., microphones 443 may be at the front of apparatus 110) and other microphones may be at a second side (e.g., microphone 444 may be at the back side of apparatus 110).)

Regarding Claim 11, Conliffe teaches: 
11. The head-wearable apparatus of claim 10, further comprising: a hearing aid system comprising the auditory transducer. [Conliffe teaches that its embodiments are used as hearing aids, “[0017] … In this way, embodiments of the present disclosure may allow deaf or hard of hearing people to communicate more easily than existing hearing aids.”  See also Background [0002], a single paragraph that notes the shortcomings of hearing aids.]

Regarding Claim 12, Conliffe teaches: 
12. The head-wearable apparatus of claim 1, 
wherein the controller is further configured to provide, to the auditory transducer, audio representing the extracted speech, wherein the auditory transducer renders the audio. [Conliffe, Figure 8, 810 providing the audio signal to the wearer of the glasses.  Figure 6 showing that the speech of a particular speaker is being provided to the user.  Figure 3 showing the “postfilter 304” and “adaptive blocking matrix 306.”  “[0058] …These different modes may be used to specify the width of the beam as well as the intensity of the filtering and enhancement that may be applied to the audio signal before it is sent to the headphones and speech recognition software....”]

Regarding Claim 13, Conliffe teaches: 
13. The head-wearable apparatus of claim 12, wherein the auditory transducer comprises at least one of: a loudspeaker; an ear speaker; and a bone conduction auditory system. [Conliffe, Figure 1, speakers 108: “[0024] …Speakers 108 may be included within a headphone that may be in communication with the processor. Any suitable speaker may be used without departing from the scope of the present disclosure, including, but not limited to, the ear-bud speakers depicted in FIG. 1. In some embodiments, speakers 108 may be connected with frame 102 using any suitable approach, for example, using an audio jack, hardwired, and/or other connection.”] [Wexler, “276. … wherein the electroacoustic transducer includes a bone conduction microphone.”]

Regarding Claim 14, Conliffe teaches: 
14. The head-wearable apparatus of claim 1, wherein the head-wearable apparatus is a pair of eyeglasses. [Conliffe, Figures 1 and 2 showing the eye glasses worn by a user.]

Regarding Claim 15, Conliffe does not teach the use of its glasses as part of a VR headset.
Wexler teaches: 
15. The head-wearable apparatus of claim 1, wherein the head-wearable apparatus is an extended reality headset. [Wexler, “[0117] FIG. 1A illustrates a user 100 wearing an apparatus 110 that is physically connected (or integral) to glasses 130, consistent with the disclosed embodiments. Glasses 130 may be prescription glasses, magnifying glasses, non-prescription glasses, safety glasses, sunglasses, etc. Additionally, in some embodiments, glasses 130 may include parts of a frame and earpieces, nosepieces, etc., and one or no lenses. Thus, in some embodiments, glasses 130 may function primarily to support apparatus 110, and/or an augmented reality display device or other optical display device. ….”  “[0118] … In some embodiments, computing device 120 may be included in an augmented reality display device or optical head mounted display provided integrally or mounted to glasses 130…”]
Rationale for combination as provided for Claim 1.  AR is another use of the device described by Conliffe and it would have been obvious to combine the functionalities of the two references to arrive at the limitation of the Claim.

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe and Wexler in view of Basson (U.S. 20020161582).
Regarding Claim 4, Conliffe teaches showing the text of the speech of the target speaker on the lens displays of the eye glasses.  (Figure 6).  The speaker can say a single word and that scenario teaches Claim 4.
Wexler does not discuss this scenario in particular.
Basson teaches: 
4. The head-wearable apparatus of claim 2, wherein the controller is further configured to present the text on the display panel as single words presented in a temporal series. [Basson is directed to alignment of speech and text and display of aligned text to a viewer, for example, for teaching or assisting a deaf person to lip read.  Basson includes showing one word at a time because of its purpose which is showing the user which word corresponds to which picture as opposed to just getting across the entire intent of the speaker.  See Figure 1 showing “I Love New York” one word per one corresponding image.  See Figure 3 showing the time/temporal alignment of the images with words.  “[0015] …By coordinating the images representing a words(s) in the utterance with corresponding decoded speech, a hearing-impaired person or other user can quickly and easily ascertain the relationship between the images and the decoded speech text. Thus, the invention has wide application, for example, for enhancing the accuracy of the ASR system by enabling the user to compare and verify the decoded speech text with the images corresponding to the recognized text, or to assist the user in developing lip reading and/or sign language skills.” “[0024] Each image animation is preferably displayed in close relative proximity to its corresponding decoded speech text, either in a separate text window 114 or in the same window as the image animation. In this manner, a user can easily ascertain the relationship between the images of facial movements and the decoded speech text associated with a particular utterance or portion thereof….”  “[0025] By way of example only, FIG. 1 shows a display 108 including three separate image windows 116, 118 and 120 for displaying animated images of facial movements corresponding to an utterance "I love New York," along with a text window 114 below the image windows for displaying the corresponding decoded textual speech of the utterance. 
Display window 116 corresponds to the word "I," window 118 corresponds to the word "love" and window 120 corresponds to the words "New York."…”]
Conliffe/Wexler and Basson are related to speech recognition and display of recognized speech to a viewer.  It would have been obvious to combine the word-by-word display of Basson, which is used for teaching (note also the various sing-along applications), with the system of the combination to allow explicitly for a word-by-word display of the recognized speech.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 5, Conliffe does not teach showing hand signs.
Wexler does not discuss this scenario in particular.
Basson teaches: 
5. The head-wearable apparatus of claim 1, wherein the controller is further configured to present the extracted speech as hand signs on the display panel. [Basson is directed to alignment of speech and text and display of aligned text to a viewer.  Basson includes a visual feature extractor that can extract “hand movements of a sign language interpreter” and aligns them with the words of speech.  Basson teaches that hand signs can be aligned with speech and be displayed in addition to speech.  See Figure 4 showing the alignment of hand signs with the words of “I Love New York.”  “[0006] In accordance with one aspect of the invention, a visual feature extractor captures and processes images of body movements (e.g., lip movements of a speaker or hand movements of a sign language interpreter) representing a given utterance. The visual feature extractor comprises a visual detector, for capturing the images of body movements, and an image preparator coupled to the visual detector. The image preparator processes the images from the visual detector and synchronizes the images with decoded speech from the ASR system. Using time information from the ASR system relating to starting and ending time intervals of a particular decoded word(s), the image preparator groups or separates the images into one or more image segments comprising a time sequence of images corresponding to each decoded word in the utterance.”   “[0038] …By way of example only, the text, "I love New York," is displayed in text window 114 below image windows 116, 118, 120, with each image window displaying sign language hand movements for its corresponding word(s) (e.g., text window 114 displays the word "I", while the corresponding image window 116 displays hand movements presenting the word "I" in sign language). In accordance with the principals set forth herein, the present invention contemplates that the method thus described may be employed for teaching sign language.”]
Conliffe/Wexler and Basson are related to speech recognition and display of recognized speech to a viewer.  It would have been obvious to combine the hand sign display of Basson, which is used for teaching, with the system of the combination to allow for teaching of sign language along with the display of the recognized speech.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claims 25, 49-50, 53, and 55-58 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Wexler.
Claim 25 is a CRM Claim with limitations similar to the limitations of apparatus Claim 1 and is rejected under similar rationale.  Additionally:
Regarding Claim 25, Conliffe teaches: 
25. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing component, the machine-readable storage medium comprising instructions to cause the hardware processor to perform a method for a head-wearable apparatus, the method comprising: [Conliffe, “[0079] … Furthermore, the present disclosure may take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium.”]
determining a direction of a gaze of a wearer of the head-wearable apparatus; 
collecting sound emanating from the determined direction; extracting speech from the collected sound; 
presenting the extracted speech on a display panel of the head-wearable apparatus, wherein the display panel is visible to the wearer; 
collecting further sound from the sides and rear of the wearer's head; and 
providing further audio representing the further sound to an auditory transducer of the head-wearable apparatus responsive to the further sound representing a predetermined keyword, wherein the auditory transducer renders the further audio. 

Claim 49 has limitations similar to the limitations of apparatus Claim 2 and is rejected under similar rationale.
Claim 50 has limitations similar to the limitations of apparatus Claim 3 and is rejected under similar rationale.
Claim 53 has limitations similar to the limitations of apparatus Claim 7 and is rejected under similar rationale.
Regarding Claim 55, Conliffe teaches: 
55. The non-transitory machine-readable storage medium of claim 25, the method further comprising: providing audio representing the collected sound to an auditory transducer of the head-wearable apparatus, wherein the auditory transducer renders the audio. 
[Conliffe, Figure 8, 810, “[0076] …Process 10 may further include adjusting (806) a direction of the microphone, based upon the directional instruction, receiving (808) an audio signal at the at least one microphone, and providing (810) the audio signal to the wearer using a speaker associated with the frame.”  Figure 1, “speakers 108” teach the “auditory transducer.”  This audio could be speech of a particular speaker like David in Figure 6 or music or sounds of sirens or police cars.  See [0058].]  (Claim 55 has limitations similar to the limitations of method Claim 23.)

Regarding Claim 56, Conliffe teaches: 
56. The non-transitory machine-readable storage medium of claim 25, the method further comprising: providing audio representing the extracted speech to an auditory transducer of the head-wearable apparatus, wherein the auditory transducer renders the audio.  [See Figures 6, 8, [0076] and [0058] and rejection of Claim 23.  The audio provided to the user via the speakers 108 (Figure 1) could be speech or other sounds determined by the user.  See the “conversation modes” shown in Figure 6 for focusing on certain sounds or speech or speaking person.] (Claim 56 has limitations similar to the limitations of method Claim 24.)
Claim 57 has limitations similar to the limitations of apparatus Claim 14 and is rejected under similar rationale.
Claim 58 has limitations similar to the limitations of apparatus Claim 15 and is rejected under similar rationale.

Claims 51 and 54 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe and Wexler in view of Basson. 
Claim 51 has limitations similar to the limitations of apparatus Claim 4 and is rejected under similar rationale.
Claim 54 has limitations similar to the limitations of apparatus Claim 5 and is rejected under similar rationale.

Claim 52 is rejected under 35 U.S.C. 103 as being unpatentable over Conliffe and Wexler in view of Sulai (U.S. 20210080763). 
Claim 52 recites the limitations of an “off-axis projector” and a “transflective diffuser,” which are similar to limitations of independent Claims 6 and 16, and is rejected under a similar mapping rationale.  See the rejection of Claim 6 for details.
Conliffe/Wexler and Sulai pertain to wearable devices that are worn over the eyes (eyeglasses), and it would have been obvious to use the particular material and method of Sulai, which is used for projecting an image on the eyeglasses of a user who needs them both for seeing through (transmissive) and as a display panel (reflective), in the device of the combination, which does not specify the details of implementation.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 36-37, and 39-48 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Sulai and further in view of Basson. 
Regarding Claim 6, Conliffe teaches: 
6. A head-wearable apparatus comprising: [Conliffe, “[0018] Referring to FIG. 1, there is shown an embodiment depicting a wearable communication apparatus 100 in accordance with the present disclosure. Apparatus 100 may include a frame 102 having numerous components associated therewith. As shown in FIG. 1 some components may include, but are not limited to, front facing camera 104, at least one microphone 106, one or more speakers 108, transparent lenses 110a and 110b, and rearward facing camera (shown in FIG. 2). Frame 102 may include at least one memory and processor onboard that may be configured to communicate with some or all of the components associated with frame 102. Frame 102 may be constructed out of any suitable material and may be configured to be worn on the head of a user as indicated by FIG. 1.”  “[0021] Referring also to FIG. 2, an embodiment of a wearable apparatus 200 is shown. Wearable apparatus 200 may include any or all of the components discussed with reference to FIG. 1. Additionally and/or alternatively, wearable apparatus 200 may include one or more cameras 202A and 202B (e.g. infrared cameras), which may be associated with frame 102 and in communication with the onboard processor. In some embodiments, infrared cameras 202A and 202B may be configured to track the eye of the wearer of the apparatus. The term "camera" as used herein may refer to its ordinary meaning as well as to any device that may be used to track the movement of an object and/or to provide video with regard to a particular target.”]
a microphone; [Conliffe, Figure 1, microphones 106 see [0018].]
a display panel visible to the wearer; [Conliffe, Figure 1, “transparent lenses 110a and 110b” are used as the “display panel.”  [0018].]
a gaze tracker configured to determine a direction of a gaze of a wearer of the head-wearable apparatus; [Conliffe, Figure 8, 802, “tracking an eye of a wearer …”  “[0021] …In some embodiments, infrared cameras 202A and 202B may be configured to track the eye of the wearer of the apparatus….”   Figure 8, 804 and 806: “adjusting a direction of the microphone, based upon the directional instruction” which is usually coincident with the direction of the eye gaze but can be adjusted by the user.   “[0023] In some embodiments, an identification of the non-wearer of the apparatus may be performed based upon an input from front facing camera 104 and may be based upon the user selecting the non-wearer as the person of interest or the person's whose speech the wearer is interested in focusing upon (e.g. using an eye movement, audible selection, physical button selection, etc.). … In some embodiments, at the wearer's option, a directional instruction may be sent to microphones 106 based upon this identification and selection.”  See also [0058] and [0085].]
a controller configured to: [Conliffe, “Embodiments disclosed herein may include a wearable apparatus including a frame having a memory and processor associated therewith….”  Abstract. [0018] above.]
extract speech from sound collected by the microphone from the determined direction, and [Conliffe, Figure 8, 808, after focusing the microphones on a particular source of audio (including speech) that audio is received.   “[0030] … As discussed above, the systems of FIGS. 3-5 may employ various beamforming techniques, which may be configured to generate one or more directional instructions that may be received by microphones 106 and may be used to focus upon a particular speaker or source of sound….”  See [0036] and Figure 3 for extracting the “desired speech.”]
present the extracted speech on the display panel as single words presented in a temporal series; and [Conliffe, Figure 6, what David said is being presented to the user on the lens of the eye glass.  “[0053] In some embodiments, apparatus 100 may be configured to clarify and enhance the speech signal for the wearer while also improving the accuracy of the speech recognition software running on apparatus 100. Apparatus 100 may provide real time captioning via the displays associated with lenses 110A/B, and may also store audio and/or textual transcriptions of conversations for reference or for querying later.”]
an off-axis projector configured to project the extracted words onto the display panel; 
wherein the display panel comprises a transflective diffuser. 

Conliffe teaches display of the text of the speech using the lens of the eye wear for display: “10 … wherein the at least one lens is configured to display visual feedback to the wearer, the visual feedback including at least one of beam shape, beam direction, and an identified non-wearer of the apparatus.”
Conliffe does not teach the use of an “off-axis projector” or that the display is a “transflective diffuser.”  It leaves out the implementation details of the display lens or the method of projection onto it.
Sulai teaches:
an off-axis projector configured to project the extracted words onto the display panel; [Sulai, Figure 1 showing that the device of Sulai is an eyeglass like the device of Conliffe. Figures 12A and 12B:  “[0198] As shown in FIGS. 12A and 12B, diffusive display 1214 has an optical axis 1216 (e.g., a central axis) along a normal direction (e.g., the y-axis) to surface 1214-1 that intersects a middle (e.g., center) of diffusive display 1214. In some embodiments, as shown, projector 1210 is disposed at position that is offset from the middle of diffusive display 1214 (e.g., the optical axis 1216 does not intersect with projector 1210, an off-axis position relative to the optical axis 1216 of diffusive display 1214 in one or more directions). As shown in FIG. 12B, projector 1210 may be located to the left or the right of diffusive display 1214, and/or above or below diffusive display 1214.”  “[0242] In some embodiments, (step 2110-A) the one or more projectors are disposed at an off-axis position relative to an optical axis of the display, and the one or more projectors are located less than 2 inches from the display.”]
wherein the display panel comprises a transflective diffuser. [Sulai is directed to a “DISPLAY DEVICE WITH SWITCHABLE DIFFUSIVE DISPLAY AND SEE-THROUGH LENS ASSEMBLY.”  This means that the display is a “transflective diffuser” because it is both transparent and reflective at the same time and also has a light diffusive property:  “A display device includes a display having optically anisotropic molecules disposed between a front surface and a back surface. The display is configurable to either receive image light and diffuse the image light to output diffused image light from the front surface, or to transmit ambient light from the back surface to the front surface. The display device also includes an optical assembly that has a substrate, a reflector, and a beam splitter. The optical assembly is configurable to transmit a portion of the diffused image light at a first optical power via an optical path including reflections at the reflector and at the beam splitter and to transmit a portion of the ambient light output from the front surface of the display at a second optical power without reflection at the reflector. The second optical power is less than the first optical power.”  Abstract.]
Conliffe and Sulai pertain to wearable devices that are worn over the eyes (eyeglasses) and must simultaneously transmit light, so that the wearer can see through them, and serve as a display for showing information to the wearer, and thus must also be reflective of light.   It would have been obvious to use the particular material and method of Sulai in the device of Conliffe, which teaches that the lenses of the eyeglasses are used as a display medium but does not specify the material or method of projection; Sulai provides one type of material and method that can be used in Conliffe.  (See Sulai: “[0072] FIG. 1 illustrates a perspective view of display device 100 in accordance with some embodiments. In some embodiments, display device 100 is configured to be worn on a head of a user (e.g., by having the form of spectacles or eyeglasses, as shown in FIG. 1, or to be included as part of a helmet that is to be worn by the user). When display device 100 is configured to be worn on a head of a user, display device 100 is called a head-mounted display. Alternatively, display device 100 is configured for placement in proximity of an eye or eyes of the user at a fixed location, without being head-mounted (e.g., display device 100 is mounted in a vehicle, such as a car or an airplane, for placement in front of an eye or eyes of the user). As shown in FIG. 1, display device 100 includes display 110. Display 110 is configured for presenting visual contents (e.g., augmented reality contents, virtual reality contents, mixed-reality contents, or any combination thereof) to a user.”)

Conliffe and Sulai do not teach the amended feature of “present the extracted speech on the display panel as single words presented in a temporal series,” for which Basson was cited.
Conliffe/Sulai and Basson are related to speech recognition and display of recognized speech to a viewer.  It would have been obvious to combine the word-by-word display of Basson, which is used for teaching (note also the various sing-along applications), with the system of the combination to allow explicitly for a word-by-word display of the recognized speech.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 36 has limitations similar to the limitations of apparatus Claim 2 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 37 has limitations similar to the limitations of apparatus Claim 3 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 39 has limitations similar to the limitations of apparatus Claim 5 and is rejected under similar rationale. 
Claim 40 has limitations similar to the limitations of apparatus Claim 7 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 41 has limitations similar to the limitations of apparatus Claim 8 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 42 has limitations similar to the limitations of apparatus Claim 9 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 43 has limitations similar to the limitations of apparatus Claim 10 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 44 has limitations similar to the limitations of apparatus Claim 11 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 45 has limitations similar to the limitations of apparatus Claim 12 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 46 has limitations similar to the limitations of apparatus Claim 13 and is rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 47 has limitations similar to the limitations of apparatus Claim 14 and is rejected under similar rationale. (Limitation mapped to Conliffe.)

Regarding Claim 48, Conliffe does not teach the use of its glasses as part of a VR headset.
Sulai teaches: 
48. The head-wearable apparatus of claim 6, wherein the head-wearable apparatus is an extended reality headset.  [Sulai, Figure 1 showing eyeglasses that can be used to display augmented reality content: “[0072] FIG. 1 illustrates a perspective view of display device 100 in accordance with some embodiments. In some embodiments, display device 100 is configured to be worn on a head of a user (e.g., by having the form of spectacles or eyeglasses, as shown in FIG. 1, or to be included as part of a helmet that is to be worn by the user). When display device 100 is configured to be worn on a head of a user, display device 100 is called a head-mounted display. Alternatively, display device 100 is configured for placement in proximity of an eye or eyes of the user at a fixed location, without being head-mounted (e.g., display device 100 is mounted in a vehicle, such as a car or an airplane, for placement in front of an eye or eyes of the user). As shown in FIG. 1, display device 100 includes display 110. Display 110 is configured for presenting visual contents (e.g., augmented reality contents, virtual reality contents, mixed-reality contents, or any combination thereof) to a user.”]
Conliffe and Sulai pertain to wearable devices that are worn over the eyes (eyeglasses), and it would have been obvious to use the device of the combination for other suitable purposes such as those indicated by Sulai, which has a similar structure and therefore similar usage.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.  (Note also that Claim 48 has limitations similar to the limitations of apparatus Claim 15, which were mapped to Wexler.)

Claims 16-18, 21-24, and 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Sulai and Basson.
Claim 16 is a CRM Claim with limitations similar to the limitations of apparatus Claim 6 and is rejected under similar rationale.  Additionally:
16. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing component, the machine-readable storage medium comprising instructions to cause the hardware processor to perform a method for a head-wearable apparatus, [Conliffe, “[0079] … Furthermore, the present disclosure may take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium.”]
the method comprising: 
determining a direction of a gaze of a wearer of the head-wearable apparatus; [Conliffe, Figure 8, 802, “tracking an eye of a wearer …”]
collecting sound emanating from the determined direction; [Conliffe, Figure 8, 804 and 806: “adjusting a direction of the microphone, based upon the directional instruction” which is usually coincident with the direction of the eye gaze but can be adjusted by the user.   “[0023] In some embodiments, an identification of the non-wearer of the apparatus may be performed based upon an input from front facing camera 104 and may be based upon the user selecting the non-wearer as the person of interest or the person's whose speech the wearer is interested in focusing upon (e.g. using an eye movement, audible selection, physical button selection, etc.). … In some embodiments, at the wearer's option, a directional instruction may be sent to microphones 106 based upon this identification and selection.”  See also [0058] and [0085].]
extracting speech from the collected sound; [Conliffe teaches filtering and speech recognition which requires extracting the speech from the rest of the sounds:  “[0017] … Some embodiments may include an assistive wearable device that may utilize eye-tracking to direct acoustic beamforming for the purpose of directed speech recognition…”  “[0022] … Microphones 106 may be configured to receive speech input signals from one or more individuals and/or alternative input signal sources within the range of apparatus 100. Some alternative input sources may include, but are not limited to, televisions, radios, cellphones, and/or any other source of sound….”  See also [0023] above.  See Figure 5, “acoustic echo cancellation.”  “[0047] Referring now to FIG. 5, an embodiment of a system 500 configured to implement an acoustic echo cancellation process, which may be associated with wearable apparatus 100 is provided. System 500 may include a number of filters 502, 504….”]
presenting the extracted speech on a display panel of the head-wearable apparatus as single words presented in a temporal series, wherein the display panel is visible to the wearer; and [Conliffe, Figure 6 shows speech recognition performed on the speech of a particular speaker (David) and the result presented to the wearer of the device on a display of the device.]
projecting the text onto the display panel using an off-axis projector; wherein the display panel is a transflective diffuser. 
Conliffe does not teach off-axis projection onto a diffusive, transmissive, and reflective surface and leaves such implementation details of its lenses unspecified.
Sulai teaches:
projecting the text onto the display panel using an off-axis projector; wherein the display panel is a transflective diffuser. [Sulai teaches eyeglasses using a diffusive, partially reflective and partially transmissive (transflective diffuser) lens to display information to the wearer of the eyeglasses, and Figures 12A and 12B teach an off-axis projection onto this transflective diffusive display panel.  See rejection of Claim 6 for more detail.]
Rationale for combination as provided for Claim 6.
Basson teaches “presenting the extracted speech on a display panel … as single words presented in a temporal series,” as provided for Claim 6 and under similar rationale.

Claim 17 has limitations similar to the limitations of method Claim 2 which are rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 18 has limitations similar to the limitations of method Claim 3 which are rejected under similar rationale. (Limitation mapped to Conliffe.)

Claim 21 has limitations similar to the limitations of method Claim 7 which are rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 22 has limitations similar to the limitations of method Claim 39 (or 5) which are rejected under similar rationale.

Regarding Claim 23, Conliffe teaches: 
23. The non-transitory machine-readable storage medium of claim 16, the method further comprising: 
providing audio representing the collected sound to an auditory transducer of the head-wearable apparatus, wherein the auditory transducer renders the audio. [Conliffe, Figure 8, 810, “[0076] … Process 10 may further include adjusting (806) a direction of the microphone, based upon the directional instruction, receiving (808) an audio signal at the at least one microphone, and providing (810) the audio signal to the wearer using a speaker associated with the frame.”  Figure 1, “speakers 108” teach the “auditory transducer.”  This audio could be the speech of a particular speaker, like David in Figure 6, or music or the sounds of sirens or police cars.  See [0058].]

Regarding Claim 24, Conliffe teaches: 
24. The non-transitory machine-readable storage medium of claim 16, the method further comprising: 
providing audio representing the extracted speech to an auditory transducer of the head-wearable apparatus, wherein the auditory transducer renders the audio. [See Figures 6, 8, [0076] and [0058] and rejection of Claim 23.  The audio provided to the user via the speakers 108 (Figure 1) could be speech or other sounds determined by the user.  See the “conversation modes” shown in Figure 6 for focusing on certain sounds or speech or speaking person.]

Claim 26 has limitations similar to the limitations of method Claim 14 which are rejected under similar rationale. (Limitation mapped to Conliffe.)
Claim 27 has limitations similar to the limitations of method Claim 15 which are rejected under similar rationale. (Limitation mapped to Sulai.)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sipolins (U.S. 20200004326):
See [0037].  “[0037] In one or more embodiments, displaying text at the reader's gaze direction includes (at 220) displaying entire single words centered on the reader's gaze direction. In one or more embodiments, displaying text at the reader's gaze direction includes (at 216) displaying a time sequence of words at the reader's gaze direction, interrupting the time sequence of words during movements of the reader's eye, and resuming the time sequence of words between movements of the reader's eye. In one or more embodiments, interrupting the time sequence of words includes identifying a current word at the beginning of a movement of the reader's eye, and resuming the time sequence of words includes displaying the current word after the end of the movement of the reader's eye.”
Gibson (U.S. 20030043196):

[Figure from Gibson; image not reproduced.]


Reber (U.S. 20040155889):

[Figure from Reber; image not reproduced.]


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659