Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. TW108118259, filed on 05/27/2019.
Drawings
The drawing submitted on 05/26/2020 is being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Conliffe (US 2015/0088500 A1).

Regarding Claim 1, Conliffe teaches: A real time speech translating communication system (Fig.1), comprising a wearable device (wearable communication apparatus 100 ) that includes ([0017] Embodiments provided herein are directed towards a wearable device that may assist the user in communicating with other individuals. Some embodiments may also provide real-time closed captioning, as well as enhanced and amplified audio, to the wearer. [0018] Referring to FIG. 1, there is shown an embodiment depicting a wearable communication apparatus 100 in accordance with the present disclosure. Apparatus 100 may include a frame 102 having numerous components associated therewith. As shown in FIG. 1 some components may include, but are not limited to, front facing camera 104, at least one microphone 106, one or more speakers 108, transparent lenses 110a and 110b, and rearward facing camera (shown in FIG. 2). Frame 102 may include at least one memory and processor onboard that may be configured to communicate with some or all of the components associated with frame 102. Frame 102 may be constructed out of any suitable material and may be configured to be worn on the head of a user as indicated by FIG. 1. [0057] In some embodiments, apparatus 100 may be configured with language packs for localization and translation. For example, a French speaking wearer may receive French captions, a Spanish speaking wearer Spanish captions, etc. These language packs may be paired with translational software so that a French-speaking wearer could listen to a Spanish-speaking person, and receive captioning in French.): a headset (Fig.1, wearable communication apparatus 100) to be worn by a user ([0018] Referring to FIG. 1, there is shown an embodiment depicting a wearable communication apparatus 100 in accordance with the present disclosure. Apparatus 100 may include a frame 102 having numerous components associated therewith. 
Frame 102 may include at least one memory and processor onboard that may be configured to communicate with some or all of the components associated with frame 102. Frame 102 may be constructed out of any suitable material and may be configured to be worn on the head of a user as indicated by FIG. 1.); an output unit (Speakers 108) mounted on said headset ([0024] In some embodiments, frame 102 may also include one or more speakers 108 as shown in FIG. 1. Speakers 108 may be associated with frame 102 and may be configured to provide an audio signal received at microphone 106 to the wearer. In some embodiments, the audio signals may undergo processing prior to output at speakers 108. Speakers 108 may be included within a headphone that may be in communication with the processor.); an audio recording unit that includes a microphone array (microphones 106) which are mounted on said headset (Fig.1 microphones mounted on headset frame 102) and which includes a plurality of microphones spaced apart (Fig.1, microphones 106 are on the two different side of the frame 102) from one another ([0022] In some embodiments, frame 102 may include one or more microphones 106. Although, two microphones are depicted in FIG. 1 any number of microphones may be used without departing from the scope of the present disclosure. Microphones 106 may be configured to receive speech input signals from one or more individuals and/or alternative input signal sources within the range of apparatus 100.); and a processor (onboard processor in headset 100) disposed in said headset and coupled to said output unit and said audio recording unit ([0022] In some embodiments, frame 102 may include one or more microphones 106. Microphone 106 may also be in communication with the onboard processor and may be configured to receive one or more instructions from the processor. [0024] In some embodiments, frame 102 may also include one or more speakers 108 as shown in FIG. 1. 
Speakers 108 may be associated with frame 102 and may be configured to provide an audio signal received at microphone 106 to the wearer. In some embodiments, the audio signals may undergo processing prior to output at speakers 108. Speakers 108 may be included within a headphone that may be in communication with the processor.), wherein said processor is programmed to control said microphone array of said audio recording unit to perform unidirectional sound collection, so as to obtain incoming audio data ([0022] Microphones 106 may be configured to receive speech input signals from one or more individuals and/or alternative input signal sources within the range of apparatus 100. Microphone 106 may also be in communication with the onboard processor and may be configured to receive one or more instructions from the processor. [0030] As discussed above, the systems of FIGS. 3-5 may employ various beamforming techniques, which may be configured to generate one or more directional instructions that may be received by microphones 106 and may be used to focus upon a particular speaker or source of sound. System 300 may be associated with one or microphones such as those described above and may be incorporated within apparatus 100. [0031] A beamformer such as beamformer 302, may be configured to process signals emanating from a microphone array to obtain a combined signal in such a way that signal components coming from a direction different from a predetermined wanted signal direction are suppressed. 
Microphone arrays, unlike conventional directional microphones, may be electronically steerable which gives them the ability to acquire a high-quality signal or signals from a desired direction or directions while attenuating off-axis noise or interference.), perform a speech recognition operation on the incoming audio data to obtain speech data that corresponds with the incoming audio data, perform a translation operation on the speech data to obtain translated data in a predetermined language ([0053] In some embodiments, apparatus 100 may be configured to clarify and enhance the speech signal for the wearer while also improving the accuracy of the speech recognition software running on apparatus 100. [0054] In some embodiments, apparatus 100 may be configured to utilize front-facing camera 104 to recognize the faces of speakers it records. Accordingly, apparatus 100 may associate this data with the speaker's voice-biometric data to generate an identity profile entry for that person, including contact information if available. [0056] In some embodiments, apparatus 100 may be configured to recognize when a person has introduced themselves, using language modeling and speech recognition. [0057] In some embodiments, apparatus 100 may be configured with language packs for localization and translation. For example, a French speaking wearer may receive French captions, a Spanish speaking wearer Spanish captions, etc. These language packs may be paired with translational software so that a French-speaking wearer could listen to a Spanish-speaking person, and receive captioning in French. [0061] The teachings of the present disclosure may also be applied to real-time translations. Accordingly, apparatus 100 may incorporate the speech-recognition engine inside the device into a translation engine. In this way, one user wearing the device could speak English, and another could speak French, and both could read what the other was saying in their selected language. 
As such, apparatus 100 may function as a universal translator.), and control said output unit to output the translated data ([0024] In some embodiments, frame 102 may also include one or more speakers 108 as shown in FIG. 1. Speakers 108 may be associated with frame 102 and may be configured to provide an audio signal received at microphone 106 to the wearer. In some embodiments, the audio signals may undergo processing prior to output at speakers 108. Speakers 108 may be included within a headphone that may be in communication with the processor. [0026] In some embodiments, the onboard processor may be configured to receive the input signals from microphones 106. The received input signals may be processed and transmitted to speakers 108 for the benefit of the wearer. Additionally and/or alternatively, the onboard processor may convert the received audio signal to text for the wearer to read, thus providing a closed-captioning functionality an example of which is depicted in FIG. 6 discussed below. [0029] In some embodiments, apparatus 100 may include various types of speech recognition software, which may be executed in whole or in part by the onboard processor.).
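For illustration only, the beamforming principle quoted from Conliffe [0031] (combining the microphone-array signals so that components from directions other than the wanted one are suppressed) can be sketched with a delay-and-sum combiner. This is a minimal sketch and not Conliffe's disclosed implementation: the function names, the two-channel test tone, and the use of whole-sample delays are all assumptions made for the example.

```python
import math

def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay, then average them."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# Hypothetical capture: a tone that reaches microphone 1 three samples
# later than microphone 0 (i.e. the source is off to one side).
tone = [math.sin(2 * math.pi * n / 16) for n in range(160)]
mic0 = tone + [0.0] * 3
mic1 = [0.0] * 3 + tone

steered = delay_and_sum([mic0, mic1], [0, 3])    # delays match the source
unsteered = delay_and_sum([mic0, mic1], [0, 0])  # delays do not match
```

With delays that match the source direction the two channels add coherently and the tone is recovered; with mismatched delays the same tone is partly cancelled, which is the directional selectivity the quoted passage describes.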

Regarding Claim 2, Conliffe teaches: The real time speech translating communication system of Claim 1, wherein: said output unit includes a display module (Fig.1, displays associated with lenses 110A/B) mounted on said headset; and said processor is programmed to generate a text from the translated data, and control said display module to display the text (See rejection of claim 1 and [0004] The apparatus may also include at least one lens configured to receive the text results from the processor and to provide the text to the wearer. [0006] In some embodiments, the method may include converting, using the processor, the audio signal to text. The method may also include receiving the text results from the processor and providing the text to the at least one lens. [0026] In some embodiments, the onboard processor may be configured to receive the input signals from microphones 106. The received input signals may be processed and transmitted to speakers 108 for the benefit of the wearer. Additionally and/or alternatively, the onboard processor may convert the received audio signal to text for the wearer to read, thus providing a closed-captioning functionality an example of which is depicted in FIG. 6 discussed below. [0027] FIG. 6 depicts an embodiment of the wearable apparatus that shows a display as viewed by the wearer through lens 110A/B. As discussed above, lenses 110A/B may be configured to receive the text results from the processor and to provide the text to the wearer via a display visible to the wearer. Lenses 110A/B may include transparent or partially transparent screens that allow the user to view their surroundings while also providing the closed-captioning feedback shown in FIG. 6.). 

Regarding Claim 3, Conliffe teaches: The real time speech translating communication system of Claim 2, wherein said display module includes: a transparent lens mounted on said headset such that when said headset is worn by the user, said transparent lens is placed in front of the eyes of the user; and an image projecting component (display) that is controlled by said processor to project an image (closed-captioning functionality) that includes the text on said lens (See rejection of claim 2).

Regarding Claim 4, Conliffe teaches: The real time speech translating communication system of Claim 2, wherein said display module includes a transparent display (transparent lens 110A/B) screen that is controlled to display the text (See rejection of claim 2).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Gutierrez (US 2020/0125643 A1).

Regarding Claim 5, Conliffe teaches: The real time speech translating communication system of Claim 2, wherein: said headset to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language (See rejection of claim 1 and [0057] In some embodiments, apparatus 100 may be configured with language packs for localization and translation. For example, a French speaking wearer may receive French captions, a Spanish speaking wearer Spanish captions, etc. These language packs may be paired with translational software so that a French-speaking wearer could listen to a Spanish-speaking person, and receive captioning in French.); and said processor is programmed to perform the speech recognition on the incoming audio data to obtain the speech data in the input language, and perform a translation operation on the speech data to obtain the translated data in the predetermined language (See rejection of claim 1 and [0061] Accordingly, apparatus 100 may incorporate the speech-recognition engine inside the device into a translation engine. In this way, one user wearing the device could speak English, and another could speak French, and both could read what the other was saying in their selected language. As such, apparatus 100 may function as a universal translator.).
Conliffe however does not specifically teach: said headset includes a user input unit for enabling the user to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language.

Gutierrez teaches: a user input unit for enabling the user to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language ([0025] Referring now more specifically to the drawings by numerals of reference, there is shown in FIGS. 1-4, various views of a mobile translation application 100. FIG. 1 shows a mobile translation application 100 during an ‘in-use’ condition 150, according to an embodiment of the present disclosure. Here, the mobile translation application 100 may be beneficial for use by a user 140 to provide communication capabilities, including audio, video, and text translations, in real-time and across different languages by an electronic device. The electronic device may include smart-phone 10, a tablet-computer, a desktop-computer, a smart-television, or other suitable devices. Each user 140 may be able to select a preferred language, and the mobile translation application 100 may be useful for providing user 140 with a platform for audio and visual communications from one location to another. [0027] Speech-to-speech module 110 may include translation capabilities to translate oral speech from a first-language to at least one second-language, and speech-to-text module 115 may include translation capabilities to translate oral speech from the first-language into text in at least one second-language.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Conliffe to include the teaching of Gutierrez above in order to enable a user to select a preferred language on a platform with communication capabilities, including audio, video, and text translations, in real time and across different languages.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Lovitt et al. (US 2020/0134026 A1).
Regarding Claim 6, Conliffe teaches: The real time speech translating communication system of Claim 1, wherein: said output unit includes an audio output module mounted on said headset; and said processor is programmed to generate the translated data, and control said audio output module to output (See rejection of claim 1).
Conliffe does not teach: said processor is programmed to generate a voice file (a generated voice) from the translated data, and control said audio output module to output the voice file.
Lovitt et al. teach: processor is programmed to generate a voice file from the translated data, and control said audio output module to output the voice file ([0031] For instance, if a speaking user speaks a language not understood by the listening user, the listening user will not understand the speaking user when they speak. Thus, the embodiments herein may perform noise cancellation on the speaking user's voice, such that the listening user does not hear the speaking user. While the speaking user's speech is being silenced by noise cancellation, the systems described herein may determine what words the speaking user is saying and may translate those words into the language understood by the listening user. The systems herein may also convert the translated words into speech which is played back into the user's ears via speakers or other sound transducers. In this manner, the listening user's ease of understanding the speaking user may improve significantly. Instead of having one user speak into an electronic device and wait for a translation, the embodiments herein may operate as the speaking user is speaking. Thus, as the speaking user speaks in one language, the listening user hears, in real-time, a generated voice speaking translated words to the listening user. This process may be seamless and automatic. Users may converse with each other, without delays, each speaking and hearing in their own native tongue. [0084] As shown in computing environment 1000 of FIG. 10, the speaking user's words (e.g., in audio input 1006 from speaking user 1007) may be fed to the speech-to-text module 1005 where the words are converted to text or some other digital representation of a word. The translation module 1004 may then use the words in text form to perform the translation from one language to another language. Once the translation has been performed, the text-to-speech module 1003 may convert the written words to speech. That speech may be included in audio output 1002. 
This audio output 1002 may then be sent to the listening user 1001. Accordingly, some embodiments may use STT and TTS to perform the conversions between speech and text, and back to speech.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Conliffe to include the teaching of Lovitt et al. above in order to provide, in real time, a generated voice speaking translated words to the listening user.
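For illustration only, the processing order quoted from Lovitt et al. [0084] (speech-to-text, then translation, then text-to-speech) can be sketched as a three-stage chain. This is a minimal sketch under stated assumptions: `translate_speech` and the `toy_*` stand-ins are hypothetical names invented for the example and do not correspond to any module disclosed in the references or to any real speech library.

```python
from typing import Callable

def translate_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain speech-to-text, translation, and text-to-speech in that order."""
    text = recognize(audio)          # speech-to-text stage
    translated = translate(text)     # translation stage
    return synthesize(translated)    # text-to-speech stage

# Toy stand-ins (hypothetical): real engines would replace each of these.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")

TOY_LEXICON = {"bonjour": "hello"}

def toy_translate(text: str) -> str:
    # Unknown words pass through unchanged in this toy lexicon.
    return TOY_LEXICON.get(text, text)

def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")

voice = translate_speech(b"bonjour", toy_recognize, toy_translate, toy_synthesize)
```

Passing the stages in as callables keeps the ordering explicit while leaving the choice of recognition, translation, and synthesis engines open.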

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Lovitt et al., further in view of Gutierrez.

Regarding Claim 7, Conliffe teaches: The real time speech translating communication system of Claim 6, wherein: said headset to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language (See rejection of claim 1 and [0057] In some embodiments, apparatus 100 may be configured with language packs for localization and translation. For example, a French speaking wearer may receive French captions, a Spanish speaking wearer Spanish captions, etc. These language packs may be paired with translational software so that a French-speaking wearer could listen to a Spanish-speaking person, and receive captioning in French.); and said processor is programmed to perform the speech recognition on the incoming audio data to obtain the speech data in the input language, and perform a translation operation on the speech data to obtain the translated data in the predetermined language (See rejection of claim 1 and [0061] Accordingly, apparatus 100 may incorporate the speech-recognition engine inside the device into a translation engine. In this way, one user wearing the device could speak English, and another could speak French, and both could read what the other was saying in their selected language. As such, apparatus 100 may function as a universal translator.).
Conliffe however does not specifically teach: said headset includes a user input unit for enabling the user to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language.
Lovitt et al. teach: processor is programmed to generate a voice file from the translated data, and control said audio output module to output the voice file ([0031] For instance, if a speaking user speaks a language not understood by the listening user, the listening user will not understand the speaking user when they speak. Thus, the embodiments herein may perform noise cancellation on the speaking user's voice, such that the listening user does not hear the speaking user. While the speaking user's speech is being silenced by noise cancellation, the systems described herein may determine what words the speaking user is saying and may translate those words into the language understood by the listening user. The systems herein may also convert the translated words into speech which is played back into the user's ears via speakers or other sound transducers. In this manner, the listening user's ease of understanding the speaking user may improve significantly. Instead of having one user speak into an electronic device and wait for a translation, the embodiments herein may operate as the speaking user is speaking. Thus, as the speaking user speaks in one language, the listening user hears, in real-time, a generated voice speaking translated words to the listening user. This process may be seamless and automatic. Users may converse with each other, without delays, each speaking and hearing in their own native tongue. [0084] As shown in computing environment 1000 of FIG. 10, the speaking user's words (e.g., in audio input 1006 from speaking user 1007) may be fed to the speech-to-text module 1005 where the words are converted to text or some other digital representation of a word. The translation module 1004 may then use the words in text form to perform the translation from one language to another language. Once the translation has been performed, the text-to-speech module 1003 may convert the written words to speech. That speech may be included in audio output 1002. 
This audio output 1002 may then be sent to the listening user 1001. Accordingly, some embodiments may use STT and TTS to perform the conversions between speech and text, and back to speech.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Conliffe to include the teaching of Lovitt et al. above in order to provide, in real time, a generated voice speaking translated words to the listening user.
Conliffe in view of Lovitt et al. does not teach: said headset includes a user input unit for enabling the user to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language.

Gutierrez teaches: a user input unit for enabling the user to select one of a plurality of pre-stored languages as an input language of the incoming audio data and another one of the plurality of pre-stored languages as the predetermined language ([0025] Referring now more specifically to the drawings by numerals of reference, there is shown in FIGS. 1-4, various views of a mobile translation application 100. FIG. 1 shows a mobile translation application 100 during an ‘in-use’ condition 150, according to an embodiment of the present disclosure. Here, the mobile translation application 100 may be beneficial for use by a user 140 to provide communication capabilities, including audio, video, and text translations, in real-time and across different languages by an electronic device. The electronic device may include smart-phone 10, a tablet-computer, a desktop-computer, a smart-television, or other suitable devices. Each user 140 may be able to select a preferred language, and the mobile translation application 100 may be useful for providing user 140 with a platform for audio and visual communications from one location to another. [0027] Speech-to-speech module 110 may include translation capabilities to translate oral speech from a first-language to at least one second-language, and speech-to-text module 115 may include translation capabilities to translate oral speech from the first-language into text in at least one second-language.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Conliffe in view of Lovitt et al. to include the teaching of Gutierrez above in order to enable a user to select a preferred language on a platform with communication capabilities, including audio, video, and text translations, in real time and across different languages.

Regarding Claim 8, Conliffe teaches: The real time speech translating communication system of Claim 7, wherein said audio recording unit further includes a microphone device that is mounted on said headset and that includes a plurality of microphones spaced apart from one another, said output unit includes a speaker, and said processor is further programmed to: control said microphone device of said audio recording unit to perform unidirectional sound collection, so as to obtain outgoing audio data from the user; perform the speech recognition on the outgoing audio data obtained from the user so as to obtain user speech data in the predetermined language; perform a translation operation on the user speech data to obtain translated data in the input language; and control said speaker to output the translated data (See rejection of claim 1).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Gabai (US 2019/0028817 A1).

Regarding Claim 9, Conliffe teaches: The real time speech translating communication system of Claim 2, further comprising an image capturing unit that is coupled to said processor, wherein said processor is further programmed to: control said image capturing unit to continuously capture images in front of the user; perform an image processing procedure on the images to detect human faces, and to control said audio recording unit to perform the unidirectional sound collection with respect to a direction toward the human face, so as to obtain the incoming audio data (See rejection of claim 1 and [0021] Additionally and/or alternatively, wearable apparatus 200 may include one or more cameras 202A and 202B (e.g. infrared cameras), which may be associated with frame 102 and in communication with the onboard processor. In some embodiments, infrared cameras 202A and 202B may be configured to track the eye of the wearer of the apparatus. The term "camera" as used herein may refer to its ordinary meaning as well as to any device that may be used to track the movement of an object and/or to provide video with regard to a particular target. [0031] A beamformer such as beamformer 302, may be configured to process signals emanating from a microphone array to obtain a combined signal in such a way that signal components coming from a direction different from a predetermined wanted signal direction are suppressed. Microphone arrays, unlike conventional directional microphones, may be electronically steerable which gives them the ability to acquire a high-quality signal or signals from a desired direction or directions while attenuating off-axis noise or interference. [0054] In some embodiments, apparatus 100 may be configured to utilize front-facing camera 104 to recognize the faces of speakers it records.).
Conliffe however does not teach: an image capturing unit that is coupled to said processor, wherein said processor is further programmed to: perform an image processing procedure on the images to detect human faces, and to determine whether a human face with lips that are moving is detected; when it is determined that a human face with lips that are moving is detected, control said audio recording unit to perform the unidirectional sound collection with respect to a direction toward the human face, so as to obtain the incoming audio data.

Gabai teaches: an image capturing unit that is coupled to said processor, wherein said processor is further programmed to: control said image capturing unit to continuously capture images in front of the user; perform an image processing procedure on the images to detect human faces, and to determine whether a human face with lips that are moving is detected; when it is determined that a human face with lips that are moving is detected, control said audio recording unit to perform the unidirectional sound collection with respect to a direction toward the human face, so as to obtain the incoming audio data ([0108] Beamforming alteration procedure 500 may be performed by a directional hearing aid apparatus such as any of the devices described above with reference to FIGS. 1 to 3. Particularly by a controller or CPU operative to execute instructions of a software program implementing beamforming alteration procedure 500. Such processor may be similar to central control unit 150 described above with reference to FIG. 1. [0110] A shown in FIG. 5, beamforming alteration procedure 500 may be executed following the setting of the beam to a first direction (direction of reference) as described with reference to FIG. 4. Beamforming alteration procedure 500 may then start with step 510 by detecting head motion of the user using the hearing aid device. [0111] This head motion may be detected, for example, using camera 135 as shown and described with reference to FIG. 1, and/or using upward-facing cameras 235 of collar-mounted unit 220 as shown and described with reference to FIG. 2. [0112] Step 510 eventually determines that the head of the user is directed in a second direction. For example, apparatus 100 of FIG. 1 detects a motion of the user's head via one or more images captured via camera 135. [0113] Beamforming alteration procedure 500 may then proceed to step 520 to detect a human figure in the second direction. For example, in apparatus 100 of FIG. 
1, the second direction is covered by a wide-angle front-facing camera 130, allowing apparatus 100 to determine the presence of a human figure in the second direction based on images captured via camera 130. [0116] If step 520 detects a human figure in the second direction, then beamforming alteration procedure 500 may proceed to step 540 to detect lip motion of the human figure detected in the second direction. [0117] For example, in apparatus 100 of FIG. 1, the second direction is covered by a wide-angle front-facing camera 130, allowing apparatus 100 to detect lip motion of the human figure in the second direction based on images captured via camera 130. [0119] If step 540 detects lip motion of the human figure in the second direction, then beamforming alteration procedure 500 may proceed to step 550 to detect a human voice coming from the second direction. For example, hearing aid 100 may use input signal received via an array of microphones 141-144 to detect human voice coming from the second direction. [0121] If step 550 detects human voice in the second direction, then beamforming alteration procedure 500 may proceed to step 560, to alter the beamforming setting, typically by directing the beam towards the second direction.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Conliffe to include the teaching of Gabai above in order to produce a directional acoustic beam by directing the microphone array in a direction from which a human voice is detected.
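For illustration only, the step sequence of Gabai's beamforming alteration procedure 500, as quoted above ([0110]-[0121]), can be sketched as follows. The class, function, and field names are hypothetical and do not appear in Gabai. The sketch assumes each detection step has already been reduced to a boolean, and it assumes the beam is left unchanged when any step fails, which the quoted passages do not state.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical detection results for one pass (names are not from Gabai)."""
    head_direction_deg: float  # step 510: direction the user's head now faces
    human_figure: bool         # step 520: human figure detected in that direction
    lip_motion: bool           # step 540: lip motion detected on that figure
    human_voice: bool          # step 550: human voice detected from that direction


def alter_beam(current_beam_deg: float, obs: Observation) -> float:
    """Return the beam direction after one pass of the quoted step sequence."""
    if obs.head_direction_deg == current_beam_deg:
        return current_beam_deg  # head still faces the first direction
    # Assumption: a failed check leaves the beam where it was.
    if not obs.human_figure:
        return current_beam_deg
    if not obs.lip_motion:
        return current_beam_deg
    if not obs.human_voice:
        return current_beam_deg
    return obs.head_direction_deg  # step 560: direct the beam to the second direction
```

Under these assumptions, the beam moves only when the figure, lip-motion, and voice checks all succeed for the second direction.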

Claims 10-11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Gabai, further in view of Kim et al. (US 2014/0362253 A1).

Regarding Claim 10, Conliffe teaches: The real time speech translating communication system of Claim 2, further comprising an image capturing unit that is coupled to said processor, wherein said processor is further programmed to: control said image capturing unit to continuously capture images in front of the user; perform an image processing procedure on the images to detect human faces, and to control said audio recording unit to perform the unidirectional sound collection with respect to a direction toward the human face, so as to obtain the incoming audio data (See rejection of claim 1 and [0021] Additionally and/or alternatively, wearable apparatus 200 may include one or more cameras 202A and 202B (e.g. infrared cameras), which may be associated with frame 102 and in communication with the onboard processor. In some embodiments, infrared cameras 202A and 202B may be configured to track the eye of the wearer of the apparatus. The term "camera" as used herein may refer to its ordinary meaning as well as to any device that may be used to track the movement of an object and/or to provide video with regard to a particular target. [0031] A beamformer such as beamformer 302, may be configured to process signals emanating from a microphone array to obtain a combined signal in such a way that signal components coming from a direction different from a predetermined wanted signal direction are suppressed. Microphone arrays, unlike conventional directional microphones, may be electronically steerable which gives them the ability to acquire a high-quality signal or signals from a desired direction or directions while attenuating off-axis noise or interference. [0051] In some embodiments, the eye tracking capabilities discussed herein may include allowing the beam to follow the eye position of the wearer and/or to allow the wearer to use his/her eyes as a pointing device. For example, a wearer may focus the acoustic beam on an individual, may look at them and also "signal" that that person is the target (e.g., after the wearer looks at the person, by looking above them and below them). [0054] In some embodiments, apparatus 100 may be configured to utilize front-facing camera 104 to recognize the faces of speakers it records. [0056] In some embodiments, apparatus 100 may be configured to recognize when a person has introduced themselves, using language modeling and speech recognition. In this way, apparatus 100 may prompt the wearer to edit or save this information alongside the speaker's paired face/voice identity profile. For example, the voice assistant may remind the wearer via speakers 108 or the reminder may be presented in text via the display associated with lenses 110A/B.).
Gabai teaches: The real time speech translating communication system of Claim 9, wherein said processor is further programmed to: from each of the images captured by said image capturing unit, generate an edited image (detect lip motion by the human figure in front of the user based on one or more images captured via camera 130) that focuses on the human face with lips that are moving (See rejection of Claim 9 and [0055] External unit 120 may include two or more cameras typically arranged in the perimeter of external unit 120, where the two or more cameras are pointing in different directions. For example, as shown in FIG. 1, for example, a first camera 130 is preferably front-facing and operative to capture images of one or more persons in front of the user, preferably providing a wide angle frame. A second camera 135 is preferably upward-facing and/or backwards-facing and is operative to capture images of the user's own head and/or motion thereof. Preferably, external unit 120 also comprises one or more cameras facing backwards and/or to the sides and allowing the capture of images of persons in different directions from the user. [0080] As described above with reference to external unit 120, central control unit 250 may receive input from the cameras and/or microphones. The CPU, and/or the central control unit 250, and/or the software program may use images received via any of the cameras to detect images of human objects, detect the head of such human objects, and detect lip motion of such human objects. [0081] The CPU, and/or the central control unit 250, and/or the software program may be operative to detect human voices received via the microphones. The CPU may be operative to detect the direction in which the user is looking based in input via camera 235. Thus based on the inputs, central control unit 250 is operative to determine the direction from which comes the voice that the user is most likely to be interested in hearing. 
The CPU is then operative to set the beamforming for microphone array, thereby to extract the voice and to filter out ambient noise. [0127] Apparatus 100 then detects a human figure in front of the user based on one or more images captured via camera 130. [0128] For example, apparatus 100 is operative to detect lip motion by the human figure in front of the user based on one or more images captured via camera 130.).
Conliffe in view of Gabai does not teach: control said display module to display the edited images generated respectively from the images captured by said image capturing unit.
Kim et al. teach: control said display module to display the edited images generated respectively from the images captured by said image capturing unit ([0074] Herein, the object to apply the beamforming can be selected using an image captured by an image sensor of the electronic device, or using voice recognition. For example, referring to FIG. 3, at least one object in a displayed image can be selected using hovering or touch. For example, a particular face in the displayed image may be recognized and automatically selected. For example, a lip motion of the person in the displayed image may be automatically selected using recognition or lip recognition. For example, the face recognition may trace a particular person and perform the beamforming based on the particular person, the lip motion recognition may trace the speaker and perform the beamforming based on the speaker, and the lip recognition may recognize the lips in the face and enhance the beamforming performance by measuring the accurate distance to the lips from which the sound of the object is produced. [0076] For example, the object can be selected using all of the voice recognition, the face recognition, and the lip recognition. For example, the beamforming can be performed by recognizing a particular person in the displayed image and tracing the lip motion of the particular person. [0087] Referring to FIG. 3A, an electronic device 350 can display an image through the image sensor, and the user can touch one object 300 in the image. For example, a particular face 300 may be recognized and automatically selected in the displayed image. For example, the beamforming can be performed by recognizing a particular person in the displayed image and tracing the lip motion of the particular person.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Conliffe in view of Gabai to include the teaching of Kim et al. above in order to perform beamforming on a particular person based on tracing lip motion of the particular person in the displayed image captured by an image sensor of the electronic device.

Regarding Claim 11: The real time speech translating communication system of Claim 10, further comprising a user input unit, wherein said processor is further programmed to: when it is determined that a plurality of human faces with lips that are moving are detected, select one of the human faces as a target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data; in response to receipt of a user command (selecting an image using touch) associated with a selected one of the human faces from said user input unit, make the selected one of the human faces serve as the target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data (See rejection of claim 10).

Regarding Claim 13: The real time speech translating communication system of Claim 9, further comprising a user input unit, wherein said processor is further programmed to: when it is determined that a plurality of human faces with lips that are moving are detected, select one of the human faces as a target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data; in response to receipt of a user command associated with a selected one of the human faces from said user input unit, make the selected one of the human faces serve as the target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data (See rejection of claim 11).
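For illustration only, the target-speaker selection recited in Claims 11 and 13 can be sketched as follows. All names are hypothetical. The claims do not specify how the default target is chosen among several speaking faces, so the sketch assumes the face closest to straight ahead is picked; that rule is an assumption and is not taken from the claims or the cited references.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    """Hypothetical detected face (field names are illustrative only)."""
    face_id: int
    direction_deg: float  # bearing relative to straight ahead of the user
    lips_moving: bool


def choose_target(faces: Sequence[Face],
                  user_selected_id: Optional[int] = None) -> Optional[Face]:
    """Pick the face toward which unidirectional sound collection is aimed."""
    speaking = [f for f in faces if f.lips_moving]
    # A user command naming one of the speaking faces overrides the default.
    if user_selected_id is not None:
        for face in speaking:
            if face.face_id == user_selected_id:
                return face
    if not speaking:
        return None
    # Assumed default rule: the speaking face nearest to straight ahead.
    return min(speaking, key=lambda f: abs(f.direction_deg))
```

With faces at -40, 10, and 0 degrees where only the first two have moving lips, the assumed default rule returns the face at 10 degrees, and a user command naming the face at -40 degrees returns that face instead.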

Claims 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Conliffe in view of Gabai, further in view of Kim et al., further in view of Vilkamo (US 2019/0132674 A1).
Regarding Claim 12, Conliffe in view of Gabai further in view of Kim et al. teaches: The real time speech translating communication system of Claim 10, further comprising a handheld device that includes a touchscreen that is configured to display the images captured by said image capturing unit and enable the user to input a user command (user touch selection of an image) associated with a selected one of the human faces; wherein, in response to receipt of the user command, said processor is programmed to make the selected one of the human faces serve as the target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data (See Kim et al. teaching in the rejection of claim 10).
Conliffe in view of Gabai further in view of Kim et al. does not teach: The real time speech translating communication system of Claim 10, further comprising a handheld device communicating with said wearable device.

Vilkamo teaches: a handheld device communicating with said wearable device ([0006] The Nokia VR Audio format, for which the methods described herein are relevant, is defined specifically for VR use. The SPAC metadata itself is transmitted alongside a set of audio channels obtained from microphone signals. The SPAC decoding takes place at the receiver end to the given setup, being loudspeakers or headphones. [0224] For example in some embodiments the device 1200 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. [0235] Furthermore the device 1200 may comprise an audio subsystem output 1215. An example as shown in FIG. 9 the audio subsystem output 1215 is an output socket configured to enabling a coupling with headphones. However the audio subsystem output 1215 may be any suitable audio output or a connection to an audio output. For example the audio subsystem output 1215 may be a connection to a multichannel speaker system. [0236] In some embodiments the digital to analogue converter 1213 and audio subsystem 1215 may be implemented within a physically separate output device. For example the DAC 1213 and audio subsystem 1215 may be implemented as cordless earphones communicating with the device 1200 via the transceiver 1209.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Conliffe in view of Gabai further in view of Kim et al. to include the teaching of Vilkamo above in order to provide output from the audio subsystem to a physically separate output device such as headphones.

Regarding Claim 14: The real time speech translating communication system of Claim 9, further comprising a handheld device communicating with said wearable device, wherein said handheld device includes a touchscreen that is configured to display the images captured by said image capturing unit and enable the user to input a user command associated with a selected one of the human faces; wherein, in response to receipt of the user command, said processor is programmed to make the selected one of the human faces serve as the target speaker, and perform the unidirectional sound collection with respect to a direction toward the target speaker, so as to obtain the incoming audio data (See rejection of Claim 12).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Zabetian (US 2019/0056908 A1), teaches: [0007] In one embodiment, a method may include receiving, by a server, audio signals data associated with a live presentation from a first electronic device associated with one or more panelists, wherein a language of content corresponding to the audio signals data is in a source language. The method may further include receiving, by the server, a request from a second electronic device associated with a user, wherein the request comprises a first target language selected by the user through a first graphical user interface. The method may further include selecting, by the server, a first interpreter based on the source language and the target language, wherein the server displays a second graphical user interface on a third electronic device associated with the first interpreter requesting the first interpreter to input an incoming language and an outgoing language, and wherein the server selects the first interpreter in response to the incoming language matching the source language and the outgoing language matching the first target language. The method may further include transmitting, by the server, the audio signals data to the third electronic device, wherein the first interpreter translates the audio signals data from the source language to the first target language on the third electronic device in real time. The method may further include receiving, by the server, the audio signals data in the first target language from the third electronic device. The method may further include transmitting, by the server, the audio signals data in the first target language to the second electronic device associated with the user.
When the user selects a second target language during the live presentation, the method may further include selecting, by the server, a second interpreter based on the source language and the second target language, wherein the server displays a third graphical user interface on a fourth electronic device associated with the second interpreter requesting the second interpreter to input the incoming language and the outgoing language, wherein the server selects the second interpreter in response to the incoming language matching the source language and the outgoing language matching the second target language, and wherein the second interpreter translates the audio signals data from the source language to the second target language in real time. The method may further include transmitting, by the server, the audio signals data in the second target language to the second electronic device associated with the user. [0008] In another embodiment, a system may include a first electronic device being operated by one or more panelists, a second electronic device being operated by a user, and a server connected to the first electronic device and the second electronic device via one or more networks. 
The server is configured to receive audio signals data associated with a live presentation from the first electronic device, wherein a language of content corresponding to the audio signals data is in a source language; receive a request from the second electronic device, wherein the request comprises a first target language selected by the user through a first graphical user interface; select a first interpreter based on the source language and the target language, wherein the server displays a second graphical user interface on a third electronic device associated with the first interpreter requesting the first interpreter to input an incoming language and an outgoing language, and wherein the server selects the first interpreter in response to the incoming language matching the source language and the outgoing language matching the first target language; transmit the audio signals data to the third electronic device, wherein the first interpreter translates the audio signals data from the source language to the first target language on the third electronic device in real time; receive the audio signals data in the first target language from the third electronic device; transmit the audio signals data in the first target language to the second electronic device associated with the user; when the user selects a second target language during the live presentation, select a second interpreter based on the source language and the second target language, wherein the server displays a third graphical user interface on a fourth electronic device associated with the second interpreter requesting the second interpreter to input the incoming language and the outgoing language, wherein the server selects the second interpreter in response to the incoming language matching the source language and the outgoing language matching the second target language, and wherein the second interpreter translates the audio signals data from the source language to the second target language in real time; and transmit the audio signals data in the second target language to the second electronic device associated with the user.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/
Primary Examiner, Art Unit 2656