DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed June 26, 2022. Claims 1, 5-6, and 11-15 have been amended. Claims 16-20 have been added.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-4, 6-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sato et al (WO2012161089) in view of Konno et al (US Patent No. 8,666,106) in view of Chan et al (KR 20120091625) in view of Sokoloff (US Patent No. 6,567,677) in view of Dair (US Patent No. 10,154,344).
Sato discloses a teleconference device. Regarding claim 1, Sato teaches a computing device [para 0021-0022 – video conference device] comprising: a display [para 0029 – display 35]; a speaker array [para 0014; 0027 – three or more speakers 34 are arranged in a horizontal direction and form a speaker array] operatively connected to the display, wherein the speaker array is to focus audio output to a localized area adjacent to the display [para 0014; 0028 – sound signal is localized in the sound image direction; para 0086]; a camera operatively connected to the display [para 0014 – camera].
Sato fails to specifically teach that the speaker array is a parametric speaker array arranged on a rotatable bar. Sokoloff teaches a speaker arranged on a rotatable portion of an electronic device (col. 3, line 46 to col. 4, line 17) and specifically teaches that the arrangement provides increased directability of speaker 105 and also increases the usefulness of camera 140. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the speaker and rotatable portion arrangement suggested by Sokoloff, in the system of Sato, for the purpose of increasing the directability of the speaker, as suggested by Sokoloff. Konno teaches a sound reproducing device including a parametric speaker (Fig. 1, element 13; col. 2, lines 36-40) to reduce the possibility that others nearby will hear the produced sound (col. 2, lines 4-6). One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the parametric speakers suggested by Konno, in the system of Sato, for the purpose of increasing conversation privacy by reducing the possibility that others nearby will hear the produced sound, as suggested by Konno.
Sato fails to teach that the camera is set to capture lip movements of a user in the localized area, and fails to teach a processor to convert the lip movements into text and speech. Chan teaches a system for speech recognition [para 0036-0050 – a Gaussian mixture model classifies phonemes corresponding to the three-dimensional coordinate values of the minutiae points of the stereo image calculated by the three-dimensional coordinate calculation unit 15f] using 3D real-time lip feature points [para 0011-0017 – image processing unit…lip region extraction] based on a stereo camera [para 0028; 0032 – stereo camera 22; 0037] to generate text or speech [para 0051]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the lip movement processing and conversion techniques suggested by Chan, in the system of Sato, for the purpose of providing a smart interface for communication assistance in noisy environments [para 0002] or for patients with speech impairments and the elderly [para 0021], as suggested by Chan.
Sato fails to teach utilizing the processor to control operation of the rotatable bar. Dair teaches an audio system for a display that provides audio accessories of various shapes and sizes whose movement is controlled wirelessly or by non-audio tones (Figs. 26(A-C), 27C, 28, 29(A-C); col. 13, lines 5-55; col. 14, lines 8-21). One having ordinary skill in the art would have recognized the advantages of implementing the controlled audio accessory movement techniques suggested by Dair, in the system of Sato, for the purpose of enhancing the user’s experience with the audio/visual system or positioning the speaker closer to the user’s ear level (col. 14, lines 10-11).

Regarding claim 2, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches a microphone [Sato at para 0023] to perform directional voice detection [Sato at para 0014 – sound direction information; para 0024; para 0118] and ambient noise reduction from the localized area [Sato at para 0067 – background noise suppression; para 0072 – noise removal; para 0113 – echo cancelling].
Regarding claim 3, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches that the camera comprises a three-dimensional (3D) stereoscopic camera [Chan at para 0028; 0032 – stereo camera 22; 0037].
Regarding claim 4, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches the parametric speaker array comprises a first speaker and a second speaker positioned on the display [Sato para 0115-0116 – speaker array at the lower portion of the display], and wherein the camera is positioned on the display [Sato at para 0015; 0022—video conference apparatus having functions of encoding side and decoding side in a single terminal—where the camera is an element of the encoding side].  
Regarding claim 6, Sato teaches an electronic device [Fig. 1; para 0021-0022 – video conference device] comprising: a display [para 0029 – display 35]; a speaker array [para 0014; 0027 – three or more speakers 34 are arranged in a horizontal direction and form a speaker array] operatively connected to the display, wherein the speaker array is to focus audio output to a localized area adjacent to the display [para 0014; 0028 – sound signal is localized in the sound image direction; para 0086], and wherein the localized area is set to accommodate a user [para 0120 – localized sound image]; a camera operatively connected to the display [para 0014 – camera]; and
a microphone array attached to the display [para 0114-0118 – two microphones], wherein the microphone array is to receive audio input from within the localized area and perform directional voice detection [para 0014 – sound direction information; para 0024; para 0114-0118] and ambient noise reduction from the localized area [Sato at para 0067 – background noise suppression; para 0072 – noise removal; para 0113 – echo cancelling].
Sato fails to specifically teach that the speaker array is a parametric speaker array arranged on a rotatable bar. Sokoloff teaches a speaker arranged on a rotatable portion of an electronic device (col. 3, line 46 to col. 4, line 17) and specifically teaches that the arrangement provides increased directability of speaker 105 and also increases the usefulness of camera 140. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the speaker and rotatable portion arrangement suggested by Sokoloff, in the system of Sato, for the purpose of increasing the directability of the speaker, as suggested by Sokoloff. Konno teaches a sound reproducing device including a parametric speaker (Fig. 1, element 13; col. 2, lines 36-40) to reduce the possibility that others nearby will hear the produced sound (col. 2, lines 4-6). One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the parametric speakers suggested by Konno, in the system of Sato, for the purpose of increasing conversation privacy by reducing the possibility that others nearby will hear the produced sound, as suggested by Konno.
Sato fails to teach a camera array attached to the display, wherein the camera array is to detect lip movements of the user; and a processor to identify speech patterns from the lip movements detected by the camera array and from the audio input received by the microphone array, transcribe the speech patterns into text, and transmit the text and audio input from the localized area.
Chan teaches a system for speech recognition [para 0036-0050 – a Gaussian mixture model classifies phonemes corresponding to the three-dimensional coordinate values of the minutiae points of the stereo image calculated by the three-dimensional coordinate calculation unit 15f] using 3D real-time lip feature points [para 0011-0017 – image processing unit…lip region extraction] based on a stereo camera comprising two cameras [para 0028 – two cameras; 0032 – stereo camera 22; 0037] to generate text or speech [para 0051]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the lip movement processing and conversion techniques suggested by Chan, in the system of Sato, for the purpose of providing a smart interface for communication assistance in noisy environments [para 0002] or for patients with speech impairments and the elderly [para 0021], as suggested by Chan.
Sato fails to teach that the processor controls operation of the rotatable bar. Dair teaches an audio system for a display that provides audio accessories of various shapes and sizes whose movement is controlled wirelessly or by non-audio tones (Figs. 26(A-C), 27C, 28, 29(A-C); col. 13, lines 5-55; col. 14, lines 8-21). One having ordinary skill in the art would have recognized the advantages of implementing the controlled audio accessory movement techniques suggested by Dair, in the system of Sato, for the purpose of enhancing the user’s experience with the audio/visual system or positioning the speaker closer to the user’s ear level (col. 14, lines 10-11).
Regarding claim 7, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches that the parametric speaker array comprises: a first speaker positioned on the display [Sato para 0115-0116 – speaker array at the lower portion of the display]; and a second speaker positioned on the display [Sato para 0115-0116 – speaker array at the lower portion of the display], wherein the first speaker and the second speaker are selectively positioned to generate a sound lobe containing the localized area [Sato para 0014; 0028 – sound signal is localized in the sound image direction; para 0086-0088; para 0120]; and wherein the audio output outside of the sound lobe is diminished compared with the audio output within the sound lobe [Konno’s parametric speaker functionality].
Regarding claim 8, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches that the camera array comprises: a first camera positioned on the display; and a second camera positioned on the display [Sato at para 0015; 0022 – video conference apparatus having functions of the encoding side and the decoding side in a single terminal, where the camera is an element of the encoding side, in combination with the multiple cameras provided by Chan], wherein the first camera and the second camera are selectively positioned to collectively capture the lip movements from different angles [Chan para 0028; para 0036-0038 – acquired parallax images].
Regarding claim 9, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches the camera array is to capture a three-dimensional (3D) rendering of the user [Chan para 0010-0011; para 0036-0038 – image processing unit for 3-D coordinate values].  
Regarding claim 10, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches a first microphone positioned on the display [Sato para 0114-0118 – two microphones arranged at the lower portion or the upper portion of the display] and a second microphone positioned on the display [Sato para 0114-0118 – two microphones arranged at the lower portion or the upper portion of the display], wherein the first microphone and the second microphone are selectively positioned to receive the audio input from within the localized area [Sato para 0118 – directional microphone] and filter audio detected from outside the localized area [Sato at para 0067 – background noise suppression; para 0072 – noise removal; para 0113 – echo cancelling].
Regarding claim 11, Sato teaches a machine-readable storage medium comprising computer-executable instructions [para 0121] that, when executed, cause a processor of a computing device [para 0021-0022 – video conference device] to: control a speaker to constrain audio output to a localized area adjacent to the computing device [para 0014; 0027 – three or more speakers 34 are arranged in a horizontal direction and form a speaker array; para 0028 – sound signal is localized in the sound image direction; para 0086]; and a camera operatively connected to the display [para 0014 – camera].
Sato fails to specifically teach that the speaker array is a parametric speaker array arranged on a rotatable bar. Sokoloff teaches a speaker arranged on a rotatable portion of an electronic device (col. 3, line 46 to col. 4, line 17) and specifically teaches that the arrangement provides increased directability of speaker 105 and also increases the usefulness of camera 140. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the speaker and rotatable portion arrangement suggested by Sokoloff, in the system of Sato, for the purpose of increasing the directability of the speaker, as suggested by Sokoloff. Konno teaches a sound reproducing device including a parametric speaker (Fig. 1, element 13; col. 2, lines 36-40) to reduce the possibility that others nearby will hear the produced sound (col. 2, lines 4-6). One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the parametric speakers suggested by Konno, in the system of Sato, for the purpose of increasing conversation privacy by reducing the possibility that others nearby will hear the produced sound, as suggested by Konno.
Sato fails to teach controlling the camera to capture lip movements of a user in the localized area, and converting the lip movements into text and speech.
Chan teaches a system for speech recognition [para 0036-0050 – a Gaussian mixture model classifies phonemes corresponding to the three-dimensional coordinate values of the minutiae points of the stereo image calculated by the three-dimensional coordinate calculation unit 15f] using 3D real-time lip feature points [para 0011-0017 – image processing unit…lip region extraction] based on a stereo camera comprising two cameras [para 0028 – two cameras; 0032 – stereo camera 22; 0037] to generate text or speech [para 0051]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the lip movement processing and conversion techniques suggested by Chan, in the system of Sato, for the purpose of providing a smart interface for communication assistance in noisy environments [para 0002] or for patients with speech impairments and the elderly [para 0021], as suggested by Chan.
Sato fails to teach utilizing the processor to control operation of the rotatable bar. Dair teaches an audio system for a display that provides audio accessories of various shapes and sizes whose movement is controlled wirelessly or by non-audio tones (Figs. 26(A-C), 27C, 28, 29(A-C); col. 13, lines 5-55; col. 14, lines 8-21). One having ordinary skill in the art would have recognized the advantages of implementing the controlled audio accessory movement techniques suggested by Dair, in the system of Sato, for the purpose of enhancing the user’s experience with the audio/visual system or positioning the speaker closer to the user’s ear level (col. 14, lines 10-11).

Regarding claim 12, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches the processor to compare the lip movements with previously received lip movements to improve an accuracy of a transcription of captured audio by using artificial intelligence to generate any of the text and the speech [Chan para 0045-0051 – Active Representation Model utilizing a learning database and Hidden Markov Model].  
Regarding claim 13, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches controlling a microphone to receive the captured audio from the localized area [Sato at para 0014 – sound direction information; para 0024; para 0118 – directional microphone]; and generating text comprising the transcription of the captured audio [Chan para 0051].
Regarding claim 15, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches reducing a volume of the captured audio required to generate any of the text and the speech [Konno col. 9, lines 39-41].
Regarding claims 16 and 18, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches that the rotatable bar is embedded into a housing of the display (Sato teaches an embedded bar at Figs. 17-18 and para 0117-0118, in combination with Sokoloff’s rotatable portion arrangement at col. 3, line 46 to col. 4, line 17).
Regarding claims 17 and 19, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches that the rotatable bar is attached to a top of the display (Sato teaches a bar at the top of the display at Fig. 17 and para 0117, in combination with Sokoloff’s rotatable portion arrangement at col. 3, line 46 to col. 4, line 17).
Regarding claim 20, the combination of Sato, Konno, Chan, Sokoloff, and Dair teaches instructions for the processor to control automatic operations of the rotatable bar without the need for user intervention by programming the processor with controlling instructions to operate the rotatable bar (Dair teaches an audio system for a display that provides audio accessories of various shapes and sizes whose movement is controlled wirelessly or by non-audio tones (Figs. 26(A-C), 27C, 28, 29(A-C); col. 13, lines 5-55; col. 14, lines 8-21)).


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Sato in view of Konno in view of Chan in view of Sokoloff in view of Dair, as applied to claim 2 above, and further in view of Posa (US Patent Application Publication No. 2013/0322160).
Regarding claim 5, Sato, Konno, and Chan fail to teach that the camera and the microphone are arranged on the rotatable bar. In a similar field of endeavor, Posa teaches a device with lip reading and eye tracking capabilities to convert speech into text, where the camera and microphone elements can be positioned on a flip-out or extendable arm [para 0025]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the flip-out/extendable arm configuration suggested by Posa, in the system of Sato, to allow the user to position the camera/microphone/speaker in an optimal position to ensure quality audio/video is captured and/or output and to ensure communications via the device are private.


Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Sato in view of Konno in view of Chan in view of Sokoloff in view of Dair, as applied to claim 13 above, and further in view of Haughay (US Patent Application Publication No. 2014/0188471).
Regarding claim 14, Sato, Konno, Chan, Sokoloff, and Dair fail to teach that the processor operates to: identify a voice associated with the captured audio; and control operations of the computing device based on an identification of the voice. Haughay teaches a system for user profiling for voice input processing which performs user/speaker identification via voice prints [para 0008; 0025] and allows the user to operate the device by recognizing voice commands [para 0037]. One having ordinary skill in the art would have recognized the advantages of implementing the voice identification and device operation techniques suggested by Haughay, in the system of Sato, for the purpose of allowing the user to securely access and use the device in a hands-busy environment.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the arguments do not apply to the new combination of references cited in the new ground of rejection.



Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659