DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.





Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 22, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.










Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 16-23, 25-29, 31 and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Burenius (US 2016/0057385 A1) in view of Nimri et al. (US 2010/0118112 A1).
As to claim 16, Burenius discloses a method performed at an electronic device [FIG. 4] having an image sensor [Camera 112 on FIG. 4], a plurality of microphones [Microphones 118 on FIG. 4], a display [Display 114 on FIG. 4], one or more processors [Processor 444 on FIG. 4], and memory [Memory 448 on FIG. 4][paragraph 0001], the method comprising:
obtaining audio signals by concurrently receiving audio data at each microphone of the plurality of microphones [As depicted in the example of FIG. 2, microphone array 118 is positioned adjacent to, and centered along, a bottom side of display 114 (i.e., below the display) so as to receive audio from participants 106′ and 106 in room 204. paragraphs 0013 and 0037]; 
determining based on the obtained audio signals that a person is speaking in a vicinity of the electronic device [If endpoint 104 determines that audio originates from one of the audio search regions, the endpoint declares this an “active” audio search region. paragraphs 0015 and 0038];
 obtaining video data from the image sensor [Video conference endpoint 104 may use face detection techniques to detect faces and associated positions thereof of participants 106 captured in the video to facilitate optimal video framing in each of FOVs 212M, 212R, and 212L. paragraph 0014]; 
determining based on analysis of the video data that the person is not within a field of view of the image sensor [Video conference endpoint 104 pans, tilts, and zooms video cameras 112 (in the mechanical or digital domain) as necessary to frame best views of the scenes encompassing participants 106 and 106′ in respective main and side FOVs 212M, and 212L and 212R. paragraphs 0014 and 0032];
reorienting the electronic device based on differences in the audio data received at respective microphones of the plurality of microphones [Automatically switches between the different FOVs in order to capture video of participants 106 and 106′ therein that are actively speaking (i.e., to capture video of the active speakers in the different FOVs). paragraph 0015];
after reorienting the electronic device, obtaining second video data from the image sensor and determining from the second video data that the person is within the field of view of the image sensor, wherein the determining that the person is within the field of view comprises identifying the person based on analysis of video data from the image sensor [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031]; and
attending to the person by directing the display toward the person [Techniques presented herein automatically switch between camera FOVs to capture talking participants positioned next to (i.e., on the side of) a video display and a microphone array of a video conference system and positioned in-front of the display and the microphone array based on audio from the participants. paragraph 0047].
Burenius fails to disclose identifying the person based on the obtained audio signals.
However, Nimri teaches identifying the person based on the obtained audio signals [The microphone arrays can be adapted to detect the voice of a local participant, and produce audio signals representing the voice. paragraphs 0042-0043].
Burenius and Nimri are analogous because they are both directed to switching cameras based on audio detection. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use the microphone-based detection of the local participant taught by Nimri in an automatic camera switching system such as that of Burenius, as suggested by Nimri, for the obvious purpose of combining the audio signals and the video signals received from the cameras and microphones and transmitting the combined signal to the remote participants, by combining prior art elements according to known methods to yield predictable results.

As to claim 17. Burenius discloses the method of claim 16, wherein attending to the person includes directing the image sensor toward the person [Each of cameras 112 typically includes pan, tilt, and zoom (PTZ) features that may be implemented mechanically and/or digitally. Video conference endpoint 104 pans, tilts, and zooms video cameras 112 (in the mechanical or digital domain) as necessary to frame best views of the scenes encompassing participants 106 and 106′ in respective main and side FOVs 212M, and 212L and 212R.  paragraph 0014].  

As to claim 18. Burenius discloses the method of claim 17, wherein directing the image sensor toward the person comprises centering the person within the field of view of the image sensor [Automatically switches between the different FOVs in order to capture video of participants 106 and 106′ therein that are actively speaking (i.e., to capture video of the active speakers in the different FOVs).  paragraph 0015].  
In other embodiments, face detection and/or camera PTZ control may be implemented at the remote/receiving video conference endpoint, or even in the cloud (e.g., at conference server 102). paragraph 0014].  

As to claim 20. Burenius discloses the method of claim 16, further comprising: after determining that the person is not within the field of view of the image sensor, determining a preferred direction for reorienting on the person [Automatically switches between the different FOVs in order to capture video of participants 106 and 106′ therein that are actively speaking (i.e., to capture video of the active speakers in the different FOVs). paragraph 0015]; and  
wherein reorienting the electronic device comprises rotating the image sensor in the preferred direction [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031].

As to claim 21. Burenius discloses the method of claim 20, further comprising: identifying a position of a surface in the vicinity of the electronic device, the surface having reflected at least a portion of the audio data received by a first microphone of the plurality of microphones; wherein the preferred direction is based on the position of the surface [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031].

As to claim 22. Burenius discloses the method of claim 20, wherein the preferred direction is based on a layout of a room in which the electronic device is located [As depicted in the example of FIG. 2, microphone array 118 is positioned adjacent to, and centered along, a bottom side of display 114 (i.e., below the display) so as to receive audio from participants 106′ and 106 in room 204. paragraph 0013].  

As to claim 23. Burenius discloses the method of claim 16, wherein reorienting the electronic device comprises moving the image sensor along multiple axes [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031].  

As to claim 25. Burenius discloses an electronic device [FIG. 4] comprising: 
an image sensor [Camera 112 on FIG. 4]; 
a plurality of microphones [Microphones 118 on FIG. 4];
a display [Display 114 on FIG. 4]; 
one or more processors [Processor 444 on FIG. 4]; and 
memory storing one or more programs [Processor 444 may include a collection of microcontrollers and/or microprocessors, for example, each configured to execute respective software instructions stored in the memory 448. paragraph 0023], the one or more programs including instructions for:
obtaining audio signals by concurrently receiving audio data at each microphone of the plurality of microphones [As depicted in the example of FIG. 2, microphone array 118 is positioned adjacent to, and centered along, a bottom side of display 114 (i.e., below the display) so as to receive audio from participants 106′ and 106 in room 204. paragraphs 0013 and 0037];
determining based on the obtained audio signals that a person is speaking in a vicinity of the electronic device [If endpoint 104 determines that audio originates from one of the audio search regions, the endpoint declares this an “active” audio search region. paragraphs 0015 and 0038]; 
obtaining video data from the image sensor [Video conference endpoint 104 may use face detection techniques to detect faces and associated positions thereof of participants 106 captured in the video to facilitate optimal video framing in each of FOVs 212M, 212R, and 212L. paragraph 0014]; 
determining based on analysis of the video data that the person is not within a field of view of the image sensor [Video conference endpoint 104 pans, tilts, and zooms video cameras 112 (in the mechanical or digital domain) as necessary to frame best views of the scenes encompassing participants 106 and 106′ in respective main and side FOVs 212M, and 212L and 212R. paragraphs 0014 and 0032]; 
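reorienting the electronic device based on differences in the audio data received at respective microphones of the plurality of microphones [Automatically switches between the different FOVs in order to capture video of participants 106 and 106′ therein that are actively speaking (i.e., to capture video of the active speakers in the different FOVs). paragraph 0015];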
after reorienting the electronic device, obtaining second video data from the image sensor and determining from the second video data that the person is within the field of view of the image sensor, wherein the determining that the person is within the field of view comprises identifying the person based on analysis of video data from the image sensor [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031]; and 
attending to the person by directing the display toward the person [Techniques presented herein automatically switch between camera FOVs to capture talking participants positioned next to (i.e., on the side of) a video display and a microphone array of a video conference system and positioned in-front of the display and the microphone array based on audio from the participants. paragraph 0047].
Burenius fails to disclose identifying the person based on the obtained audio signals.
However, Nimri teaches identifying the person based on the obtained audio signals [The microphone arrays can be adapted to detect the voice of a local participant, and produce audio signals representing the voice. paragraphs 0042-0043].
Burenius and Nimri are analogous because they are both directed to switching cameras based on audio detection. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use the microphone-based detection of the local participant taught by Nimri in an automatic camera switching system such as that of Burenius, as suggested by Nimri, for the obvious purpose of combining the audio signals and the video signals received from the cameras and microphones and transmitting the combined signal to the remote participants, by combining prior art elements according to known methods to yield predictable results.

As to claim 26, see the rejection of claim 17 above. 

As to claim 27, see the rejection of claim 18 above.

As to claim 28, see the rejection of claim 20 above.

As to claim 29, see the rejection of claim 21 above. 

As to claim 31, Burenius discloses a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions [paragraph 0025], which when executed by an electronic device [FIG. 4] having an image sensor [Camera 112 on FIG. 4], a plurality of microphones [Microphones 118 on FIG. 4], a display [Display 114 on FIG. 4], one or more processors [Processor 444 on FIG. 4], and memory [Memory 448 on FIG. 4], cause the electronic device to perform the method of:
obtaining audio signals by concurrently receiving audio data at each microphone of the plurality of microphones [As depicted in the example of FIG. 2, microphone array 118 is positioned adjacent to, and centered along, a bottom side of display 114 (i.e., below the display) so as to receive audio from participants 106′ and 106 in room 204. paragraphs 0013 and 0037];
determining based on the obtained audio signals that a person is speaking in a vicinity of the electronic device [If endpoint 104 determines that audio originates from one of the audio search regions, the endpoint declares this an “active” audio search region. paragraphs 0015 and 0038]; 
obtaining video data from the image sensor; determining based on analysis of the video data that the person is not within a field of view of the image sensor [Video conference endpoint 104 may use face detection techniques to detect faces and associated positions thereof of participants 106 captured in the video to facilitate optimal video framing in each of FOVs 212M, 212R, and 212L. paragraph 0014]; 
reorienting the electronic device based on differences in the audio data received at respective microphones of the plurality of microphones [Automatically switches between the different FOVs in order to capture video of participants 106 and 106′ therein that are actively speaking (i.e., to capture video of the active speakers in the different FOVs). paragraph 0015]; 
after reorienting the electronic device, obtaining second video data from the image sensor and determining from the second video data that the person is within the field of view of the image sensor, wherein the determining that the person is within the field of view comprises identifying the person based on analysis of video data from the image sensor [Controller 408 initializes cameras, i.e., commands the cameras to initial pan, tilt, and zoom settings to establish main camera FOV 212M and one or more of left and right-side FOVs 212R and 212L so as to capture videos of scenes in the corresponding FOVs. paragraphs 0029-0031]; and
attending to the person by directing the display toward the person [Techniques presented herein automatically switch between camera FOVs to capture talking participants positioned next to (i.e., on the side of) a video display and a microphone array of a video conference system and positioned in-front of the display and the microphone array based on audio from the participants. paragraph 0047].
Burenius fails to disclose identifying the person based on the obtained audio signals.
However, Nimri teaches identifying the person based on the obtained audio signals [The microphone arrays can be adapted to detect the voice of a local participant, and produce audio signals representing the voice. paragraphs 0042-0043].
Burenius and Nimri are analogous because they are both directed to switching cameras based on audio detection. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use the microphone-based detection of the local participant taught by Nimri in an automatic camera switching system such as that of Burenius, as suggested by Nimri, for the obvious purpose of combining the audio signals and the video signals received from the cameras and microphones and transmitting the combined signal to the remote participants, by combining prior art elements according to known methods to yield predictable results.

As to claim 35, see the rejection of claim 3 above.  
Claims 32 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Burenius and Nimri as applied to claim 31 above, and further in view of Waldo (US 10,140,515 B1).
As to claim 32, Burenius and Nimri disclose wherein the one or more programs further comprise instructions for: after reorienting the electronic device, determining that a plurality of persons is within the field of view [Techniques presented herein automatically switch between camera FOVs to capture talking participants positioned next to (i.e., on the side of) a video display and a microphone array of a video conference system and positioned in-front of the display and the microphone array based on audio from the participants. paragraph 0047]. 
Burenius and Nimri fail to disclose assigning a confidence score to each person.
However, Waldo teaches assigning a confidence score to each person of the plurality of persons; and wherein determining that the person is within the field of view comprises determining that the person is assigned the highest confidence score [column 18, lines 28-60].
Burenius, Nimri and Waldo are analogous because they are all directed to switching cameras based on audio detection. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use the matching of images and audio data taught by Waldo in an automatic camera switching system such as that of Burenius and Nimri, as suggested by Waldo, for the obvious purpose of applying the filter of Waldo to a category of content items being viewed by the system, by combining prior art elements according to known methods to yield predictable results.

As to claim 33, Burenius and Nimri disclose the non-transitory computer-readable storage medium.

However, Waldo teaches wherein the confidence score is based on an analysis of the obtained audio signals and/or video signals from the image sensor [column 18, lines 28-60].
Burenius, Nimri and Waldo are analogous because they are all directed to switching cameras based on audio detection. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use the matching of images and audio data taught by Waldo in an automatic camera switching system such as that of Burenius and Nimri, as suggested by Waldo, for the obvious purpose of applying the filter of Waldo to a category of content items being viewed by the system, by combining prior art elements according to known methods to yield predictable results.


Allowable Subject Matter
Claims 24, 30 and 34 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 Form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GERALD GAUTHIER whose telephone number is (571)272-7539.  The examiner can normally be reached on 8:00 AM to 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, FAN TSANG, can be reached at (571) 272-7547.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

February 26, 2021