DETAILED ACTION

Notice of Pre-AIA or AIA Status

1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Priority

2.	Applicant states that this application is a continuation or divisional application of the prior-filed application. A continuation or divisional application cannot include new matter.  Applicant is required to change the relationship (continuation or divisional application) to continuation-in-part because this application contains the following matter not disclosed in the prior-filed application: See, for example, specification page 2, paragraph [0007] to page 5, paragraph [0021], e.g., “A fourth aspect”; “A fifth aspect”. 

Specification

3.    The disclosure is objected to because of the following informalities:

	In the Specification, page 1, first paragraph, line 1, the status of this application should be changed because this application contains the following matter not disclosed in the prior-filed application: See specification page 2, paragraph [0007] to page 5, paragraph [0021], e.g., “A fourth aspect”; “A fifth aspect”.  

Appropriate correction is required.

Information Disclosure Statement

4.	The information disclosure statement filed on January 02, 2020 has been considered and placed in the application file.

Claim Objections

5.	Claims 11-20 are objected to because of the following informalities:  In independent claim 11, line 9, the word “indicting” is a typographical error, which should be - - indicating - -. 
	Appropriate correction is required. 

	Claims 12-20 depend from claim 11 and are therefore objected to for the same reasons.

Claim Rejections - 35 USC § 112

6.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


7.	Claims 8, 10, and 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

	Claim 8 recites the limitation “the command” in line 2.  There is insufficient antecedent basis for this limitation in the claim.

	Claim 8 is indefinite because it is unclear whether the limitation “a remote geographic location” in lines 1-2 is the same “a remote geographic location” as recited in lines 2-3 of claim 6.  If it is, the examiner suggests that applicant amend “a remote geographic location” in lines 1-2 of claim 8 to read “the remote geographic location” to overcome this problem.


	Claim 10 recites the limitation “the focus direction” in line 2.  There is insufficient antecedent basis for this limitation in the claim.

	Claim 17 recites the limitation “the command” in line 2.  There is insufficient antecedent basis for this limitation in the claim.

	Regarding claim 18, this claim depends from claim 17 and is therefore rejected for the same reasons.

	Claim 18 is indefinite because it is unclear whether the limitation “a desired direction of focus for the microphone array” in lines 2-3 is the same “a desired direction of focus for the microphone array” as recited in lines 1-2 of claim 16.  If it is, the examiner suggests that applicant amend “a desired direction of focus for the microphone array” in lines 2-3 of claim 18 to read “the desired direction of focus for the microphone array” to overcome this problem.

Claim Rejections - 35 USC § 102

8.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


9.	Claims 1 and 4 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”). 

	Regarding claim 1, Tico teaches a system (audio-visual source tracking system 30, Fig. 1, par [0036], see Tico) comprising: 
		a microphone array (34, Fig. 1, par [0036], see Tico); 
		a camera system (28, Fig. 1, par [0036], see Tico) for detecting a visual characteristic of a user (The video camera may be oriented and configured to capture an image of a face of a user of the device. The video camera may be used for video capture when performing video calling with the device, par [0036], see Tico); and 
		a control system (600, Fig. 10, par [0058], see Tico) coupled with the microphone array and the camera system (see controller 600, coupled to tracking system 30, 300, Fig. 10), the control system programmed to control the microphone array (The audio-visual tracking system 30, 330 is also coupled to the controller 600. The device 10, 300 may be programmed to automatically change capture or sense an audio signal, Fig. 10, par [0058], see Tico) by:
			 receiving a camera signal from the camera system (wherein the tracking system is configured to receive a video signal from the first video camera, Fig. 8, par [0056], see Tico); and 
			focusing the microphone array in response to the camera signal indicating the visual characteristic of the user (With audio-visual source tracing it is possible to steer, i.e., focusing, the maximum sensitivity of the microphone array towards the desired source (generally the user's face (or mouth)), par [0059], see Tico), prior to receiving an audio input from the user at the microphone array (wherein the tracking system is configured to capture an audio signal from the microphone at least partially in response to the video signal (step 406), Fig. 8, par [0056], see Tico), i.e., the video signal is received prior to the audio signal. 
	Tico thus teaches all the claimed limitations. 
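
	For illustration only, the camera-driven focusing mapped above can be sketched as follows. This is a minimal sketch and not the implementation of Tico or of the present application: it assumes a pinhole camera model, a linear microphone array along one axis, a far-field source, and delay-and-sum steering. The function names, the field-of-view parameter, and the microphone positions are hypothetical. The sketch shows a steering direction being derived from the camera signal alone, before any audio input is processed.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def face_azimuth_rad(face_x_px, frame_width_px, horizontal_fov_deg):
    """Map the horizontal pixel position of a detected face to an azimuth.

    Hypothetical helper: assumes a pinhole camera; 0 rad is straight ahead,
    positive angles are toward the right edge of the frame.
    """
    half_width = frame_width_px / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.atan((face_x_px - half_width) / focal_px)


def steering_delays_s(mic_positions_m, azimuth_rad):
    """Per-microphone delays (seconds) for far-field delay-and-sum steering.

    Microphones lie on one axis at the given positions (meters). A microphone
    closer to the source hears the wavefront earlier, so it is delayed more.
    """
    raw = [x * math.sin(azimuth_rad) / SPEED_OF_SOUND_M_S for x in mic_positions_m]
    earliest = min(raw)
    # Shift so that every delay is non-negative (causal).
    return [d - earliest for d in raw]


# Usage: a face detected at the right edge of a 640-pixel-wide frame.
azimuth = face_azimuth_rad(640, 640, 90.0)
delays = steering_delays_s([0.0, 0.05, 0.10], azimuth)
```

	In this sketch the delays depend only on the detected face position, so the array can be focused before the user speaks.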

	Regarding claim 4, Tico teaches the system of claim 1, wherein the visual characteristic comprises at least one of: a detected presence of the user, a detected location of the user, a detected identity of the user, or a detected gesture by the user (The audio-visual source tracking system 30 detects and tracks the user's face in the video frames captured by the secondary camera (as the secondary camera is oriented towards the user of the device), par [0046], see Tico), i.e., a detected presence of the user. 

10.	Claims 11 and 14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hershey et al. U.S. Patent Application Publication 20070016426 (hereinafter, “Hershey”). 

	Regarding claim 11, Hershey teaches an audio and video conferencing system (Embodiments of the present invention include an audio-visual control system which has varied applications as illustrated in FIGS. 2-8. In each of the illustrated embodiments or applications, visual input is used in controlling an audio function, such as speech recognition, or video telephone conference audio inputs, par [0030], see Hershey) comprising: 
		a microphone array (an audio device or audio receiving device 202 (such as a microphone or microphone array), Fig. 2, par [0031], see Hershey); 
		a camera system (a visual perception device 204 (such as a camera), Fig. 2, par [0031], see Hershey) for detecting a visual characteristic of a user at a first geographic location (In the "look-to-talk" application shown, the visual processing component 205 detects when the user is facing towards a given location, for example, Fig. 2, par [0034], see Hershey); and 
		a control system (controller 206, Fig. 2, par [0033], see Hershey) coupled with the microphone array and the camera system (The "look-to-talk" controller 206 controls the audio function or component 208 based upon input from the visual perception device, such as a camera, 204 as processed by the visual processing component 205, Fig. 2, par [0033], see Hershey), the control system programmed to control the microphone array (In the embodiment shown in FIG. 2, visual processing component 205 receives a visual input from device 204 and provides an input to controller 206 based on the content of the visual input or signal. Controller 206, in turn, controls an audio function or component 208 (such as a speech recognition function or component) which receives input from the audio device 202. Controller 206 controls the audio function or component 208 based on the input from visual processing component 205, Fig. 2, par [0032], see Hershey) by:
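			 receiving a camera signal from the camera system (In one embodiment illustrated in FIG. 2, the control system includes an audio device or audio receiving device 202 (such as a microphone or microphone array) and a visual perception device 204 (such as a camera), Fig. 2, par [0031], see Hershey); and 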
			focusing the microphone array (the focus direction of a microphone array (or beam), par [0051], see Hershey; Audio from the focussed microphone device or array is processed and provided to an audio function or component 208 such as a speech recognition component, or video teleconferencing component, Fig. 5, par [0053], see Hershey) in response to the camera signal indicating the visual characteristic of the user at the first geographic location (As shown in FIG. 5, visual processing component 205 processes input from the visual perception device 204 to detect and locate the user as described previously. In one embodiment, the processing component 205 uses face detection, motion tracking and depth tracking functions to locate the user as previously described. Output from the visual processing component 205 is provided to the visual focus controller 260 to focus or orient the audio device 202 relative to the location of the user based upon the visual input, Fig. 5, par [0052], see Hershey), prior to receiving an audio input from the user at the microphone array (in accordance with one embodiment, the visual focus controller 260 visually tracks the user to keep the microphone device focussed on the user.  This improves signal-to-noise ratio for the user's speech. Audio from the focussed microphone device or array is processed and provided to an audio function or component 208 such as a speech recognition component, or video teleconferencing component, Fig. 5, par [0053], see Hershey), i.e., the video signal is received prior to the audio signal. 
	Hershey thus teaches all the claimed limitations.

	Regarding claim 14, Hershey teaches the system of claim 11, wherein the visual characteristic comprises at least one of: a detected presence of the user, a detected location of the user, a detected identity of the user, or a detected gesture by the user (In another embodiment, the "look-to-talk" control system is used to control other functions represented by block 208, such as a video teleconference function or component. The system detects a front face of the speaker or user prior to transmitting an utterance or speech. Thus, the audio signal of the speaker is transmitted to a remote participant while the user's front face is detected, par [0049], see Hershey), i.e., a detected presence of the user. 
 
Claim Rejections - 35 USC § 103

11.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.   

12.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


13.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

14.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Udani U.S. Patent Application Publication 20120323796.

	Regarding claim 2, Tico teaches the system of claim 1. Tico further teaches wherein the control system is further configured to compare the visual characteristic of the user detected by the camera system with a stored visual characteristic (examples of the invention may include advanced techniques for robust and efficient face detection and tracking. Such techniques and algorithms have been developed by several research groups and they are currently in use in several applications (e.g. face recognition, face tracking, etc). Face tracking algorithms can achieve high performance and may be run in real time even relatively lower computational power machines, such as mobile phones, for example, par [0044], see Tico), wherein focusing the microphone array (With audio-visual source tracing it is possible to steer, i.e., focusing, the maximum sensitivity of the microphone array towards the desired source (generally the user's face (or mouth)), par [0059], see Tico) is performed in response to the detected visual characteristic matching a visual characteristic in the stored visual characteristic (e.g. face recognition, face tracking, etc, par [0059], see Tico; wherein the tracking system is configured to capture an audio signal from the microphone at least partially in response to the video signal (step 406), Fig. 8, par [0056], see Tico).
However, Tico does not explicitly disclose a stored visual characteristic being a stored visual characteristic library.
	Udani teaches methods and systems for recording verifiable documentation (see Title) in which an electronic device is described, the device comprising: a biometric characteristics capturing device; an application adapted to perform biometric recognition; and a database locally connected with the electronic device, wherein the biometric recognition is performed by comparing a set of biometric characteristics against a library of biometric characteristics, and wherein the library of biometric characteristics is located in the database (par [0023], see Udani). The term "biometrics" is defined to mean methods for uniquely recognizing individual human beings based on one or more intrinsic physical or behavioral traits. For example, identification authentication may involve, but is not limited to, facial recognition, (par [0067], see Udani). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the methods and systems for recording verifiable documentation taught by Udani with the system of Tico so as to obtain a stored visual characteristic being a stored visual characteristic library as claimed.

13.	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Udani U.S. Patent Application Publication 20120323796, and further in view of Roller U.S. Patent Application Publication 20150278398.

	Regarding claim 3, Tico in view of Udani teaches the system of claim 2.  Tico in view of Udani, as modified, teaches the software portion of the methods and systems of the present disclosure may comprise a computer-readable medium that comprises instructions that, when executed, perform, at least in part, the various described methods in the present disclosure.  The computer-readable medium that comprises instructions to be executed may be accessed locally or remotely via a connected network. The instructions may be executed by a processor (e.g., a microprocessor, a microcontroller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a standard logic integrated circuit, or a field programmable logic array (PLD, FPGA, etc.)) (par [0107], see Udani).
	However, Tico in view of Udani does not explicitly disclose wherein the visual characteristic library comprises a thresholding mechanism for comparing the camera signal with pre-sorted video data or pre-sorted image data to determine whether the detected visual characteristic is linked with a focus direction for the microphone array.
	Roller teaches system and method for detecting potential matches between a candidate biometric and a dataset of biometrics (see Title) in which various implementations of the inventions described herein employ spectral clustering in order to identify potential matches or non-matches, as the case might be, between candidate or probe biometrics and gallery or dataset biometrics (par [0026], see Roller); the performance using the implied ROC curve (e.g., minimizing the percentage of false positive fraud cases while sacrificing the percentage of true fraud cases) is something that can be optimized based on prior statistics of match/non-match distributions, and the classification confusion matrices resulting from testing possible normal and fraud hypotheses against the clustering, classification, scoring and thresholding mechanism described above (par [0049], see Roller).  
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the system and method for detecting potential matches between a candidate biometric and a dataset of biometrics taught by Roller with the system of Tico in view of Udani so as to obtain wherein the visual characteristic library comprises a thresholding mechanism for comparing the camera signal with pre-sorted video data or pre-sorted image data to determine whether the detected visual characteristic is linked (wherein the tracking system is configured to capture an audio signal from the microphone at least partially in response to the video signal (step 406), Fig. 8, par [0056], see Tico) with a focus direction for the microphone array (With audio-visual source tracing it is possible to steer, i.e., focusing, the maximum sensitivity of the microphone array towards the desired source (generally the user's face (or mouth)), par [0059], see Tico) as claimed.
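
	For illustration only, the claimed thresholding comparison against a stored library can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Roller, Udani, Tico, or the present application: it substitutes a simple cosine-similarity score for the spectral clustering and scoring that Roller describes. The feature vectors, the 0.9 threshold, and the function names are hypothetical. Each library entry pairs stored image features with a focus direction, and a direction is returned only when the best match meets the threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def linked_focus_direction(detected, library, threshold=0.9):
    """Return the focus direction linked to the best library match, or None.

    `library` is a list of (stored_features, direction_deg) pairs; the
    threshold value is an assumption chosen for this example.
    """
    best_score, best_direction = 0.0, None
    for stored_features, direction_deg in library:
        score = cosine_similarity(detected, stored_features)
        if score > best_score:
            best_score, best_direction = score, direction_deg
    # Thresholding step: a weak best match is treated as no match at all.
    return best_direction if best_score >= threshold else None


# Usage with a hypothetical two-entry library.
library = [([1.0, 0.0], -30.0), ([0.0, 1.0], 45.0)]
direction = linked_focus_direction([0.1, 0.99], library)
```

	A detected characteristic that resembles no stored entry closely enough yields no linked direction, so the microphone array would not be refocused.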

14.	Claims 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”).
 
	Regarding claim 5, Tico teaches the system of claim 1.  Tico further teaches  using a mechanical device to move a camera towards a user's face for visual and audio tracking used in fixed teleconferencing conditions (par [0014], see Tico); an audio-visual source tracking system comprising a first video camera and an array of microphones, wherein the first video camera and the array of microphones are attached to the housing (par [0060], see Tico).
	However, Tico does not explicitly disclose wherein the microphone array is contained within an audio and video conferencing system. 
	Hershey teaches an audio-visual control system (see Title) in which embodiments of the present invention include an audio-visual control system which has varied applications as illustrated in FIGS. 2-8. In each of the illustrated embodiments or applications, visual input is used in controlling an audio function, such as speech recognition, or video telephone conference audio inputs (par [0030], see Hershey); in accordance with one embodiment, the visual focus controller (260, Fig. 5) visually tracks the user to keep the microphone device focussed on the user. This improves signal-to-noise ratio for the user's speech (Fig. 5, par [0053], see Hershey).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the audio-visual control system taught by Hershey with the system of Tico such that to obtain wherein the microphone array is contained within an audio and video conferencing system as claimed in order to improve audio quality, as suggested by Hershey in paragraph [0050].
 
	Regarding claim 6, Tico in view of Hershey teaches the system of claim 5.  Tico in view of Hershey, as modified, teaches wherein the audio and video conferencing system includes a video display for providing video captured by the camera system at a remote geographic location (In another embodiment, the "look-to-talk" control system is used to control other functions represented by block 208, such as a video teleconference function or component. The system detects a front face of the speaker or user prior to transmitting an utterance or speech. Thus, the audio signal of the speaker is transmitted to a remote participant while the user's front face is detected, par [0049], see Hershey). 
 
	Regarding claim 7, Tico in view of Hershey teaches the system of claim 5.  Tico in view of Hershey, as modified, teaches further comprising a directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey) for indicating a desired direction of focus for the microphone array (The visual focus controller 260 is used to control the audio device 202 (such as a microphone array) to track the location of the user, Fig. 5, par [0050], see Hershey). 

15.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”), and further in view of Anand et. al. U.S. Patent Application Publication 20170301371 (hereinafter, “Anand”).
 
	Regarding claim 8, Tico in view of Hershey teaches the system of claim 7.  Tico in view of Hershey, as modified, teaches wherein the directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey) when sending the command indicating the desired direction of focus (With audio-visual source tracing it is possible to steer, i.e., focusing, the maximum sensitivity of the microphone array towards the desired source (generally the user's face (or mouth)), par [0059], see Tico), and wherein the directional controller is configured to send a command signal through the audio and video conferencing system to indicate the desired direction of focus for the microphone array (in accordance with one embodiment, the visual focus controller (260, Fig. 5) visually tracks the user to keep the microphone device focussed on the user. This improves signal-to-noise ratio for the user's speech. Audio from the focussed microphone device or array is processed and provided to an audio function or component 208 such as a speech recognition component, or video teleconferencing component, Fig. 5, par [0053], see Hershey).
	However, Tico in view of Hershey does not explicitly disclose that the directional controller is at a remote geographic location from the microphone array, or that the command signal indicates the desired direction of focus for the microphone array at the remote geographic location. 
	Anand teaches selection of an audio stream of a video for enhancement using images of the video (see Title) in which these multiple microphone systems are normally used to isolate a speaker's voice over the ambient noise. To capture a more remote speaker using beam forming requires that the user identify that speaker while the video recording is being made (par [0003], see Anand); another common video camera location is in a conference or meeting room. When there are multiple speakers in different locations in a conference room, some systems will provide microphones around a table so that there is a microphone near each speaker. In other systems, there are directional microphones at a central location or even a single microphone at that single location. A remote participant in the conference at another location (i.e., a remote geographic location) is, in some cases, able to choose a particular microphone and have that microphone amplified. In other systems, the user is able to select a particular direction and a central microphone or microphone array can use beam-forming, tagging, or some other approach to focus on a specific speaker and enhance the voice of that speaker for the remote participant, (e.g., at remote geographic location, par [0004], see Anand). 

16.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Krupat et al. U.S. Patent Application Publication 20180196432, (hereinafter, “Krupat”).
 
	Regarding claim 9, Tico teaches the system of claim 1.  Tico further teaches   examples of the invention may include advanced techniques for robust and efficient face detection and tracking. Such techniques and algorithms have been developed by several research groups and they are currently in use in several applications (e.g. face recognition, face tracking, etc). Face tracking algorithms can achieve high performance and may be run in real time even relatively lower computational power machines (par [0044], see Tico).  Referring now also to FIG. 3, an exemplary algorithm 100 of the audio-visual source tracking system 30 is illustrated. The algorithm may be provided for implementing the video tracking of the user and controlling the sensitivity of directional microphone beam (for the desired audio signal to be transmitted), (see Fig. 3, par [0044], see Tico).  The audio-visual tracking system 30, 330 is also coupled to the controller 600 (see Fig. 10, par [0058], see Tico).
	However, Tico does not explicitly disclose further comprising a directional controller for training a visual characteristic engine to identify the visual characteristic of the user. 
	Krupat teaches image analysis for two-sided data hub (see Title) in which the live streaming and image analysis (1200, Fig. 12, see Krupat) can be facilitated by a video capture device, a local server, a remote server, a semiconductor-based logic, and so on.  Some of the live streams, such as webcasts, online classes, sporting events, news, computer gaming, or video conferences can be scheduled, while others can be impromptu streams that are broadcast as needed or when desirable (Fig. 12, par [0082], see Krupat). The example 1200 shows a user 1210 broadcasting a video live stream and an audio live stream to one or more people as shown by a first person 1250, a second person 1260, and a third person 1270 (Fig. 12, par [0083], see Krupat).  The facial expression can be identified using a hidden layer from the one or more hidden layers. The weights can be provided on inputs to the multiple layers to emphasize certain facial features within the face. The convolutional neural network can be trained to identify facial expressions, voice characteristics, etc. The training can include assigning weights to inputs on one or more layers within the multilayered analysis engine (par [0123], see Krupat).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the image analysis for two-sided data hub taught by Krupat with the system of Tico such that to obtain further comprising a directional controller for training a visual characteristic engine to identify the visual characteristic of the user as claimed in order to improve the processing of the image, as suggested by Krupat in paragraph [0101]. 

17.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Tico et al. U.S. Patent Application Publication 20100026780 (hereinafter, “Tico”) in view of Beaucoup et al. U.S. Patent Application Publication 20030118200, (hereinafter, “Beaucoup”).

	Regarding claim 10, Tico teaches the system of claim 1.  Tico further teaches wherein the tracking system is configured to capture an audio signal from the microphone at least partially in response to the video signal (step 406, Fig. 8, par [0056], see Tico); the applications can include, for example, a telephone application, an Internet browsing application, a game playing application, a digital camera application, etc. (par [0058], see Tico).  This is where an audio-visual source tracking system can be beneficial because it can update the source location information in real time. Microphone(s) having spatial directivity improve the capture of audio signals by emphasizing sounds that are coming from the direction of the desired source by attenuating the sensitivity to other directions. With audio-visual source tracing it is possible to steer (i.e., focus) the maximum sensitivity of the microphone array towards the desired source (generally the user's face (or mouth)) (par [0059], see Tico), i.e., toward a physical location of the user as detected by the camera system.
	 However, Tico does not explicitly disclose wherein the control system includes a digital signal processor for modifying the focus direction of the microphone array toward a physical location of the user or away from the physical location of the user as detected by the camera system.
	Beaucoup teaches system and method of indicating and controlling sound pickup direction and location in a teleconferencing system (see Title) in which when audio signals are picked up by the microphones 62 and delivered to the DSP 70, the DSP, which executes the beamsteering algorithm, determines the location of the active talker in the surrounding environment. The active talker location is then used by the beamforming algorithm executed by the DSP 70 to steer the omni-directional microphone array towards the talker 150 by synthesizing narrow microphone beams 140 in the talker direction as shown in FIG. 6. The DSP 70 also signals the glue logic 72 to illuminate the LED 56b associated with the direction of the synthesized microphone beams 140. In this case, the illuminated LED is constantly illuminated and is green in color. If the talker 150 changes position and the direction of the strongest audio signals received by the microphone array changes, the DSP 70 re-steers the microphone array and illuminates the LED associated with the direction of the newly synthesized microphone beams. As will be appreciated, in this manner the LEDs provide a visual indication to the active talker and others surrounding the conference unit 14, of the microphone array microphone beam focus direction (see Figs. 3, 6, par [0052], see Beaucoup).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the system and method of indicating and controlling sound pickup direction and location in a teleconferencing system taught by Beaucoup with the system of Tico so as to obtain wherein the control system includes a digital signal processor for modifying the focus direction of the microphone array toward a physical location of the user or away from the physical location of the user as detected by the camera system as claimed, for the purpose of enhancing the audio quality, as suggested by Beaucoup in paragraph [0045].

18.	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Hershey et al. U.S. Patent Application Publication 20070016426 (hereinafter, “Hershey”) in view of Udani U.S. Patent Application Publication 20120323796.

	Regarding claim 12, Hershey teaches the system of claim 11. Hershey further teaches wherein the control system is further configured to compare the visual characteristic of the user detected by the camera system with a stored visual characteristic (the visual perception device 204 includes a color camera or device to track color cues to locate the user. Different camera arrangements can be employed to provide different views or perspectives to track the user. Additionally, the visual image can be used in a recognition system to verify authenticity of the user, such as based on biometrics (corresponding to a stored visual characteristic), par [0045], see Hershey), wherein focusing the microphone array (the focus direction of a microphone array (or beam), par [0051], see Hershey; Audio from the focussed microphone device or array is processed and provided to an audio function or component 208 such as a speech recognition component, or video teleconferencing component, Fig. 5, par [0053], see Hershey) is performed in response to the detected visual characteristic matching a visual characteristic in the stored visual characteristic (the visual image can be a stored visual characteristic, par [0045], see Hershey; In one embodiment, the processing component 205 uses face detection, motion tracking and depth tracking functions to locate the user as previously described. Output from the visual processing component 205 is provided to the visual focus controller 260 to focus or orient the audio device 202 relative to the location of the user based upon the visual input, Fig. 5, par [0052], see Hershey).
	However, Hershey does not explicitly disclose a stored visual characteristic being a stored visual characteristic library.
	Udani teaches methods and systems for recording verifiable documentation (see Title) in which an electronic device is described, the device comprising: a biometric characteristics capturing device; an application adapted to perform biometric recognition; and a database locally connected with the electronic device, wherein the biometric recognition is performed by comparing a set of biometric characteristics against a library of biometric characteristics, and wherein the library of biometric characteristics is located in the database (par [0023], see Udani). The term "biometrics" is defined to mean methods for uniquely recognizing individual human beings based on one or more intrinsic physical or behavioral traits. For example, identification authentication may involve, but is not limited to, facial recognition (par [0067], see Udani).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the methods and systems for recording verifiable documentation taught by Udani with the system of Hershey such that the stored visual characteristic is a stored visual characteristic library, as claimed, for the purpose of providing an increase in reliability and validity of the data collected, as suggested by Udani in paragraph [0205].

19.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”) in view of Udani U.S. Patent Application Publication 20120323796, and further in view of Roller U.S. Patent Application Publication 20150278398.

	Regarding claim 13, Hershey in view of Udani teaches the system of claim 12.  Hershey in view of Udani, as modified, teaches that the software portion of the methods and systems of the present disclosure may comprise a computer-readable medium that comprises instructions that, when executed, perform, at least in part, the various described methods in the present disclosure.  The computer-readable medium that comprises instructions to be executed may be accessed locally or remotely via a connected network. The instructions may be executed by a processor (e.g., a microprocessor, a microcontroller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a standard logic integrated circuit, or a field programmable logic array (PLD, FPGA, etc.)) (par [107], see Udani).
	However, Hershey in view of Udani does not explicitly disclose wherein the visual characteristic library comprises a thresholding mechanism for comparing the camera signal with pre-sorted video data or pre-sorted image data to determine whether the detected visual characteristic is linked with a focus direction for the microphone array.
	Roller teaches system and method for detecting potential matches between a candidate biometric and a dataset of biometrics (see Title) in which various implementations of the inventions described herein employ spectral clustering in order to identify potential matches or non-matches, as the case might be, between candidate or probe biometrics and gallery or dataset biometrics (par [0026], see Roller); the performance using the implied ROC curve (e.g., minimizing the percentage of false positive fraud cases while sacrificing the percentage of true fraud cases) is something that can be optimized based on prior statistics of match/non-match distributions, and the classification confusion matrices resulting from testing possible normal and fraud hypotheses against the clustering, classification, scoring and thresholding mechanism described above (par [0049], see Roller).  
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the system and method for detecting potential matches between a candidate biometric and a dataset of biometrics taught by Roller with the system of Hershey in view of Udani so as to obtain wherein the visual characteristic library comprises a thresholding mechanism for comparing the camera signal with pre-sorted video data or pre-sorted image data to determine whether the detected visual characteristic is linked (the visual perception device 204 includes a color camera or device to track color cues to locate the user. Different camera arrangements can be employed to provide different views or perspectives to track the user. Additionally, the visual image can be used in a recognition system to verify authenticity of the user, such as based on biometrics (corresponding to a stored visual characteristic), par [0045], see Hershey), with a focus direction for the microphone array 
  
 20.	Claims 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”). 

	Regarding claim 15, Hershey teaches the system of claim 11. Hershey teaches further comprising a video display and a speaker (A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190, Fig. 1, par [0027], see Hershey) located at a user location (In the "look-to-talk" application shown, the visual processing component 205 detects when the user is facing towards a given location, for example; Fig. 2, par [0034], see Hershey).
	However, Hershey does not explicitly disclose further comprising a video display and a speaker located at a second, distinct geographic location.
	Hershey further discloses that an embodiment of a computing environment 100 in which the invention can be implemented will be described with respect to FIG. 1 (see Fig. 1, par [0016], see Hershey), and that the audio signal of the speaker is transmitted to a remote participant (corresponding to a second, distinct geographic location) while the user's front face is detected (Fig. 2, par [0049], see Hershey).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have utilized the display and monitor of Fig. 1 of Hershey so as to obtain further comprising a video display and a speaker located at a second, distinct geographic location, as claimed, in order to improve audio quality, as suggested by Hershey in paragraph [0050].
   
	Regarding claim 16, Hershey teaches the system of claim 15.  Hershey, as modified, teaches further comprising a directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey) for indicating a desired direction of focus for the microphone array (The visual focus controller 260 is used to control the audio device 202 (such as a microphone array) to track the location of the user, Fig. 5, par [0050], see Hershey). 
 
	Regarding claim 17, Hershey teaches the system of claim 16.  Hershey, as modified, teaches wherein the directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey) is at the first, distinct geographic location when sending the command indicating the desired direction of focus (The system detects a front face of the speaker or user prior to transmitting an utterance or speech. Thus, the audio signal of the speaker is transmitted to a remote participant (corresponding to a second, distinct geographic location) while the user's front face is detected, Fig. 2, par [0049], see Hershey).
	However, Hershey does not explicitly disclose the directional controller at the first, distinct geographic location being at the second, distinct geographic location.
	Hershey further discloses that an embodiment of a computing environment 100 in which the invention can be implemented will be described with respect to FIG. 1 (see Fig. 1, par [0016], see Hershey).  Embodiments of the present invention include an audio-visual control system which has varied applications as illustrated in FIGS. 2-8. In each of the illustrated embodiments or applications, visual input is used in controlling an audio function, such as speech recognition, or video telephone conference audio inputs (see Figs. 2-8, par [0030], see Hershey).  In another embodiment, the "look-to-talk" control system is used to control other functions represented by block 208, such as a video teleconference function or component (par [0049], see Hershey).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have utilized the video teleconference function or component of Hershey so as to obtain the directional controller at the first, distinct geographic location being at the second, distinct geographic location, as claimed, in order to improve audio quality, as suggested by Hershey in paragraph [0050].  In other words, Hershey teaches the 

	Regarding claim 18, Hershey teaches the system of claim 16.  Hershey, as modified, teaches wherein the directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey) is configured to send a command signal through the audio and video conferencing system (In each of the illustrated embodiments or applications, visual input is used in controlling an audio function, such as speech recognition, or video telephone conference audio inputs (see Figs. 2-8, par [0030], see Hershey).  In another embodiment, the "look-to-talk" control system is used to control other functions represented by block 208, such as a video teleconference function or component (par [0049], see Hershey)) to indicate a desired direction of focus (The system detects a front face of the speaker or user prior to transmitting an utterance or speech. Thus, the audio signal of the speaker is transmitted to a remote participant (corresponding to a second, distinct geographic location) while the user's front face is detected, Fig. 2, par [0049], see Hershey) for the microphone array (The visual focus controller 260 is used to control the audio device 202 (such as a microphone array) to track the location of the user, Fig. 5, par [0050], see Hershey).
	However, Hershey does not explicitly disclose the second geographic location being the first geographic location.
	Hershey further discloses that an embodiment of a computing environment 100 in which the invention can be implemented will be described with respect to FIG. 1 (see Fig. 1, par [0016], see Hershey), e.g., monitor 191 or other type of display device, and that the audio signal of the speaker is transmitted to a remote participant (corresponding to a second, distinct geographic location) while the user's front face is detected (Fig. 2, par [0049], see Hershey).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have utilized the display and monitor of Fig. 1 of Hershey so as to obtain the second geographic location being the first geographic location, as claimed, in order to improve audio quality, as suggested by Hershey in paragraph [0050]. In other words, Hershey teaches the same video teleconference function or component at the remote participant.  Here, the user is the remote participant.

21.	Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”) in view of Krupat et al. U.S. Patent Application Publication 20180196432, (hereinafter, “Krupat”).
 
	Regarding claim 19, Hershey teaches the system of claim 11.  Hershey, as modified, teaches further comprising a directional controller (including visual focus controller 260, Fig. 5, par [0050], see Hershey); although a particular search and tracking algorithm or methodology is described, the application is not limited to the 
	However, Hershey does not explicitly disclose further comprising a directional controller for training a visual characteristic engine to identify the visual characteristic of the user. 
	Krupat teaches image analysis for two-sided data hub (see Title) in which the live streaming and image analysis (1200, Fig. 12, see Krupat) can be facilitated by a video capture device, a local server, a remote server, a semiconductor-based logic, and so on.  Some of the live streams, such as webcasts, online classes, sporting events, news, computer gaming, or video conferences can be scheduled, while others can be impromptu streams that are broadcast as needed or when desirable (Fig. 12, par [0082], see Krupat). The example 1200 shows a user 1210 broadcasting a video live stream and an audio live stream to one or more people as shown by a first person 1250, a second person 1260, and a third person 1270 (Fig. 12, par [0083], see Krupat).  The facial expression can be identified using a hidden layer from the one or more hidden layers. The weights can be provided on inputs to the multiple layers to emphasize certain facial features within the face. The convolutional neural network can be trained to identify facial expressions, voice characteristics, etc. The training can include assigning weights to inputs on one or more layers within the multilayered analysis engine (par [0123], see Krupat).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the image analysis for two-sided data hub taught by Krupat with the system of Hershey so as to obtain further comprising a directional controller for training a visual characteristic engine to identify the visual characteristic of the user, as claimed, in order to improve the processing of the image, as suggested by Krupat in paragraph [0101].

 22.	Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Hershey et al. U.S. Patent Application Publication 20070016426, (hereinafter, “Hershey”) in view of Beaucoup et al. U.S. Patent Application Publication 20030118200, (hereinafter, “Beaucoup”).

	Regarding claim 20, Hershey teaches the system of claim 11.  Hershey further teaches a processing unit (120, Fig. 1, par [0024], see Hershey); digital versatile disks, digital video tape (par [0025], see Hershey).  In one embodiment, the processing component 205 uses face detection, motion tracking and depth tracking functions to locate the user as previously described.  Output from the visual processing component 205 is provided to the visual focus controller 260 to focus or orient the audio device 202 relative to the location of the user based upon the visual input (Fig. 5, par [0052], see Hershey); For example, for video teleconferencing, the control system uses visual detection and tracking functions to focus or point the microphone device or array in the direction of the speaker instead of in the direction of the loudest noise which could be unrelated to the user and merely background noise (par [0052], see Hershey).
	 However, Hershey does not explicitly disclose wherein the control system includes a digital signal processor for modifying the focus direction of the microphone array toward a physical location of the user or away from the physical location of the user as detected by the camera system.
	Beaucoup teaches system and method of indicating and controlling sound pickup direction and location in a teleconferencing system (see Title) in which when audio signals are picked up by the microphones 62 and delivered to the DSP 70, the DSP, which executes the beamsteering algorithm, determines the location of the active talker in the surrounding environment. The active talker location is then used by the beamforming algorithm executed by the DSP 70 to steer the omni-directional microphone array towards the talker 150 by synthesizing narrow microphone beams 140 in the talker direction as shown in FIG. 6. The DSP 70 also signals the glue logic 72 to illuminate the LED 56b associated with the direction of the synthesized microphone beams 140. In this case, the illuminated LED is constantly illuminated and is green in color. If the talker 150 changes position and the direction of the strongest audio signals received by the microphone array changes, the DSP 70 re-steers the microphone array and illuminates the LED associated with the direction of the newly synthesized microphone beams. As will be appreciated, in this manner the LEDs provide a visual indication to the active talker and others surrounding the conference unit 14, of the microphone array microphone beam focus direction (see Figs. 3, 6, par [0052], see Beaucoup).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the system and method of indicating and controlling sound pickup direction and location in a teleconferencing system taught by Beaucoup with the system of Hershey so as to obtain wherein the control system includes a digital signal processor for modifying the focus direction of the microphone array toward a physical location of the user or away from the physical location of the user as detected by the camera system, as claimed, for the purpose of enhancing the audio quality, as suggested by Beaucoup in paragraph [0045].

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CON P TRAN whose telephone number is (571)272-7532. The examiner can normally be reached on M-F (08:30 AM- 05:00 PM) ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN C. CHIN, can be reached on 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 






/C.P.T/Examiner, Art Unit 2654 

/VIVIAN C CHIN/Supervisory Patent Examiner, Art Unit 2654