DETAILED ACTION

Introduction
1.         This office action is in response to Applicant’s Remarks submission filed on 05/02/2022. Claims 1, 4-9, 19, and 20 are pending in the application. As such, Claims 1, 4-9, 19, and 20 have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
3.         The response filed 05/02/2022 has been correspondingly accepted and considered in this Office Action.  Claims 1, 4-9, 19, and 20 have been examined.  

Response to Arguments 
4.         Applicant’s amendments and remarks with respect to Claims 1, 4-9, 19, and 20 have been fully considered and are found persuasive. Accordingly, the previous objections and the rejections under 35 U.S.C. §§ 101, 102, and 103 are respectfully withdrawn in view of the remarks and claim amendments filed 05/02/2022.

Allowable Subject Matter
5.       Claims 1, 4-9, 19, and 20 are found allowable over the prior art of record for at least the following rationale.  The teachings of Bernardin et al. (Bernardin, Keni, and Rainer Stiefelhagen. "Audio-visual multi-person tracking and identification for smart environments." Proceedings of the 15th ACM international conference on Multimedia. 2007), hereinafter referred to as BERNARDIN, already of record and specifically presented in the previous Non-Final Office Action mailed 02/02/2022, have been fully reconsidered.

Examiner respectfully notes that BERNARDIN discloses a speech detection and speaker identification architecture, see e.g., “…speech detection and speaker identification, coupled with a source localizer using the input from several microphone arrays, deliver precisely localized ID cues whenever a speaker becomes active…,” “…cepstral mean subtraction and feature warping are performed on the audio signal to reduce channel, noise and reverberation effects. The output of the speaker ID module is the identity of the speaker as well as the corresponding GMM’s a-posteriori probability for the analyzed segment, which is used as confidence measure…history of speech source location estimates is kept for the duration of a speech segment. Similarly, for the same time window, a record is kept in the fusion module of the positions of all visually tracked persons. The visual and acoustic tracks are then compared to associate the recognized speaker ID to the best matching person track…”; see also, e.g., how “…a set of steerable fuzzy-controlled pan-tilt-zoom cameras serves to smoothly track persons of interest and opportunistically capture facial close-ups for face identification. In parallel, speech segmentation, sound source localization and speaker identification are performed using several far-field microphones and arrays…”; and furthermore “…the fusion module uses the person track information from the multiple camera tracker as the basis upon which the scene model is updated and association of ID cues is performed. The scene model is composed of a number of active person models, and some optional information such as the position of the entrance door and of the whiteboard. A person model comprises the person’s 3D location, and a histogram of identification cues that were assigned to it over time. This “ID histogram” has as many bins as audio-visually trained in subjects, and the values accumulated in the respective bins are the confidences given by the face or speaker ID modules…” (See e.g., BERNARDIN, Abstract, §§ 2, 2.1, 2.2, 2.3, 3, Figs. 1, 4, 5).
Further, Sethi (I. K. Sethi, "Neural implementation of tree classifiers," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 8, pp. 1243-1249, Aug. 1995), hereinafter referred to as SETHI, already of record, discloses, see e.g., “…three training schemes to incorporate soft decision making in a feedforward network representing a tree classifier…” “…branch adaptive implementation of decision trees because the inner links of the network correspond to tree branches… the partitioning layer is forced to adjust its output during training by varying its gain… node adaptive implementation of decision trees because the neurons in the partitioning layer of the network represent the internal nodes of the decision tree…third scheme is a combination of above two methods in which the outputs of the partitioning layer neurons and the link weights for the AND layer are both adjusted during training…this method as the combined branch and node adaptive implementation. Similar to soft decision trees, the classification decision in all three schemes is made by the AND layer neuron producing the highest output, i.e. the class label associated with the neuron producing the highest output is taken as the tree classifier decision…,” (See e.g., SETHI, Abstract, §§I, II, III, Figs. 3, 5).

Furthermore, Mizumoto et al. (Mizumoto, T., Nakadai, K., Yoshida, T., Takeda, R., Otsuka, T., Takahashi, T., & Okuno, H. G. (2011, May). Design and implementation of selectable sound separation on the Texai telepresence system using HARK. In 2011 IEEE International Conference on Robotics and Automation (pp. 2130-2137). IEEE.), hereinafter referred to as MIZUMOTO, already of record, teaches, see e.g., how the “…Texai…localization module detects the direction of the sound, and the /talker node publishes a topic /hark, which consists of time stamp, id, direction-of-arrival, and its power…specifies two parameters: (1) the center direction of the range to listen to, and (2) the angular width of the range, as shown in the center of Figure 6. From the parameters, the user interface publishes a topic /hark direction which consists of the beginning and the ending angles of user’s interest. Then, a remote user listens to only the sounds from the specified range…,” (See e.g., MIZUMOTO, Abstract, §§ I, II, III, Figs. 3, 6).
Notwithstanding the foregoing, the teachings of BERNARDIN, SETHI, and MIZUMOTO have been reconsidered and are found to fail to teach or fairly suggest, either individually or in any reasonable combination, the limitations of independent Claim 1 as specifically amended and recited.
Dependent Claims 4-9, 19, and 20 further limit allowable independent Claim 1 and are therefore also found allowable over the prior art of record by virtue of their dependency.
Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
6.       The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Nishiguchi et al., (Nishiguchi, S., Higashi, K., Kameda, Y., & Minoh, M. (2003, July). A sensor-fusion method for detecting a speaking student. In 2003 International Conference on Multimedia and Expo. ICME'03. Proceedings (Cat. No. 03TH8698) (Vol. 1, pp. I-129). IEEE.) discloses, see e.g., “…detecting the location of the speaker that is a target of automatic video filming in distance learning and lecture archive…a face of a speaking student is filmed in a lecture video…to detect the location of a speaker. An acoustic sensor such as a microphone array is used widely to detect the location of a sound source. However, it is difficult to detect the location of a sound source precisely using only microphone array because of sound noise in a large space such as a lecture room… propose a method for detecting more precise location of a speaker in the lecture room using not only the microphone array but also visual sensors…” (See e.g., Nishiguchi et al., Abstract). 
Please see PTO-892 for additional references.
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Edgar Guerra-Erazo, whose telephone number is (571) 270-3708.  The examiner can normally be reached M-F, 7:30 a.m.-5:00 p.m. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at
http://www.uspto.gov/interviewpractice. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656