Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s response dated 04/13/2021 has been received and made of record. 
Claims 1-5 and 10-20 have been amended. 
Claims 1-20 are currently pending in Application 16/393,119.

Response to Arguments
Applicant’s arguments, see Applicant’s Arguments/Remarks, filed 04/13/2021, with respect to the 35 USC 101 rejection of claims 13-20 have been fully considered and are persuasive. The 35 USC 101 rejection of claims 13-20 has been withdrawn.
Applicant’s arguments, see Applicant’s Arguments/Remarks, filed 04/13/2021, with respect to the rejection(s) of claim(s) 1-20 under 35 USC 102(a)(1)/35 USC 103 as anticipated by Kenoyer or obvious over Kenoyer in view of Rahman have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Wee (US 2005/0080849 A1). 
	Applicant argues that Kenoyer and Rahman do not disclose cropping video content (or switching between cameras) based on a region of interest (an active speaker) per the amended independent claims. Examiner agrees. 
	However, Wee discloses cropping video content and switching between cameras based on a region of interest as determined by a microphone array and video analysis (Wee: Paragraphs [0037]-[0038], “If the audio level measured by a microphone is higher than that of all others, then the camera view that covers the visible region around that microphone may be selected. When a speech/noise discriminator classifies an audio input as speech, then the view of the individual nearest that microphone whose mouth and jaw are moving may be selected”).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. Claim 20 is written as dependent on claim 19, but contradicts the limitation therein (“includes only a name identifier displayed proximate to… the current speaker”, “includes a name identifier displayed proximate each… participant that is selected”). For purposes of examination, claim 20 will be interpreted as depending from claim 17.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 and 9-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wee (US 2005/0080849 A1) in view of Kenoyer (US 2014/0192138 A1).

Regarding claims 1, 10, and 13, Wee discloses a method, apparatus, and non-transitory computer readable medium (Wee: Claims 1, 21, and 31) comprising: 
	a memory configured to store instructions including one or more applications that support online meetings with endpoints over a network (Wee: Figure 5 and Paragraph [0053], “The processing resources 232 and 242 may include processors, memory”); 
	a display to display video content during an online meeting (Wee: Figure 5 and Paragraph [0049], “The rich media environment 252 has an arrangement of sensing and rendering components that includes a set of digital cameras 150-159, a set of microphones 170-174, a microphone array 175, a pair of video displays 210-212, and a set of speakers 190-194. A set of individuals 130-136 are shown along with a conference table 226. The individual 132 has a handheld device 224, the individual 130 has a handheld device 326 and the individual 135 has a handheld device 324”); 
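	one or more cameras to capture video content within a meeting room at an endpoint during the online meeting (Wee: Figure 5 and Paragraph [0049], “The rich media environment 252 has an arrangement of sensing and rendering components that includes a set of digital cameras 150-159, a set of microphones 170-174, a microphone array 175, a pair of video displays 210-212, and a set of speakers 190-194. A set of individuals 130-136 are shown along with a conference table 226. The individual 132 has a handheld device 224, the individual 130 has a handheld device 326 and the individual 135 has a handheld device 324”); 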
	one or more microphones to capture audio content within the meeting room at the endpoint during the online meeting (Wee: Figure 5 and Paragraph [0049], “The rich media environment 252 has an arrangement of sensing and rendering components that includes a set of digital cameras 150-159, a set of microphones 170-174, a microphone array 175, a pair of video displays 210-212, and a set of speakers 190-194. A set of individuals 130-136 are shown along with a conference table 226. The individual 132 has a handheld device 224, the individual 130 has a handheld device 326 and the individual 135 has a handheld device 324”); 
	a communication interface configured to enable communications over a network (Wee: Figure 5 and Paragraph [0052], “The network resources 230 and 240 may include one or more network signal paths, network interfaces, client and server hardware and software, etc. The network resources 230 and 240 may be embodied as client systems that communicate with an external server (not shown) or may be embodied as clients/servers with respect to one another”); 
	and a processor to execute and control operations of the one or more applications (Wee: Figure 5 and Paragraph [0053], “The processing resources 232 and 242 may include processors, memory”) so as to: 
	during an online meeting, analyzing video content and audio content generated within a meeting room at an endpoint, wherein the video content is captured via one or more cameras within the meeting room, and the audio content is captured via one or more microphones within the meeting room (Wee: Paragraphs [0037]-[0038], “If the audio level measured by a microphone is higher than that of all others, then the camera view that covers the visible region around that microphone may be selected.”); 
	identifying a plurality of participants within the meeting room from the video content utilizing face recognition analysis of an isolated facial image of each participant within the video content (Wee: Paragraph [0075], “finding an individual may be performed by recognizing the individual using machine vision. The availability of the task of visual recognition may depend on the availability of digital cameras and processing and networking resources and software for obtaining an image of an individual from a digital camera and comparing the obtained image to stored images of known individuals. The task of finding an individual may be performed by voice recognition”); 
	determining an identified participant as a current speaker at the endpoint based upon an analysis of the video content and the audio content (Wee: Paragraph [0037], “If the audio level measured by a microphone is higher than that of all others, then the camera view that covers the visible region around that microphone may be selected. When a speech/noise discriminator classifies an audio input as speech, then the view of the individual nearest that microphone whose mouth and jaw are moving may be selected”); 
	determining a region of interest (ROI) within the video content that includes the current speaker (Wee: Paragraph [0037], “If the audio level measured by a microphone is higher than that of all others, then the camera view that covers the visible region around that microphone may be selected. When a speech/noise discriminator classifies an audio input as speech, then the view of the individual nearest that microphone whose mouth and jaw are moving may be selected”); 
	adjusting the video content captured by the one or more cameras based upon the ROI, wherein the adjusting comprises cropping the video content and/or making a scene change that comprises switching a camera view that captures video content from one camera to another camera in the meeting room (Wee: Paragraphs [0037]-[0038], “the best camera view or views may be selected… video-based face detection, motion detection, and skin-color detection methods may be used to digitally zoom, center, and/or crop the view to better focus the camera on the individuals with which it is associated”, and [0062], “the communication provider 18, for the interest thread of the main conversation, may select the digital camera 140 if the speaker involved in the main conversation moves out of the coverage areas of the digital cameras 143 and 144”); 
	generating, modifying and/or maintaining identified participant information in response to one or more changes associated with the video content and/or the audio content, wherein the one or more changes comprises one or more of the following: an identification of each participant at the endpoint, a determination of a newly identified participant as the current speaker at the endpoint, and a change in location of one or more identified participants at the endpoint (Wee: Paragraph [0030], “selects a subset of the sensing and rendering components of the rich media environments 12-14 for an interest thread in response to a location of each individual involved in the interest thread”, and Paragraphs [0088]-[0089], “user manager 404 identifies the individuals 360-363 as they enter the rich media environment 300… user manager 404 may identify the individuals 360-363 using the services provided by the service manager 400, e.g. image or voice recognition. The user manager 404 tracks the locations of the individuals 360-363 within the rich media environment 300 over time using the services provided by the service manager 400”).
	determining whether one or more criteria has been satisfied based upon the one or more changes associated with the video content and/or the audio content (Wee: Paragraph [0075], “The task of finding an individual may be performed by recognizing the individual using machine vision. The availability of the task of visual recognition may depend on the availability of digital cameras and processing and networking resources and software for obtaining an image of an individual from a digital camera and comparing the obtained image to stored images of known individuals. The task of finding an individual may be performed by voice recognition. The availability of a voice recognition task may depend on the availability of microphones and processing and networking resources and software for obtaining a speech sample of an individual and comparing the obtained speech sample to stored speech samples of known individuals”, also Paragraphs [0037]-[0038], “the best camera view or views may be selected… video-based face detection, motion detection, and skin-color detection methods may be used to digitally zoom, center, and/or crop the view to better focus the camera on the individuals with which it is associated”, and [0062], “the communication provider 18, for the interest thread of the main conversation, may select the digital camera 140 if the speaker involved in the main conversation moves out of the coverage areas of the digital cameras 143 and 144”; the image/voice recognition systems determine matches based on criteria)

	Wee does not explicitly disclose in response to a determination of one or more criteria being satisfied, providing the identified participant information in one or more video frames of the video content for transmission to a remote endpoint. 
	Kenoyer discloses in response to a determination of one or more criteria being satisfied, providing the identified participant information in one or more video frames of the video content for transmission to a remote endpoint (Kenoyer: Claim 1 and Paragraph [0109], “the local videoconferencing device may obtain the participant information for the participants whose faces 
	Wee and Kenoyer are analogous art in the same field of endeavor as the invention as both are drawn to video conferencing systems. The differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains; that is, it would have been obvious to incorporate Kenoyer’s information overlay into the system of Wee to allow meeting participants to be better informed and able to refer to speakers by name. 

Wee-Kenoyer teaches 2/11. The method/apparatus of claim 1/10, wherein the one or more microphones comprises a plurality of microphones, and the analysis comprises mapping audio content captured by at least one of the plurality of microphones with a spatial location within the meeting room (Wee: Paragraph [0037], “If the audio level measured by a microphone is higher than that of all others, then the camera view that covers the visible region around that microphone may be selected. When a speech/noise discriminator classifies an audio input as speech, then the view of the individual nearest that microphone whose mouth and jaw are moving may be selected”).

Wee-Kenoyer teaches 3/14. The method/medium of claim 1/13, wherein the one or more changes further comprises one or more of the following: a scene change for one or more video frames within the video content, and a crop change for one or more video frames within the video content (Wee: Paragraphs [0037]-[0038], “the best camera view or views may be selected… video-based face detection, motion detection, and skin-color detection methods may be used to digitally zoom, center, and/or crop the view to better focus the camera on the individuals with which it is associated”, and [0062], “the communication provider 18, for the interest thread of the main conversation, may select the digital camera 140 if the speaker involved in the main conversation moves out of the coverage areas of the digital cameras 143 and 144”).

Wee-Kenoyer teaches 4/15. The method/media of claim 1/14, wherein the identified participant information is provided in encoded video content to the remote endpoint in response to a determination of the one or more criteria being satisfied on a video frame-by-frame basis, such that the identified participant information is only provided in encoded video content for a current video frame in response to the one or more criteria being satisfied prior to encoding of the current video frame (Kenoyer: Paragraph [0110], “Facial recognition may also be used to dynamically change or move the displayed participant information as the video image changes. For example, participants may walk or move across the video frame. The local videoconferencing device may operate to track the participants' faces as they move across the video frame and may update the display of the participant information accordingly. For example, if a participant moves from one side of the scene to the other then his participant information may be moved to maintain proximity with the participant. If a participant disappears out of the scene then his participant information may be removed from the display screen. If a new participant appears in the scene then participant information for the new participant may be displayed”; the frames where participants have entered/exited the scene or are in motion have their participant information determined on a “frame-by-frame basis”).

Wee-Kenoyer teaches 5/12/16. The method/apparatus/media of claim 1/10/15, wherein the one or more criteria comprises one or more of the following: a set time interval has elapsed, any newly identified participant in the meeting room at the endpoint, movement of an identified participant that changes by a set distance, identification of a new speaker at the endpoint, a scene change for one or more video frames within the video content, and a crop change for one or more video frames within the video content (Wee: Paragraphs [0037]-[0038], “the best camera view or views may be selected… video-based face detection, motion detection, and skin-color detection methods may be used to digitally zoom, center, and/or crop the view to better focus the camera on the individuals with which it is associated”, and [0062], “the communication provider 18, for the interest thread of the main conversation, may select the digital camera 140 if the speaker involved in the main conversation moves out of the coverage areas of the digital cameras 143 and 144”).

Wee-Kenoyer teaches 6/17. The method/media of claim 5/16, further comprising: receiving the video content at the remote endpoint and extracting identified participant information from the video content; and displaying the video content at the remote endpoint, wherein displayed video content includes a name identifier displayed proximate an identified participant (Kenoyer: Paragraph [0110], “Facial recognition may also be used to dynamically change or move the displayed participant information as the video image changes. For example, participants may walk or move across the video frame. The local videoconferencing device may operate to track the participants' faces as they move across the video frame and may update the display of the participant information accordingly. For example, if a participant moves from one side of the scene to the other then his participant information may be moved to maintain proximity with the participant”, and Paragraph [0111], “the remote videoconferencing device at the remote endpoint may perform facial recognition techniques instead of the local videoconferencing device at the local endpoint”).

Wee-Kenoyer teaches 9/18. The method/media of claim 6/17, wherein the name identifier remains displayed proximate the identified participant after a crop change or a scene change of one or more video frames including the identified participant (Kenoyer: Paragraph [0110], “Facial recognition may also be used to dynamically change or move the displayed participant information as the video image changes. For example, participants may walk or move across the video frame. The local videoconferencing device may operate to track the participants' faces as they move across the video frame and may update the display of the participant information accordingly. For example, if a participant moves from one side of the scene to the other then his participant information may be moved to maintain proximity with the participant”).

Claims 7-8 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wee and Kenoyer as applied above in further view of Rahman (US 2010/0085415 A1).

Regarding claims 7 and 19, Wee-Kenoyer teaches The method/medium of claim 6/17, wherein the displayed video content,(Kenoyer: Paragraphs [0072]-[0073], “In this embodiment, the portions of participant information are visually associated with their corresponding participants by displaying a box 503 around each participant and displaying the name 505 of each participant within each respective box 503… In some embodiments, multiple remote participants may be displayed simultaneously on the display device 122, but participant information may only be displayed for a subset of the participants shown on the display. For example, in some embodiments, the local videoconferencing device 120 may be operable to analyze the video information to determine which remote participant is currently speaking and may only display .

	Wee-Kenoyer does not explicitly disclose displaying or not displaying identification information (the selection of embodiments) is based upon user selection at the remote endpoint.
	However, Rahman explicitly discloses this feature (Rahman: Paragraph [0033], “all other conference participants may be notified with the details and personal information of the new participant(s). Each endpoint (either audio or video) could determine, based on user preferences, how or if it should display this information during an ongoing conference. Similarly, when a participant speaks and is identified, details of the speaking participant may be transmitted to all endpoints and each endpoint could configure how or if it should display this information during the conference”). 
	Rahman is analogous art in the same field of endeavor as the invention as both are drawn to participant identification in videoconferencing systems. The differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains; that is, it would have been obvious to incorporate Rahman’s endpoint user configuration options into the system of Wee-Kenoyer so that each user could customize displayed information according to their own preferences. 

Regarding claims 8 and 20, Wee-Kenoyer teaches The method/medium of claim 6/19[17], wherein the displayed video content, identifier displayed proximate an identified participant that is determined to be the current speaker (Kenoyer: Paragraphs [0072]-[0073], “In this embodiment, the portions of participant information are visually associated with their corresponding participants by displaying a box 503 around each participant and displaying the name 505 of each participant within each respective box 503… In some embodiments, multiple remote participants may be displayed simultaneously on the display device 122, but participant information may only be displayed for a subset of the participants shown on the display. For example, in some embodiments, the local videoconferencing device 120 may be operable to analyze the video information to determine which remote participant is currently speaking and may only display participant information for the current speaker. In some embodiments, one of the participants at the local endpoint may move a cursor around the display by operating a mouse or other input device. If the cursor hovers over an image of one of the remote participants, the local videoconferencing device 120 may display the participant information for that participant in response. Otherwise, in some embodiments, participant information may not be displayed for the participants”).

	Wee-Kenoyer does not explicitly disclose displaying or not displaying identification information (the selection of embodiments) is based upon user selection at the remote endpoint.
	However, Rahman explicitly discloses this feature (Rahman: Paragraph [0033], “all other conference participants may be notified with the details and personal information of the new participant(s). Each endpoint (either audio or video) could determine, based on user preferences, how or if it should display this information during an ongoing conference. Similarly, when a participant speaks and is identified, details of the speaking participant may be transmitted to all endpoints and each endpoint could configure how or if it should display this information during the conference”). 
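	Rahman is analogous art in the same field of endeavor as the invention as both are drawn to participant identification in videoconferencing systems. The differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains; that is, it would have been obvious to incorporate Rahman’s endpoint user configuration options into the system of Wee-Kenoyer so that each user could customize displayed information according to their own preferences. 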

	
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Kumar (US 2019/0215464 A1) discloses cropping video content based on a region of interest in a video conferencing system (Kumar: Claim 7, Figure 8, and Paragraph [0077], “At step 810, video decomposer 312 may determine that the identity of the active speaker matches the identity associated with the first face. At step 812, video decomposer 312 may generate a second video stream that includes a first cropped version of the first video stream which displays the first face without displaying the second face”).
	Buckler (US 2014/0104374 A1) discloses selecting a camera and cropping video content based on a region of interest determined by a microphone array in a video conferencing system (Buckler: Paragraph [0099], “in some implementations, the cameras 103 do not change viewing directions, and zooming and panning are achieved by digital cropping and enlargement of portions of the images. In this case, a camera 103 can be selected from among the plurality of cameras 103 in the camera assembly 102 by choosing the camera with the optimal viewing angle”, Claim 2, “selecting a camera from at least two cameras comprises selecting a first one of the cameras if the one or more active speakers are located within a first predetermined region, and selecting a second one of the cameras if the one or more active speakers are located within a second predetermined region”, and Claim 16, “determining .
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IMAD HUSSAIN whose telephone number is (571)270-3628.  The examiner can normally be reached on Monday-Friday 0900-1700 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamal Divecha can be reached on (571) 272-5863.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained 






/IMAD HUSSAIN/Primary Examiner, Art Unit 2453