DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
Claims 20-26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. While claims 20-26 recite a “machine-readable medium”, the Applicant’s disclosure fails to define the claimed machine-readable medium. The Applicant’s specification defines a computer-readable storage medium as a non-transitory computer-readable storage medium that excludes transitory signals in paragraph 0607. However, claims 20-26 recite a different medium than the computer-readable storage medium. The United States Patent and Trademark Office (USPTO) is obliged to give the claims their broadest reasonable interpretation consistent with the specification during proceedings before the USPTO (see In re Zletz, 893 F.2d 319 (Fed. Cir. 1989)). The broadest reasonable interpretation of a claim drawn to a computer readable medium typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent (see MPEP 2111.01). Thus, the definition of Applicant’s computer-readable storage medium in the disclosure fails to limit the claimed machine-readable medium to only non-transitory tangible media, and the claims are therefore non-statutory (see 1351 Off. Gaz. Pat. Office 212 (February 23, 2010)). It is suggested that Applicant replace “machine-readable medium” with “non-transitory computer-readable medium” in order to overcome the 35 U.S.C. 101 rejection.
To expedite a complete examination of the instant application, the claims rejected under 35 U.S.C. 101 as non-statutory subject matter are further rejected as set forth below in anticipation of applicant amending the claims to place them within the four categories of invention.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 2, 6, 7, 9, 10, 12, 20-23, 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Silberman et al. (U.S. Patent 9,560,315) in view of Sugihara (U.S. Patent Application 20200099889).

In regards to claim 1, Silberman teaches a computer-implemented method [Fig. 4; e.g. method, c.13 L.45-52] comprising:
determining a plurality of first facial landmarks [Fig. 1; e.g. portions of the face including facial features, c.7 L.13-39] for each of two or more faces [Fig. 1; e.g. three participants’ faces, c.6 L.31-c.7 L.4] depicted in a first frame of a video segment [e.g. previously sent frame of a video stream, c.3 L.62-c.4 L.16, c.7 L.13-39];
determining a plurality of second facial landmarks [Fig. 1; e.g. portions of the face including facial features, c.7 L.13-39] for each of the two or more faces [Fig. 1; e.g. three participants’ faces, c.6 L.31-c.7 L.4] depicted in a second frame of the video segment [e.g. next frame of the video stream, c.3 L.62-c.4 L.16, c.7 L.13-39]; and
identifying a speaking one of the two or more faces [e.g. determining whether a participant is speaking, c.6 L.53-c.7 L.4, also see c.3 L.62-c.4 L.16], the speaking face being usable to modify the video segment by enlarging the speaking face within the second frame [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4].
Silberman does not explicitly teach
identifying a speaking one of the two or more faces based at least in part on the plurality of first facial landmarks and the plurality of second facial landmarks, the speaking face being usable to modify the video segment by enlarging the speaking face within the second frame (emphasis added).
However, Sugihara teaches
identifying a speaking one of the two or more faces based at least in part on the plurality of first facial landmarks and the plurality of second facial landmarks [Fig. 8; e.g. identifies a speaker based on the motion of the mouth of each participant included in the site videos by detecting a temporal change of the position of the feature point such as upper and lower lips, 0047-0049], the speaking face being usable to modify the video segment by enlarging the speaking face within the second frame [Fig. 8; e.g. enlarging the partial video including the speaker’s face, 0047-0049].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s method with the features of
identifying a speaking one of the two or more faces based at least in part on the plurality of first facial landmarks and the plurality of second facial landmarks, the speaking face being usable to modify the video segment by enlarging the speaking face within the second frame
in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].
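For illustration only, the combined teaching relied upon above (lip feature points located for each face in two frames, with the face whose mouth moves treated as the speaker) can be sketched as follows. This is a minimal Python sketch and is not code from Silberman or Sugihara; the landmark layout, the names `lip_gap` and `identify_speaking_face`, and the 2.0-pixel threshold are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical layout: face id -> {"upper_lip": (x, y), "lower_lip": (x, y)}
Landmarks = Dict[str, Dict[str, Point]]


def lip_gap(face: Dict[str, Point]) -> float:
    """Vertical distance between the upper-lip and lower-lip points."""
    return abs(face["lower_lip"][1] - face["upper_lip"][1])


def identify_speaking_face(
    first: Landmarks, second: Landmarks, threshold: float = 2.0
) -> Optional[str]:
    """Return the face whose lip gap changed most between the two frames,
    or None if no change exceeds the (assumed) threshold."""
    best_id, best_change = None, threshold
    for face_id in first.keys() & second.keys():
        change = abs(lip_gap(second[face_id]) - lip_gap(first[face_id]))
        if change > best_change:
            best_id, best_change = face_id, change
    return best_id
```

The returned identifier is what a downstream step would use to select the face to enlarge in the second frame.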

In regards to claim 2, Silberman teaches the computer-implemented method of claim 1, further comprising:
detecting, within the first frame of the video segment, a plurality of first regions [Fig. 2; e.g. plurality of portions of the previously sent frame, c.10 L.66-c.12 L.30] each corresponding to a different one of the two or more faces [Fig. 1, 2; e.g. each of the participant’s faces, c.10 L.66-c.12 L.30], the plurality of first facial landmarks determined for each of the two or more faces being determined within a corresponding one of the plurality of first regions [Fig. 2; e.g. the plurality of facial features such as lips correspond to each of the portions of the participants, c.10 L.66-c.12 L.30]; and
detecting, within the second frame of the video segment, a plurality of second regions [Fig. 2; e.g. plurality of portions of the next frame, c.10 L.66-c.12 L.30] each corresponding to a different one of the two or more faces [Fig. 1, 2; e.g. each of the participant’s faces, c.10 L.66-c.12 L.30], the plurality of second facial landmarks determined for each of the two or more faces being determined within a corresponding one of the plurality of second regions [Fig. 2; e.g. the plurality of facial features such as lips correspond to each of the portions of the participants, c.10 L.66-c.12 L.30].

In regards to claim 6, Silberman does not explicitly teach the computer-implemented method of claim 1, wherein identifying the speaking face further comprises:
determining at least a moving one of the plurality of second facial landmarks has changed position with respect to one or more of the plurality of first facial landmarks.
However, Sugihara teaches the computer-implemented method of claim 1, wherein identifying the speaking face [see rejection of claim 1 above] further comprises:
determining at least a moving one of the plurality of second facial landmarks has changed position with respect to one or more of the plurality of first facial landmarks [e.g. determining a temporal change of the upper and lower lip positions based on the motion of the mouth, 0047].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s method with the features of
determining at least a moving one of the plurality of second facial landmarks has changed position with respect to one or more of the plurality of first facial landmarks
in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].

In regards to claim 7, Silberman does not explicitly teach the computer-implemented method of claim 6, wherein identifying the speaking face further comprises determining the moving second facial landmark has moved with respect to a non-corresponding one of the plurality of first facial landmarks,
the moving second facial landmark corresponds to a corresponding one of the plurality of first facial landmarks,
the moving second facial landmark and the corresponding first facial landmark each represent a first one of a pair of lips, the non-corresponding first facial landmark represents a second one of the pair of lips, and the first lip is different from the second lip.
However, Sugihara teaches the computer-implemented method of claim 6, wherein identifying the speaking face [see rejection of claim 6 above] further comprises determining the moving second facial landmark has moved with respect to a non-corresponding one of the plurality of first facial landmarks [e.g. A temporal change of the upper and lower lip positions is determined based on the motion of the mouth. The Examiner interprets the second facial landmark as the position for the upper lip and the non-corresponding one of the plurality of first facial landmarks as the position for the lower lip, 0047],
the moving second facial landmark corresponds to a corresponding one of the plurality of first facial landmarks [e.g. The Examiner interprets the moving second facial landmark as the position of the upper lip after the temporal change. Also, the Examiner interprets the corresponding one of the plurality of first facial landmarks as the position of the upper lip before the temporal change, 0047],
the moving second facial landmark and the corresponding first facial landmark each represent a first one of a pair of lips [e.g. The Examiner interprets the moving second facial landmark as the position of the upper lip after the temporal change. Also, the Examiner interprets the corresponding one of the plurality of first facial landmarks as the position of the upper lip before the temporal change, 0047], the non-corresponding first facial landmark represents a second one of the pair of lips [e.g. The Examiner interprets the non-corresponding one of the plurality of first facial landmarks as the position for the lower lip, 0047], and the first lip is different from the second lip [e.g. the upper lip is different from the lower lip, 0047].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s method with the features of determining the moving second facial landmark has moved with respect to a non-corresponding one of the plurality of first facial landmarks,
the moving second facial landmark corresponds to a corresponding one of the plurality of first facial landmarks,
the moving second facial landmark and the corresponding first facial landmark each represent a first one of a pair of lips, the non-corresponding first facial landmark represents a second one of the pair of lips, and the first lip is different from the second lip
in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].
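The Examiner's mapping for claim 7 (upper lip after the temporal change as the moving second landmark, upper lip before the change as the corresponding first landmark, lower lip before the change as the non-corresponding first landmark) can be illustrated with the short Python sketch below. It is a hypothetical example, not Sugihara's implementation; the function name `moved_relative_to` and the tolerance value are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def moved_relative_to(
    moving_second: Point,
    corresponding_first: Point,
    non_corresponding_first: Point,
    tol: float = 1.0,
) -> bool:
    """True if the landmark's distance to the other lip changed by more
    than tol (an assumed tolerance) between the first and second frame."""
    before = math.dist(corresponding_first, non_corresponding_first)
    after = math.dist(moving_second, non_corresponding_first)
    return abs(after - before) > tol
```

For example, with the upper lip at (10, 20) and the lower lip at (10, 24) in the first frame, an upper lip at (10, 17) in the second frame changes the lip-to-lip distance from 4 to 7, so the function returns True.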

In regards to claim 9, Silberman teaches the computer-implemented method of claim 1, further comprising:
modifying the second frame by enlarging the speaking face within the second frame [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk. The enlargement of the window also enlarges the participant’s face, c.6 L.53-c.7 L.4]; and
transmitting the modified video segment to a recipient computing system for display thereby [Fig. 1; e.g. the enlarged window is sent to a computing device for display, c.6 L.31-c.7 L.4].

In regards to claim 10, Silberman teaches the computer-implemented method of claim 1, wherein the computer-implemented method is performed by a computing system [Fig. 1; e.g. computing system 100, c.6 L.4-52] that comprises a sender computing system [Fig. 1; e.g. one of the computing devices, c.6 L.4-52] and a recipient computing system [Fig. 1; e.g. another one of the computing devices, c.6 L.4-52], and the method further comprises:
obtaining, by the sender computing system, the video segment [Fig. 1; e.g. one of the computing devices captures frames for a video stream, c.6 L.4-52];
transmitting, by the sender computing system, the video segment to the recipient computing system [Fig. 1; e.g. one of the computing devices sends the frames of the video stream to another one of the computing devices, c.6 L.4-52], the recipient computing system determining the plurality of first and second facial landmarks [Fig. 1; e.g. the other computing device determines the portions of the face including facial features from the previously sent frame and the next frame of the video stream, c.3 L.62-c.4 L.16, c.7 L.13-39], identifying the speaking face, and modifying the video segment [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4]; and
displaying, by the recipient computing system, the modified video segment [Fig. 1; e.g. displaying the enlarged window on the other computing device’s display device, c.6 L.4-c.7 L.4].

In regards to claim 12, Silberman teaches the computer-implemented method of claim 1, wherein the computer-implemented method is performed by a computing system [Fig. 1; e.g. computing system 100, c.6 L.4-52] that comprises a sender computing system [Fig. 1; e.g. one of the computing devices, c.6 L.4-52] and a recipient computing system [Fig. 1; e.g. another one of the computing devices, c.6 L.4-52], and the method further comprises:
receiving, by the sender computing system, the video segment [Fig. 1; e.g. one of the computing devices captures frames for a video stream, c.6 L.4-52], the sender computing system determining the plurality of first and second facial landmarks [Fig. 1; e.g. the computing device determines the portions of the face including facial features from the previously sent frame and the next frame of the video stream, c.3 L.62-c.4 L.16, c.7 L.13-39], and identifying the speaking face [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4]; 
transmitting, by the sender computing system, the video segment and an identification of the speaking face to the recipient computing system [Fig. 1; e.g. the computing device sends the frames of a video stream and the identified facial features to the other computing device, c.6 L.4-52, c.7 L.13-39], the recipient computing system modifying the video segment based at least in part on the identification of the speaking face [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4]; and 
displaying, by the recipient computing system, the modified video segment [Fig. 1; e.g. displaying the enlarged window on the other computing device’s display device, c.6 L.4-c.7 L.4].

In regards to claim 20, Silberman teaches a machine-readable medium [e.g. memory, c.18 L.66-c.19 L.17] having stored thereon a set of instructions [e.g. processor-executable program instructions, c.18 L.66-c.19 L.17], which if performed by one or more processors [e.g. processor, c.18 L.66-c.19 L.17], cause the one or more processors to at least:
obtain a plurality of frames [Fig. 1; e.g. each computing device captures a plurality of frames, c.6 L.4-52] each depicting a plurality of first conference participants participating in a video conference [Fig. 1; e.g. three participants in the videoconferencing session, c.6 L.4-52];
detect image regions [Fig. 1; e.g. decompose each frame into the set of portions, c.7 L.13-39] in each of the plurality of frames depicting faces of the plurality of first conference participants [Fig. 1; e.g. each of the frames depict faces of the videoconferencing participants, c.5 L.37-c.6 L.3];
determine facial landmarks in each of the image regions [Fig. 1; e.g. portions of the face including facial features, c.7 L.13-39]; and
identify a speaking one of the plurality of first conference participants in a subset of the image regions [e.g. determining whether a participant is speaking, c.6 L.53-c.7 L.4, also see c.3 L.62-c.4 L.16], the subset of the image regions each having been detected in a corresponding one of a subset of the plurality of frames [Fig. 1; e.g. the determined window of the speaking participant corresponds to the frames of the video stream of that speaking participant, c.6 L.53-c.7 L.4, also see c.3 L.62-c.4 L.16], each of the subset of the plurality of frames being modifiable by enlarging one of the subset of the image regions detected in the frame [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4].
Silberman does not explicitly teach
the plurality of first conference participants sharing a common physical location;
identify a speaking one of the plurality of first conference participants in a subset of the image regions based at least in part on the facial landmarks (emphasis added).
However, Sugihara teaches
the plurality of first conference participants sharing a common physical location [Fig. 8; e.g. two or more speakers in a same site video, 0049];
identify a speaking one of the plurality of first conference participants in a subset of the image regions based at least in part on the facial landmarks [Fig. 8; e.g. identifies a speaker based on the motion of the mouth of each participant included in the site videos by detecting a temporal change of the position of the feature point such as upper and lower lips, 0047-0049].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s machine-readable medium with the features of
the plurality of first conference participants sharing a common physical location;
identify a speaking one of the plurality of first conference participants in a subset of the image regions based at least in part on the facial landmarks
in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].

In regards to claim 21, the claim recites similar limitations as claim 9.  Therefore, the same rationale as claim 9 is applied.

In regards to claim 22, the claim recites similar limitations as claim 9.  Therefore, the same rationale as claim 9 is applied.

In regards to claim 23, Silberman teaches the machine-readable medium of claim 20, wherein the set of instructions, when performed by the one or more processors, cause the one or more processors to generate a user interface displaying the plurality of frames including the modified subset of the plurality of frames [Fig. 1; e.g. displaying one or more windows including the enlarged window, c.6 L.4-52].

In regards to claim 25, Silberman does not explicitly teach the machine-readable medium of claim 20, wherein the second facial landmark corresponds to a portion of a first one of an upper lip and a lower lip of the face of the speaking first conference participant and the first facial landmark corresponds to a portion of a different second one of the upper lip and the lower lip of the face of the speaking first conference participant.
However, Sugihara teaches the machine-readable medium of claim 20, wherein the second facial landmark corresponds to a portion of a first one of an upper lip and a lower lip of the face of the speaking first conference participant and the first facial landmark corresponds to a portion of a different second one of the upper lip and the lower lip of the face of the speaking first conference participant [e.g. A temporal change of the upper and lower lip positions is determined based on the motion of the mouth. The Examiner interprets the second facial landmark as the position for the upper lip and the lower lip after the temporal change, whereas the Examiner interprets the first facial landmark as the position for the upper lip and the lower lip before the temporal change, 0047].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s machine-readable medium with the features of wherein the second facial landmark corresponds to a portion of a first one of an upper lip and a lower lip of the face of the speaking first conference participant and the first facial landmark corresponds to a portion of a different second one of the upper lip and the lower lip of the face of the speaking first conference participant in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Silberman et al. (U.S. Patent 9,560,315) in view of Sugihara (U.S. Patent Application 20200099889) as applied to claim 1 above, and further in view of Ai et al. (U.S. Patent Application 20150154960).

In regards to claim 3, Silberman teaches the computer-implemented method of claim 1, wherein a portion of the plurality of first facial landmarks determined for each of the two or more faces are first lip landmarks [Fig. 2; e.g. a portion 202(2) that includes the lips of the mouth for each of the participants from the previous frame, c.10 L.66-c.12 L.30], a portion of the plurality of second facial landmarks determined for each of the two or more faces are second lip landmarks [Fig. 2; e.g. a portion 202(2) that includes the lips of the mouth for each of the participants from the next frame, c.10 L.66-c.12 L.30], and identifying the speaking face [see rejection of claim 1 above].
Silberman does not explicitly teach
formulating flow vectors for each of the two or more faces using positions of the first lip landmarks determined for the face and positions of the second lip landmarks determined for the face;
classifying the flow vectors formulated for each of the two or more faces as indicating speaking activity has or has not occurred; and
identifying, as the speaking face, one of the two or more faces for which the flow vectors were classified as indicating speaking activity has occurred.
However, Sugihara teaches
formulating motion [e.g. motion of the mouth, 0047] for each of the two or more faces [e.g. each of the participant’s faces, 0047] using positions of the first lip landmarks [e.g. A known face detection process detects the positions of the upper and lower lips. A temporal change of the positions of the lips means that there is a frame before the change and a frame after the change. Accordingly, the first lip landmarks correspond to Sugihara’s positions of the upper and lower lips before the change, 0047] determined for the face and positions of the second lip landmarks determined for the face [e.g. As mentioned above, a temporal change of the positions of the lips means that there is a frame before the change and a frame after the change. Accordingly, the second lip landmarks correspond to Sugihara’s positions of the upper and lower lips after the change, 0047];
classifying the motion formulated for each of the two or more faces as indicating speaking activity has or has not occurred [e.g. detection of the motion determines whether or not a participant is speaking, 0047]; and
identifying, as the speaking face, one of the two or more faces for which the motion was classified as indicating speaking activity has occurred [e.g. identifies a speaker based on the motion of the mouth of each participant included in the site videos, 0047].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Silberman’s method with the features of
formulating motion for each of the two or more faces using positions of the first lip landmarks determined for the face and positions of the second lip landmarks determined for the face;
classifying the motion formulated for each of the two or more faces as indicating speaking activity has or has not occurred; and
identifying, as the speaking face, one of the two or more faces for which the motion was classified as indicating speaking activity has occurred
in the same conventional manner as taught by Sugihara because Sugihara provides a method for appropriately controlling display sizes of the site videos captured at the sites which avoids frequently changing the display sizes of the speaking participants [0029].
Silberman as modified by Sugihara does not explicitly teach
formulating flow vectors for each of the two or more faces using positions of the first lip landmarks determined for the face and positions of the second lip landmarks determined for the face (emphasis added);
classifying the flow vectors formulated for each of the two or more faces as indicating speaking activity has or has not occurred (emphasis added); and
identifying, as the speaking face, one of the two or more faces for which the flow vectors were classified as indicating speaking activity has occurred (emphasis added).
However, Ai teaches
flow vectors [e.g. identify motion vectors, 0024].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified the combination of Silberman’s method and the teachings of Sugihara with the features of flow vectors in the same conventional manner as taught by Ai because flow vectors are well known and commonly used in the art of facial recognition systems [0024].
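As an illustration of the claim 3 limitations discussed above, the following minimal Python sketch formulates a flow vector for each lip landmark as its displacement between the first and second frames and classifies each face by the mean vector magnitude. It is not taken from Sugihara or Ai; the simple threshold classifier, the 1.5-pixel threshold, and all function names are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def flow_vectors(first_lips: List[Point], second_lips: List[Point]) -> List[Point]:
    """Displacement (dx, dy) of each lip landmark from frame one to frame two."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(first_lips, second_lips)]


def indicates_speaking(vectors: List[Point], threshold: float = 1.5) -> bool:
    """Assumed classifier: mean flow magnitude above a fixed threshold."""
    if not vectors:
        return False
    mean_mag = sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
    return mean_mag > threshold


def speaking_faces(
    first: Dict[str, List[Point]],
    second: Dict[str, List[Point]],
    threshold: float = 1.5,
) -> List[str]:
    """Ids of faces whose flow vectors were classified as speaking activity."""
    return sorted(
        face_id
        for face_id in first.keys() & second.keys()
        if indicates_speaking(flow_vectors(first[face_id], second[face_id]), threshold)
    )
```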

Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Silberman et al. (U.S. Patent 9,560,315) in view of Sugihara (U.S. Patent Application 20200099889) as applied to claim 1 above, and further in view of Feng et al. (U.S. Patent Application 20170213071) and Chen et al. (U.S. Patent 9,462,293).

In regards to claim 4, Silberman teaches the computer-implemented method of claim 1, further comprising:
modifying the second frame by enlarging the speaking face within the second frame [Fig. 1; e.g. When the participant depicted in the frames of the video stream displayed in the window 112(2) begins speaking, the window 112(2) may be enlarged and the window 112(N) may be shrunk, c.6 L.53-c.7 L.4].
Silberman as modified by Sugihara does not explicitly teach 
the second frame having an image resolution;
enlarging the speaking face within the second frame comprising creating an upscaled image by upscaling the speaking face to the image resolution; and
replacing the second frame with the upscaled image.
However, Feng teaches the computer-implemented method of claim 1, further comprising:
the second frame having an image resolution [e.g. The lowermost level of the general image pyramid may be an image obtained by enlarging the target image. The target image would have a certain image resolution, 0084];
enlarging the face within the second frame comprising creating an upscaled image by upscaling the face to the image resolution [e.g. the target image having a target face size is upscaled to a resolution of 1024x1024 pixels, 0084, also see 0069].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified the combination of Silberman’s method and the teachings of Sugihara with the features of
the second frame having an image resolution;
enlarging the face within the second frame comprising creating an upscaled image by upscaling the face to the image resolution
in the same conventional manner as taught by Feng because scaling images is well known and commonly used in the art of image processing [0069].
Feng does not explicitly teach the speaking face.
However, Silberman already taught the speaking face [e.g. determining whether a participant is speaking, c.6 L.53-c.7 L.4, also see c.3 L.62-c.4 L.16].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Feng’s detected face with Silberman’s speaking face because the resulting image after upscaling does not depend on what type of facial image is processed.  In other words, any facial image can be upscaled to generate an upscaled facial image.
Silberman as modified by Sugihara and Feng does not explicitly teach replacing the second frame with the upscaled image.
However, Chen teaches replacing the second frame with the upscaled image [e.g. replacing the first higher resolution image with the estimated second higher resolution image, see claim 1 of Chen].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified the combination of Silberman’s method and the teachings of Sugihara and Feng with the features of replacing the second frame with the upscaled image in the same conventional manner as taught by Chen because Chen provides a more accurate motion estimation applied in the super resolution process which allows for more accurate high resolution frames of data [c.5 L.3-6]. This contributes to a more accurate resulting image with fewer artifacts [c.5 L.3-6].
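The claim 4 limitations addressed above (upscaling the speaking face to the image resolution of the second frame and replacing the second frame with the upscaled image) can be illustrated by the minimal Python sketch below. It substitutes plain nearest-neighbour scaling for the image pyramid of Feng and the super-resolution process of Chen, so it is not either reference's method; the grayscale list-of-rows image format, the `(top, left, height, width)` box layout, and the function names are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values (assumed format)


def crop(frame: Image, box: Tuple[int, int, int, int]) -> Image:
    """Extract the face region given as (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def upscale_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour upscaling to out_h x out_w pixels."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def enlarge_face(frame: Image, face_box: Tuple[int, int, int, int]) -> Image:
    """Return a replacement frame: the face region upscaled to the
    resolution of the original frame."""
    return upscale_nearest(crop(frame, face_box), len(frame), len(frame[0]))
```

The result of `enlarge_face` has the same dimensions as the input frame, so it can stand in for the second frame in the video segment.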

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Silberman et al. (U.S. Patent 9,560,315) in view of Sugihara (U.S. Patent Application 20200099889) and further in view of Feng et al. (U.S. Patent Application 20170213071) and Chen et al. (U.S. Patent 9,462,293) as applied to claim 4 above, and further in view of Saberian et al. (U.S. Patent Application 20170083752).

In regards to claim 5, Silberman as modified by Sugihara, Feng, and Chen does not explicitly teach the computer-implemented method of claim 4, wherein a deep learning model is used to upscale the speaking face to the image resolution.
However, Saberian teaches the computer-implemented method of claim 4, wherein a deep learning model [e.g. deep convolutional neural network, 0038, 0048] is used to upscale the face [e.g. upscaling the facial image, 0064] to the image resolution [e.g. image size of 225x225 pixels, 0064].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified the combination of Silberman’s method and the teachings of Sugihara, Feng, and Chen with the features of wherein a deep learning model is used to upscale the speaking face to the image resolution in the same conventional manner as taught by Saberian because Saberian provides a method for face detection that saves time and cost, as well as computational and memory resources [0043, 0046, 0064].
Saberian does not explicitly teach the speaking face.
However, Silberman already taught the speaking face [e.g. determining whether a participant is speaking, c.6 L.53-c.7 L.4, also see c.3 L.62-c.4 L.16].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Saberian’s detected face with Silberman’s speaking face because the resulting image after upscaling does not depend on what type of facial image is processed.  In other words, any facial image can be upscaled to generate an upscaled facial image.

Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Silberman et al. (U.S. Patent 9,560,315) in view of Sugihara (U.S. Patent Application 20200099889) as applied to claim 10 above, and further in view of Mayer et al. (U.S. Patent Application 20190058847).

In regards to claim 11, Silberman as modified by Sugihara does not explicitly teach the computer-implemented method of claim 10, further comprising:
receiving, by the recipient computing system, user input indicating an amount of enlargement, the recipient computing system modifying the video segment by enlarging the speaking face by the amount of enlargement.
However, Faulkner teaches the computer-implemented method of claim 10, further comprising:
receiving, by the recipient computing system, user input indicating an amount of enlargement [Fig. 1; e.g. a device can receive user input to adjust the aspect ratio by increasing the aspect ratio of the rendering, c.13 L.10-53], the recipient computing system modifying the video segment by enlarging the speaking face by the amount of enlargement [Fig. 1; e.g. the increase in aspect ratio of the rendering also increases the speaker’s face, c.13 L.10-53].
Therefore, it would have been obvious to one of ordinary skill in the art to have modified the combination of Silberman’s method and the teachings of Sugihara with the features of receiving, by the recipient computing system, user input indicating an amount of enlargement, the recipient computing system modifying the video segment by enlarging the speaking face by the amount of enlargement in the same conventional manner as taught by Faulkner because scaling or changing the window size of a video is commonly used and well known in the art of user interfaces.

Allowable Subject Matter

Claims 8, 24, and 26 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

In regards to claim 8, Silberman as modified by Sugihara teaches the computer-implemented method of claim 6, wherein determining the moving second facial landmark has changed position with respect to the one or more first facial landmarks [please see rejection of claim 6 above].
Silberman as modified by Sugihara fails to teach or suggest
determining a rate at which the moving second facial landmark has changed position with respect to the one or more first facial landmarks; and
concluding the moving second facial landmark has changed position with respect to the one or more first facial landmarks when the rate exceeds a threshold value.

In regards to claim 24, the prior art of record fails to teach or suggest the machine-readable medium of claim 20, wherein a first of the image regions depicts a face of the speaking first conference participant, a second of the image regions depicts the face of the speaking first conference participant, the first image region was detected in a first of the plurality of frames, the second image region was detected in a second of the plurality of frames, the first frame occurs before the second frame, and identifying the speaking first conference participant comprises using at least one heuristic to determine at least a second one of the facial landmarks determined in the second image region has moved with respect to at least a first one of the facial landmarks determined in the first image region.

In regards to claim 26, the prior art of record fails to teach or suggest the machine-readable medium of claim 20, wherein the set of instructions, when performed by the one or more processors, cause the one or more processors to supply the facial landmarks to at least one neural network, which identifies the speaking first conference participant in each of the subset of the image regions.

Claims 13-19 are allowed.

In regards to claim 13, the claim recites similar limitations as claim 1, but with the addition of using one or more neural networks to identify the speaking conference participant.  Furthermore, the prior art of record fails to teach or suggest the use of one or more neural networks to identify the speaking conference participant.  Therefore, claim 13 is allowable over the prior art.

In regards to claims 14-19, the claims depend on claim 13.  Therefore, claims 14-19 are allowable for at least the same reason as claim 13.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW SHIN whose telephone number is (571)270-5764. The examiner can normally be reached Monday - Friday from 11:00AM to 7:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW SHIN/Examiner, Art Unit 2612
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2612