DETAILED ACTION
This action is responsive to the Amendments and Remarks received 06/14/2022 in which claims 1–27 are cancelled, claim 36 is amended, and no claims are added as new claims.
Claim Objection
Examiner objects to claim 36 because the amendment eliminates one of three listed alternatives, but the format includes a comma before “and” that is likely unnecessary.  In other words, the amendment removing “posture data” takes the number of items from three down to two and thus an additional change to formatting is likely required.  This objection is intended to be helpful.  If Applicant disagrees, Examiner will withdraw the objection.
Response to Arguments
On page 11 of the Remarks, Applicant contends the teachings of Sanil are deficient because Sanil identifies low-interest portions of video to remove and the claimed invention identifies high-interest portions of video to keep.  It is asserted that such a distinction amounts to teaching features that are opposite.  Examiner disagrees.  Whether the computer programmer writes the function “if (activity < threshold), then discard” or writes the function “if (activity > threshold), then keep,” the solution is identical.  Both approaches are equivalent because the point is to keep the interesting material and discard the wasted-time material, a fundamental concept in video editing.  Applicant simply did not invent video editing and Applicant’s assertion to the contrary is unreasonable.  By analogy, even though “heads” and “tails” are opposite sides of a coin, a teaching of heads is a teaching of tails, not an opposite teaching.  If a programmer were to write a simple computer program to track the probability that a coin flip would yield heads, could one reasonably argue that another programmer “invented” an opposite computer program to track the probability that a coin flip would yield tails?  Of course not.  The skilled artisan is more sophisticated than what Applicant’s argument suggests and such an argument lacks an understanding of the art and the level of skill in the art.  Thus, Examiner is not persuaded of error.
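By way of illustration, the equivalence of the two formulations can be sketched as follows.  This is a minimal sketch under stated assumptions, not code from any cited reference: the function names, the activity scores, and the threshold value are hypothetical, and the keep test is written as “>=” so that the two functions are exact complements when a score equals the threshold.

```python
def discard_low(scores, threshold):
    """Discard segments whose activity is below the threshold; return kept indices."""
    return [i for i, s in enumerate(scores) if not (s < threshold)]


def keep_high(scores, threshold):
    """Keep segments whose activity is at or above the threshold; return kept indices."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical per-segment activity scores and cutoff (assumptions for illustration).
activity = [0.1, 0.8, 0.5, 0.05, 0.9]
THRESHOLD = 0.5

kept_by_discarding = discard_low(activity, THRESHOLD)
kept_by_keeping = keep_high(activity, THRESHOLD)

# Both formulations retain the same segments.
print(kept_by_discarding, kept_by_keeping)
```

Under these assumptions, either function yields the same retained segments; only the direction in which the comparison is written differs.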
On page 11 of the Remarks, Applicant attempts to distinguish Sanil from the claimed invention by contending that Sanil’s data is from cameras and microphones while, “Claim 28 clearly requires audio and video to be recorded in addition to behavioral data.”  Examiner finds the problem with this argument is that Sanil is not relied upon in the rejection to teach the averred feature.  Instead, the combination of Featherstone, Crampton, and Brunner is relied upon to teach the claimed behavioral data acquisition sensors.  Therefore, the argument is not probative of error.
On page 12 of the Remarks, Applicant seems to not comprehend Sanil’s discussion of salient portions of video, i.e. “hot spots” that are designated as such due to the amount of motion contained therein.  Examiner finds Applicant’s argument is unreasonable in view of the level of skill in the art regarding video surveillance technology and the use of clipping video to save storage space according to measured movements.  Therefore, the argument is unpersuasive.
On page 14 of the Remarks, Applicant contends it is different to analyze behavioral data captured by depth sensors than behavioral data captured by cameras.  Examiner disagrees.  First, Examiner notes Applicant does not describe how it is different, instead merely proclaiming that it is different.  Attorney arguments and conclusory statements unsupported by factual evidence are entitled to little probative value.  In re Geisler, 116 F.3d 1465, 1470 (Fed. Cir. 1997).  Second, Applicant does not claim or describe how the behavioral data is analyzed, so the prior art’s description that it can be accomplished the same way as known video analysis is better than Applicant’s black box.  Third, the depth data obtained by depth sensors is well-known to be analyzed using image processing techniques.  One can easily confirm this by searching Google or YouTube for “depth image.”  In a depth image, there are still pixels; the pixels just have depth values instead of color values.  All other technology associated with image processing applies.  So, Examiner disagrees with Applicant that depth images are processed differently than color images.  Except for minor differences the skilled artisan can easily navigate, they are very similar.  See Latta, cited under the Conclusion Section of this Office Action.  In short, Applicant’s arguments regarding the technology and what the skilled artisan knows are unreasonably limited, and thus unpersuasive.
On page 14 of the Remarks, regarding claim 36, Applicant contends removing one in a list of claimed equivalents differentiates over the prior art.  Examiner disagrees.  First, according to patent doctrine, when Applicant recites in a claim a list of alternatives, it is a tacit admission that the listed alternatives would be obvious in view of one another.  Second, although not well-defined, according to paragraph [0306] of Applicant’s published Specification, the posture volume data appears to be the number of moving pixels (“amount the volume changed over time, corresponding to a large amount or a small amount of motion”).  Because Kyllonen incorporates by reference Latta, Kyllonen’s disclosure is viewed as sufficient.  See rejection, infra.  In addition, while not relied upon for the rejection of claim 36, Examiner notes Bi (US 2018/0295428 A1), cited under the Conclusion Section of this Office Action, teaches detecting a motion event by comparing a number of moving pixels of a blob (which is a term of art) to a threshold (¶ 0070).  Likewise, Zhu (US 2013/0176430 A1), cited under the Conclusion Section of this Office Action, teaches categorizing insignificant or repetitive motions to filter for saliency in video analysis (e.g. ¶ 0032).  For all these reasons, Examiner is not persuaded of patentability.
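By way of illustration, the points in the two preceding paragraphs (that a depth image is a grid of pixels holding depth values, and that motion can be detected by comparing a count of moving pixels to a threshold) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not taken from Latta, Bi, or any other cited reference: the function name, the frame contents, the millimeter unit, and all threshold values are hypothetical.

```python
def count_moving_pixels(prev_frame, curr_frame, min_change):
    """Count pixels whose value changed by more than min_change between two frames."""
    return sum(
        1
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > min_change
    )


# Grayscale frames: each pixel holds an 8-bit intensity (hypothetical values).
gray_prev = [[10, 10, 10], [10, 10, 10]]
gray_curr = [[10, 200, 10], [10, 10, 180]]

# Depth frames: same grid, each pixel holds a distance in millimeters (assumed unit).
depth_prev = [[1500, 1500, 1500], [1500, 1500, 1500]]
depth_curr = [[1500, 1200, 1500], [1500, 1500, 1250]]

# The same routine handles both kinds of frame; only the change threshold differs.
gray_moving = count_moving_pixels(gray_prev, gray_curr, min_change=50)
depth_moving = count_moving_pixels(depth_prev, depth_curr, min_change=100)

MOTION_EVENT_PIXELS = 2  # hypothetical moving-pixel threshold for a motion event
depth_motion_event = depth_moving >= MOTION_EVENT_PIXELS
print(gray_moving, depth_moving, depth_motion_event)
```

The routine is indifferent to whether a pixel value encodes intensity or distance, which is the only point the sketch is meant to show.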
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 28–32, 34, 35, and 38–47 are rejected under 35 U.S.C. 103 as being unpatentable over Featherstone (US 10,310,361 B1), Crampton (WO 98/28908), Brunner (US 2017/0227353 A1), Advantage Video Systems, “Jeffrey Stansfield of AVS interviews rep about Air-Hush products at the 2019 NAMM Expo,” YouTube video, available at https://www.youtube.com/watch?v=nWzrM99qk_o, accessed 01/07/2021 (“AirHush”), and Sanil (US 2017/0024614 A1).
Regarding claim 28, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests a kiosk comprising: a. a booth comprising: i. an enclosing wall forming a perimeter of the booth and defining a booth interior; A. wherein the enclosing wall extends between a bottom of the enclosing wall and a top of the enclosing wall; B. wherein the enclosing wall comprises: a front wall, a back wall, a first side wall, and a second side wall; C. wherein the first side wall and the second side wall extend from the front wall to the back wall; D. wherein the perimeter is at least 14 feet (4.3 meters) and not more than 80 feet (24.4 meters) (Examiner finds this physical arrangement and these dimensions are commensurate with a booth or kiosk of dimensions appropriate for a person; Featherstone, col. 8, ll. 8–10:  teaches a personal-sized booth for generating an image of an occupant, specifically measuring approximately 6x7x6 feet; It is arguable Featherstone does not teach or suggest that those dimensions define a substantially enclosed space as Examiner interprets the claim to require; However, Examiner finds Crampton’s teachings explain the booth is substantially enclosed all around and at least suggest the dimensions are similar to Featherstone’s; Crampton, Fig. 1:  teaches a booth for recording video; Crampton, pg. 12:  explains the view of the booth includes a ceiling and wall removed for illustration purposes; Applicant-admitted prior art (AAPA), Spec., ¶ 0323; AirHush, e.g. minute 0:00–1:00);
ii. a chair disposed in the interior of the booth, wherein the chair comprises a seat surface, wherein the chair is approximately centered with respect to the back wall in a first position, wherein the chair is moveable (AirHush video, e.g. minute 0:00–1:00:  shows a moveable chair in the room);
iii. a first camera, a second camera, and a third camera for taking video images, each of the cameras aimed toward the booth interior, wherein the first camera, the second camera, and the third camera are disposed adjacent to the front wall (Crampton, Fig. 9 and pg. 13:  teaches any number of cameras can be used and illustrates cameras disposed on the wall of the booth at a height greater than waist height, but less than head height (i.e. 30 to 70 inches) for a normally-sized adult);
iv. a first microphone for capturing audio data of sound in the booth interior, wherein the microphone is disposed within the booth interior (Crampton, Fig. 12 and pg. 13:  teaches a microphone 119 in the booth; Featherstone, col. 7, ln. 1:  teaches a microphone; AirHush video, e.g. minute 0:00–1:00:  demonstrates an interview taking place wherein there is obviously a camera and microphone disposed therein (because it’s a video with audio));
v. a first depth sensor and a second depth sensor for capturing behavioral data, wherein the first depth sensor is configured to detect changes in foot position and the second depth sensor is configured to detect changes in torso position, A. wherein the first depth sensor and the second depth sensor are aimed toward the booth interior; B. wherein the first depth sensor is mounted on the first side wall or on the second side wall, and the second depth sensor is mounted on the back wall at a height above a height of the seat surface when the chair is in the first position (Crampton, pg. 7:  teaches the sensors are depth sensors that use structured light to achieve a 3D mesh of the person; Crampton, Fig. 9, Arrangement 2:  shows depth sensors at two heights wherein the heights correspond to Applicant’s claimed heights for the feet and torso of the occupant/participant; It is arguable that Crampton does not teach that structured light is used in a depth sensor; Featherstone, col. 6, ll. 16–21:  explicitly explains that a depth sensor utilizing structured light is one approach to taking depth measurements in a personal booth; Brunner, ¶ 0024 and Fig. 2:  teach the torso camera can be aimed at a downward angle; Crampton, Fig. 7:  illustrates the depth sensors can be on any wall; see also Brunner, Fig. 2 and ¶¶ 0018–0021:  describing many sensor arrangements including a back wall depth sensor arrangement); C. wherein video images, behavioral data, and audio data are captured simultaneously (Examiner finds video and audio data being captured simultaneously is obviously not inventive; Examiner interprets behavioral data broadly as the positional data of the person and their extremities; For instance, if the person’s feet move to a new depth, the depth sensors are active to determine that change; Brunner, for example, Abstract:  teaches the depth sensors are used to determine gestures, and thus are actively acquiring data in real-time; Alternatively or supplementally, Crampton, pg. 7:  teaches the sensors are for capturing expressions, such as happy, sad, angry, etc.; Featherstone, Abstract:  teaches the sensors are for receiving facial replies; see also Featherstone, col. 6, ll. 50–63:  teaching the depth camera is responsible for capturing smiles, frowns, etc.);
vi. a first user interface for showing a video of a user, prompting the user to answer interview questions, or prompting the user to demonstrate a skill (Crampton, pg. 7:  teaches prompting the user to answer questions or to walk on a moving walkway; Crampton, pg. 5:  teaches a display and user input devices, e.g. a touchscreen),
b. an edge server connected to the first camera, the second camera, the third camera, the first depth sensor, the second depth sensor, the first microphone, and the first user interface, wherein the edge server comprises an edge server non-transitory computer memory and an edge server processor in data communication with the first camera, the second camera, the third camera, the first depth sensor, the second depth sensor, and the first microphone; wherein computer instructions are stored on the computer memory for instructing the edge server processor to perform the steps of (Because edge servers are typically used for content delivery networks, Examiner found the use of an edge server in this context atypical; However, Examiner found the following broad definition; According to Wikipedia’s entry on “Edge Computing,” “Karim Arabi, in an IEEE DAC 2014 Keynote and subsequently in an invited talk at MIT's MTL Seminar in 2015 defined edge computing broadly as all computing outside the cloud happening at the edge of the network, and more specifically in applications where real-time processing of data is required. In his definition, cloud computing operates on big data while edge computing operates on "instant data" that is real-time data generated by sensors or users.”; Crampton, pgs. 12–13, description of Fig. 12:  teaches each of the components of the system are commonly connected to a server computer):
i. capturing first video input of the user from the first camera, second video input of the user from the second camera, third video input of the user from the third camera, wherein the first video input, the second video input and the third video input are of a first length (Sanil, ¶ 0120: teaches a video capture device),
ii. capturing behavioral depth sensor data input from the first depth sensor and the second depth sensor (Examiner finds video and sensor data being captured simultaneously is obviously not inventive; Examiner interprets behavioral data broadly as the positional data of the person and their extremities; For instance, if the person’s feet move to a new depth, the depth sensors are active to determine that change; Brunner, for example, Abstract:  teaches the depth sensors are used to determine gestures, and thus are actively acquiring data in real-time; Alternatively or supplementally, Crampton, pg. 7:  teaches the sensors are for capturing expressions, such as happy, sad, angry, etc.; Featherstone, Abstract:  teaches the sensors are for receiving facial replies; see also Featherstone, col. 6, ll. 50–63:  teaching the depth camera is responsible for capturing smiles, frowns, etc.),
iii. capturing audio input of the user from the first microphone (Sanil, ¶ 0005:  teaches video portions can be selected based on e.g. low-speech activity; Sanil, ¶ 0120: teaches a microphone is a contemplated input device),
iv. selecting a portion of interest of the first video input, the second video input, or the third video input based on the simultaneously recorded behavioral data input (Sanil, ¶ 0005:  teaches video portions can be selected based on e.g. low-speech activity; In addition to the prior art demonstrating that automated systems can detect behaviors and content without a video signal and place markers at those positions within the video, Examiner further finds automated systems are capable of identifying and marking other semantic content from video; Indeed, many video surveillance applications are able to select portions of video that are identified as having a potentially interesting event (i.e. event triggers/markers)),
v. concatenating portions of the first video input, second video input and third video input to create an audiovisual file, wherein the audiovisual file includes the portion of interest of video input, wherein the audiovisual file is of a second length, wherein the second length is shorter than the first length (Examiner interprets this limitation as an automated video editing software application that will create e.g. a highlight reel from captured AV data by removing less salient portions of the full captured video; Sanil, ¶ 0005:  teaches automatic video editing that is capable of automatically removing portions from a video sequence and “concatenat[ing]” retained video portions to compose a highlight reel), and
vi. sending the audiovisual file to a network (Sanil, ¶ 0026:  teaches networked computing).
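By way of illustration, Examiner’s interpretation of the selecting and concatenating steps as automated highlight-reel editing can be sketched as follows.  This is a minimal sketch under stated assumptions, not Sanil’s implementation or Applicant’s: the Segment type, the activity scores (standing in for a score derived from the behavioral data), the cutoff, and all times are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # seconds from the start of the recording
    end: float       # seconds from the start of the recording
    camera: int      # which video input supplies this segment
    activity: float  # hypothetical score derived from behavioral data


def build_highlight(segments, min_activity):
    """Keep segments whose behavioral score meets min_activity, in time order."""
    kept = [s for s in segments if s.activity >= min_activity]
    return sorted(kept, key=lambda s: s.start)


def total_length(segments):
    """Total duration, in seconds, of a list of segments."""
    return sum(s.end - s.start for s in segments)


# Hypothetical recording divided into four segments across three cameras.
recording = [
    Segment(0.0, 10.0, camera=1, activity=0.2),
    Segment(10.0, 25.0, camera=2, activity=0.9),
    Segment(25.0, 40.0, camera=1, activity=0.1),
    Segment(40.0, 60.0, camera=3, activity=0.7),
]

highlight = build_highlight(recording, min_activity=0.5)
first_length = total_length(recording)   # length of the captured inputs
second_length = total_length(highlight)  # length of the concatenated file
print(first_length, second_length, [s.camera for s in highlight])
```

In this sketch the concatenated result retains only the higher-scoring segments, so the second length is necessarily shorter than the first whenever at least one segment is dropped.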
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to combine the elements taught by Featherstone, with those of Crampton, because both references are drawn to the same field of endeavor (personal booth imaging system), because Featherstone’s booth dimensions could have provided approximate dimensions for configuring a system similar to Crampton’s booth, and because the combination represents a mere combination of prior art elements, according to known methods, to yield a predictable result.  This rationale applies to all combinations of Featherstone and Crampton used in this Office Action unless otherwise noted.
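By way of illustration, the relationship between Featherstone’s approximate 6x7x6 foot booth and the claimed perimeter range of 14 to 80 feet can be checked with simple arithmetic.  This is a minimal sketch under stated assumptions: the cited passage, as summarized here, does not say which two of the three dimensions form the footprint, so the sketch tries every pairing, and the perimeter helper is hypothetical.

```python
from itertools import combinations

FEATHERSTONE_DIMS_FT = (6, 7, 6)     # approximate booth dimensions cited from Featherstone
CLAIM_MIN_FT, CLAIM_MAX_FT = 14, 80  # claimed perimeter range


def perimeter(width, depth):
    """Perimeter of a rectangular footprint, in feet."""
    return 2 * (width + depth)


# Which two dimensions form the footprint is an assumption, so try every pairing.
perimeters = sorted(
    {perimeter(a, b) for a, b in combinations(FEATHERSTONE_DIMS_FT, 2)}
)
all_within_range = all(CLAIM_MIN_FT <= p <= CLAIM_MAX_FT for p in perimeters)
print(perimeters, all_within_range)
```

Whichever pairing is assumed, the resulting perimeter is 24 or 26 feet, which falls within the claimed range.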
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to combine the elements taught by Featherstone and Crampton, with those of Brunner, because all three references are drawn to the same field of endeavor (personal imaging system), and because Brunner’s depth sensor configuration in view of Featherstone’s and Crampton’s use of depth sensors for similar purposes in their systems represents a mere combination of prior art elements, according to known methods, to yield a predictable result.  This rationale applies to all combinations of Featherstone, Crampton, and Brunner used in this Office Action unless otherwise noted.
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to combine the elements taught by Featherstone, Crampton, and Brunner, with those of AirHush, because AirHush’s gas-filled bladder booth offers an off-the-shelf solution for erecting temporary sound booths such that Featherstone’s booth or Crampton’s booth could be constructed using AirHush’s building materials.  Therefore, the stated combination represents a mere combination of prior art elements, according to known methods, to yield a predictable result.  This rationale applies to all combinations of Featherstone, Crampton, Brunner, and AirHush used in this Office Action unless otherwise noted.
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to combine the elements taught by Featherstone, Crampton, Brunner and AirHush, with those of Sanil, because Sanil’s video editing feature can be applied to any input video content to compose video highlight reels.  Therefore, the stated combination represents a mere combination of prior art elements, according to known methods, to yield a predictable result.  This rationale applies to all combinations of Featherstone, Crampton, Brunner, AirHush, and Sanil used in this Office Action unless otherwise noted.
Regarding claim 29, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, wherein concatenating comprises selecting one of the video inputs for each time segment of the audiovisual file (Sanil, ¶ 0031:  teaches concatenating multiple selected segments to compose a new video).
Regarding claim 30, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, wherein the computer instructions that are stored on the computer memory are further configured to instruct the edge server processor to perform the step of: designating an unwanted portion of the first video input, the second video input, or the third video input, wherein the unwanted portion is not included in the audiovisual file (Crampton, Fig. 7:  teaches front, side, and back wall mounting locations for cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used; Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion).
Regarding claim 31, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 30, wherein the computer instructions that are stored on the computer memory are further configured to instruct the edge server processor to perform the step of: discarding the designated unwanted portions of the first video input, second video input and third video input (Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion).
Regarding claim 32, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 31, wherein at least one of the unwanted portions of video input was designated as unwanted based on analysis of the simultaneously recorded behavioral data (Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion).
Regarding claim 34, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 30, wherein the computer instructions that are stored on the computer memory are further configured to instruct the edge server processor to perform the steps of: designating a first portion of the first video input, the second video input or the third video input that immediately precedes the unwanted portion; designating a second portion of the first video input, the second video input or the third video input that immediately follows the unwanted portion; and concatenating the first portion of the first video input, the second video input or the third video input with the second portion of the first video input, the second video input or the third video input; wherein the first portion or the second portion comprises the portion of interest (Examiner finds this claim is claiming the process of cutting or splicing together video segments; Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion).
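By way of illustration, the cutting or splicing operation Examiner identifies in this claim can be sketched as follows.  This is a minimal sketch under stated assumptions, not code from Sanil: the function name, the use of (start, end) times in seconds, and the example values are hypothetical, and the preceding and following portions are taken to run to the ends of the recording.

```python
def splice_out(first_length, unwanted_start, unwanted_end):
    """Return the (start, end) portions immediately preceding and following the unwanted portion."""
    if not 0 <= unwanted_start < unwanted_end <= first_length:
        raise ValueError("unwanted portion must lie within the recording")
    preceding = (0.0, unwanted_start)
    following = (unwanted_end, first_length)
    # Drop an empty portion when the unwanted span touches either end of the recording.
    return [p for p in (preceding, following) if p[1] > p[0]]


# Hypothetical 60-second recording with an unwanted span from 20 s to 35 s.
portions = splice_out(first_length=60.0, unwanted_start=20.0, unwanted_end=35.0)
spliced_length = sum(end - start for start, end in portions)
print(portions, spliced_length)
```

Concatenating the two returned portions in order yields a file that omits the unwanted span and is shorter than the original by the span’s duration.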
Regarding claim 35, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, wherein the behavioral data used for selecting the portion of interest identifies a portion of the interview where the user showed the most movement or the least movement (Sanil, ¶¶ 0059–0060:  teaches finding hotspots in video and keeping those portions wherein the hotness is judged by the amount of activity or movement).
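By way of illustration, identifying the portion with the most movement or the least movement reduces to taking a maximum or a minimum over per-portion movement scores.  This is a minimal sketch under stated assumptions, not Sanil’s hotspot method: the window start times and the movement scores are hypothetical.

```python
# Hypothetical mapping: window start time in seconds -> movement score.
movement_by_window = {0: 0.12, 10: 0.48, 20: 0.91, 30: 0.05, 40: 0.33}

# The window with the highest score is the "hot spot"; the lowest is its counterpart.
most_movement_window = max(movement_by_window, key=movement_by_window.get)
least_movement_window = min(movement_by_window, key=movement_by_window.get)
print(most_movement_window, least_movement_window)
```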
Regarding claim 38, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, wherein the first camera, the second camera, and the third camera are mounted to the front wall, or wherein the first camera is mounted to the first side wall, the second camera is mounted to the front wall, and the third camera is mounted to the second side wall (Crampton, Fig. 7:  teaches front, side, and back wall mounting locations for cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used).
Regarding claim 39, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, further comprising a fourth camera disposed adjacent to or in the corner of the front wall and the second side wall (Crampton, Fig. 7:  teaches front, side, back, and corner wall mounting locations for the cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used); wherein the first side wall comprises a door (Crampton, pg. 12:  teaches an “entry,” which Examiner finds is equivalent to a door; see add’l prior art, particularly Jones, listed under Conclusion Section).
Regarding claim 40, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 39, further comprising a fifth camera disposed adjacent to or in the corner of the back wall and the second side wall (Crampton, Fig. 7:  teaches front, side, back, and corner wall mounting locations for the cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used).
Regarding claim 41, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, further comprising a second user interface and a third user interface, wherein the second user interface is mounted on a first arm extending from the second side wall and the third user interface is mounted on a second arm extending from the first side wall (Crampton, pgs. 5–6:  teach a number of user interface devices used in the kiosk, including touchscreens, keyboards, button panels, and microphones, and explicitly teach that those and any number of other interface means, in any combination, are contemplated; Examiner finds mounting interfaces to extension arms is obvious, especially in booths used by the public, so that interface devices such as keyboards, mice, and touchscreens “do not walk off”; Furthermore, Examiner finds no inventive activity, such as an unexpected result, from the location or number of user interface devices).
Regarding claim 42, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, wherein the kiosk does not include a roof connected to the enclosing wall (Featherstone, Fig. 3:  teaches a kiosk without a roof).
Regarding claim 43, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 28, further comprising a third depth sensor for capturing behavioral data (Crampton, pg. 7:  teaches the sensors are for capturing expressions, such as happy, sad, angry, etc.; Featherstone, Abstract:  teaches the sensors are for receiving facial replies; see also Featherstone, col. 6, ll. 50–63:  teaching the depth camera is responsible for capturing smiles, frowns, etc.), wherein the third depth sensor is mounted on the first side wall or the second side wall opposite from the first depth sensor; wherein the third depth sensor is aimed toward the booth interior (see treatment of claim 28, regarding multitude of depth sensors; Examiner finds no inventive activity, such as an unexpected result, from the location or number of depth sensors); wherein the edge server is connected to the third depth sensor (see treatment of claim 28, regarding the computing device controlling the various sensors and devices in the kiosk; see also Sanil, ¶ 0033:  teaching the system can be implemented on a cloud computing platform).
Regarding claim 44, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests a kiosk comprising: a. a booth comprising: i. an enclosing wall forming a perimeter of the booth and defining a booth interior; A. wherein the enclosing wall extends between a bottom of the enclosing wall and a top of the enclosing wall; B. wherein the enclosing wall has a height from the bottom of the enclosing wall to the top of the enclosing wall; C. wherein the perimeter is at least 14 feet (4.3 meters) and not more than 80 feet (24.4 meters) (see treatment of claim 28);
ii. a first camera and a second camera for taking video images, each of the cameras aimed toward the booth interior; wherein the first camera and second camera are disposed on the same portion of the enclosing wall (Crampton, Fig. 7:  teaches front, side, back, and corner wall mounting locations for the cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used);
iii. a first microphone for capturing audio data of sound in the booth interior (see treatment of claim 28);
iv. a first depth sensor for capturing behavioral data, wherein the first depth sensor is configured to detect changes in foot position, A. wherein the at least one depth sensor is aimed toward the booth interior; B. wherein video images, behavioral data, and audio data are captured simultaneously;
v. a user interface that shows a video of a user, prompts the user to answer interview questions, or prompts the user to demonstrate a skill (see treatment of claim 28), wherein the user interface comprises a third camera (Crampton, Fig. 7:  teaches front, side, back, and corner wall mounting locations for the cameras; Crampton, pg. 13:  explains any number of cameras are contemplated in virtually any position; Examiner finds no inventive activity, such as an unexpected result, from the location or number of cameras used);
vi. a chair disposed in the interior of the booth, wherein the chair comprises a seat surface, wherein the chair is approximately centered with respect to the back wall in a first position, wherein the chair is moveable (see treatment of claim 28);
b. an edge server connected to the first camera, the second camera, the depth sensor, the first microphone, and the user interface, wherein the edge server comprises an edge server non-transitory computer memory and an edge server processor in data communication with the first camera, the second camera, the first depth sensor, and the first microphone; wherein computer instructions are stored on the computer memory for instructing the edge server processor to perform the steps of:
i. capturing first video input of the user from the first camera, and second video input of the user from the second camera, wherein the first video input and the second video input are of a first length,
ii. capturing behavioral depth sensor data input from the first depth sensor,
iii. capturing audio input of the user from the first microphone,
iv. selecting a portion of interest of the first video input or the second video input based on the simultaneously recorded behavioral data input,
v. concatenating portions of the first video input and the second video input to create an audiovisual file, wherein the audiovisual file includes the portion of interest of video input based on the recorded behavioral data input, wherein the audiovisual file is of a second length, wherein the second length is shorter than the first length and
vi. sending the audiovisual file to a network (see treatment of claim 28).
Regarding claim 45, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 44, further comprising a second microphone for capturing audio housed in the enclosed booth, wherein the edge server is connected to the second microphone; wherein the computer instructions stored on the memory for instructing the processor to further perform the steps of: a. analyzing audio from the first microphone and audio from the second microphone to determine the highest quality audio data; b. automatically saving the concatenated video data with the highest quality audio data as a single audiovisual file (Examiner finds it obvious that where one video camera with microphone records video and audio representing a first view and a second camera with microphone records video and audio representing a second view and those videos are concatenated, the microphones of the video capturing devices picked up the same audio scene such that one can choose which one of the two (or more) audio sources to include with the concatenated video based on which one is the best; Sanil, ¶ 0005:  teaches choosing to include in a concatenated video the better quality content; see also Sanil, ¶ 0036:  teaching auditory quality analysis that ranks ).
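By way of illustration, choosing between two microphones that picked up the same audio scene can be sketched as follows.  This is a minimal sketch under stated assumptions: neither the claim nor the cited portions of Sanil, as summarized above, specify a quality measure, so signal-to-noise ratio is swapped in as a stand-in, and the function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech, noise_floor):
    """Signal-to-noise ratio in decibels, used here as a stand-in quality score."""
    return 20 * math.log10(rms(speech) / rms(noise_floor))


# Hypothetical (speech excerpt, silence excerpt) samples per microphone.
microphones = {
    "mic_1": ([0.5, -0.5, 0.5, -0.5], [0.05, -0.05, 0.05, -0.05]),
    "mic_2": ([0.5, -0.5, 0.5, -0.5], [0.005, -0.005, 0.005, -0.005]),
}

scores = {name: snr_db(speech, noise) for name, (speech, noise) in microphones.items()}
best_microphone = max(scores, key=scores.get)  # audio source to pair with the video
print(scores, best_microphone)
```

Any other ranking measure could be substituted for snr_db without changing the selection step, which simply keeps the highest-scoring source.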
Regarding claim 46, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 45, wherein the single audiovisual file comprises video input from the first camera when audio from the first microphone is used and video input from the second camera when audio from the second microphone is used (Sanil, ¶ 0042:  teaches the audio tracks from the video inputs can be used for the concatenated video).
Regarding claim 47, the combination of Featherstone, Crampton, Brunner, AirHush, and Sanil teaches or suggests the kiosk of claim 44, wherein the computer instructions that are stored on the computer memory are further configured to instruct the edge server processor to perform the step of: designating an unwanted portion of the first video input or the second video input, wherein the unwanted portion is not included in the audiovisual file; designating a first portion of the first video input or the second video input that immediately precedes the unwanted portion; designating a second portion of the first video input or the second video input that immediately follows the unwanted portion; and concatenating the first portion of the first video input or the second video input with the second portion of the first video input or the second video input; wherein the first portion or the second portion comprises the portion of interest (Examiner finds this claim is claiming the process of cutting or splicing together video segments; Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion).
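For illustration only, the cutting and splicing operation discussed for claim 47 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Sanil or of the claimed invention: the names `Segment`, `splice_around`, and `total_length` are hypothetical, video is modeled only as time intervals in seconds, and the unwanted portion is assumed to have been designated already.

```python
from typing import List, Tuple

# Hypothetical representation: a segment is (start_seconds, end_seconds).
Segment = Tuple[float, float]


def splice_around(video_length: float, unwanted: Segment) -> List[Segment]:
    """Return the portions that immediately precede and follow the
    unwanted portion, in playback order, ready to be concatenated."""
    start, end = unwanted
    if not (0.0 <= start < end <= video_length):
        raise ValueError("unwanted portion must lie within the video")
    first_portion = (0.0, start)            # immediately precedes the cut
    second_portion = (end, video_length)    # immediately follows the cut
    # Drop empty portions (e.g. when the cut starts at time zero).
    return [seg for seg in (first_portion, second_portion) if seg[1] > seg[0]]


def total_length(segments: List[Segment]) -> float:
    """Length of the concatenated result, in seconds."""
    return sum(end - start for start, end in segments)


kept = splice_around(60.0, (20.0, 30.0))
```

In this sketch the concatenated result is necessarily shorter than the original input, which corresponds to the "second length" being shorter than the "first length."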
Claims 33, 36, and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Featherstone, Crampton, Brunner, AirHush, Sanil, and Kyllonen (US 2015/0269529 A1).
Regarding claim 33, the combination of Featherstone, Crampton, Brunner, AirHush, Sanil, and Kyllonen teaches or suggests the kiosk of claim 31, wherein at least one of the unwanted portions of video input was designated as unwanted based on analysis of the simultaneously recorded behavioral data that identified the user as slouching or fidgeting in the at least one of the discarded portions (Sanil, ¶ 0005: teaches detecting an unwanted portion of a video and concatenating a video absent the unwanted portion; Examiner finds slouching or fidgeting are just examples of behavioral features that can be extracted by a computer system; see Kyllonen, ¶ 0019:  teaching body language and fidgeting are recognized by the computer system; it is noted that Kyllonen incorporates by reference Latta (US 2010/0199228 A1); Latta’s ¶ 0027 explains depth data is an image of depth pixels that can be analyzed like camera image data.  Latta’s ¶¶ 0059, 0068–0070, and 0074 teach image motion analysis using parameters to characterize human movements as gestures wherein slight motions are differentiated from intentional larger motions, wherein speed and volume of movements are also characterized).
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to combine the elements taught by Featherstone, Crampton, Brunner, AirHush, and Sanil, with those of Kyllonen, because Kyllonen’s computer-based system for detecting the physiological reactions of interviewees can be applied to any video of a person.  Therefore, the stated combination represents a mere combination of prior art elements, according to known methods, to yield a predictable result.  This rationale applies to all combinations of Featherstone, Crampton, Brunner, AirHush, Sanil, and Kyllonen used in this Office Action unless otherwise noted.
Regarding claim 36, the combination of Featherstone, Crampton, Brunner, AirHush, Sanil, and Kyllonen teaches or suggests the kiosk of claim 28, wherein the behavioral data used for selecting the portion of interest is selected from a group consisting of posture volume data, and frequency of posture volume changes (Kyllonen, Claim 2:  teaches detecting posture data; Latta’s ¶ 0027 explains depth data is an image of depth pixels that can be analyzed like camera image data.  Latta’s ¶¶ 0059, 0068–0070, and 0074 teach image motion analysis using parameters to characterize human movements as gestures wherein slight motions are differentiated from intentional larger motions, wherein speed and volume of movements are also characterized).
Regarding claim 37, the combination of Featherstone, Crampton, Brunner, AirHush, Sanil, and Kyllonen teaches or suggests the kiosk of claim 28, wherein the behavioral data used for selecting the portion of interest identifies a user's posture (Kyllonen, Claim 2:  teaches detecting posture data; Latta’s ¶ 0027 explains depth data is an image of depth pixels that can be analyzed like camera image data.  Latta’s ¶¶ 0059, 0068–0070, and 0074 teach image motion analysis using parameters to characterize human movements as gestures wherein slight motions are differentiated from intentional larger motions, wherein speed and volume of movements are also characterized).
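For illustration only, the kind of threshold-based motion analysis referred to above (treating depth data as an image of depth pixels and differentiating slight motions from larger motions) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of Latta or Kyllonen: the names `DepthFrame`, `changed_pixel_count`, and `classify_motion`, and the values of `min_delta_m` and `slight_max_pixels`, are hypothetical and chosen only for the example.

```python
from typing import List

# Hypothetical representation: a depth frame is a 2-D grid of distances in meters.
DepthFrame = List[List[float]]


def changed_pixel_count(prev: DepthFrame, curr: DepthFrame,
                        min_delta_m: float = 0.05) -> int:
    """Count depth pixels whose distance changed by more than min_delta_m
    between two consecutive frames."""
    return sum(
        1
        for row_prev, row_curr in zip(prev, curr)
        for p, c in zip(row_prev, row_curr)
        if abs(c - p) > min_delta_m
    )


def classify_motion(prev: DepthFrame, curr: DepthFrame,
                    slight_max_pixels: int = 2) -> str:
    """Label the movement between two frames by comparing the number of
    changed pixels to an assumed threshold."""
    n = changed_pixel_count(prev, curr)
    if n == 0:
        return "still"
    return "slight" if n <= slight_max_pixels else "large"


baseline = [[2.0, 2.0, 2.0] for _ in range(3)]
one_pixel_moved = [[2.0, 2.0, 2.0], [2.0, 1.8, 2.0], [2.0, 2.0, 2.0]]
whole_body_moved = [[1.5, 1.5, 1.5] for _ in range(3)]
```

Counting how often the label changes over a window of frames would give one possible measure of the frequency of such movements, again only as an assumption made for this example.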
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
White (US 8,300,785 B2) teaches an interview booth for a job applicant (e.g. claim 16).
Pavlidis (US 2006/0116555 A1) teaches an interview booth with the ability to determine physiological state (e.g. Abstract).
Leonard (US 2011/0135279 A1) teaches an interview booth (e.g. ¶ 0108).
Jones (US 2018/0376225 A1) teaches, inter alia, a booth with a door (e.g. Fig. 9A).
Shirakyan (US 10,694,097 B1) teaches microphones, depth and stereo cameras as sensors, and infrared proximity sensors (e.g. col. 5, ll. 59–67).
Tay (US 2020/0012350 A1) teaches depth sensors can capture foot data and is not limited to cropped portions of the person (e.g. ¶¶ 0048, 0053, and 0080).
Mitchell (US 2018/0374251 A1) teaches foot observation including foot depth (e.g. ¶ 0052).
Kyllonen (US 2015/0269529 A1) teaches a machine-based observation of the feet of an interviewee (e.g. ¶ 0005).
Kramer (US 2014/0325373 A1) teaches depth measurement of feet (e.g. ¶ 0215).
Li (US 2017/0148488 A1) teaches automatically removing undesirable portions of acquired video without manual editing (¶ 0065).
Yeh (US 2020/0197793 A1) teaches volumetric, shadow, and skeletal models used to characterize and track movements of participants using depth sensor data (¶¶ 0014–0016).
Bi (US 2018/0295428 A1) teaches detecting a motion event by comparing a number of moving pixels of a blob (which is a term of art) to a threshold (¶ 0070).
Guigues (US 2014/0334670 A1) teaches inputs related to characterizing human movements using depth sensor data including the number of pixels associated with centroid regions and first and second order motion data (¶ 0101).
Zhu (US 2013/0176430 A1) teaches motion blobs and differentiating types of motions from salient motions such as ignoring insignificant or repetitive motions (e.g. ¶¶ 0006, 0014, 0032, and 0053).
Morris (US 2015/0302158 A1) teaches using image regions and motion analysis (comparing motion to thresholds) over a period of time to estimate heart rate (¶ 0074).
Latta (US 2010/0199228 A1) is incorporated by reference in the Kyllonen reference (Kyllonen, ¶ 0020).  Latta’s ¶ 0027 explains depth data is an image of depth pixels that can be analyzed like camera image data.  Latta’s ¶¶ 0059, 0068–0070, and 0074 teach image motion analysis using parameters to characterize human movements as gestures wherein slight motions are differentiated from intentional larger motions, wherein speed and volume of movements are also characterized.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael J Hess whose telephone number is (571)270-7933.  The examiner can normally be reached Mon - Fri 9:00am-5:30pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at (571)272-3922.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8933.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J HESS/
Primary Examiner, Art Unit 2481