FINAL ACTION
Change in Art Unit Designation for Your Application
The Art Unit designation of your application in the USPTO has changed from 2663 to 2698.  The examiner assigned to this application remains the same. To aid in correlating any papers for this application, all further correspondence regarding this application should be directed to Art Unit 2698.  

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 7/23/2021, 9/30/2021 and 1/11/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
APPLICANT’S REMARKS (page 11) regarding the objection to the specification (“title of invention”) request that this objection be held in abeyance until there is an indication of allowable subject matter. Examiner notes that the current title is a great starting point but recommends adding language indicative of the core features present in the independent claims. Examiner notes that independent claims 36 and 42 are currently allowable.
Regarding the double patenting rejection, Applicant requests this rejection be held in abeyance until there is an indication of allowable subject matter. Examiner acknowledges Applicant’s request and maintains the double patenting rejection as discussed in Office Action mailed on 6/23/2021.
APPLICANT’S REMARKS (page 15) to new claims 41 and 42 are found persuasive. After further search/consideration of the prior art, claim 41 is objected as allowable and independent claim 42 is allowable.
APPLICANT’S REMARKS (pages 12-14) regarding the 35 USC 103 rejection traverse the rejection made over Marco (US 2018/0350405) in view of Dubin (US 2013/0216206) per NEW AMENDMENTS made to the “system” of independent claim 21 AND the “method” of independent claim 28. As analogously amended in BOTH independent claims (21 & 28), the newly amended limitations RECITE (with emphasis in bold):
 “a first video input from a first video camera and a second video input from a second video camera, wherein the first video camera and the second video camera are directed towards a common area, such that the video cameras are configured to obtain video data of a common subject in the common area from different angles;” -- as now amended in LINES 2-5 of claim 21
“wherein the combined audiovisual file includes the first low noise audio segment from the audio input.” -- as now amended in LINES 35-36 of claim 21

Applicant’s arguments with respect to independent claims 21 and 28 have been considered but are moot because the arguments do not apply to the combination of references being used in the current rejection.
Applicant’s arguments are further not found persuasive because it is considered OBVIOUS to apply the fundamental concepts of automatic video editing as taught by Marco in view of Dubin to a COMMON AREA such as a STAGE, which includes scenarios of a COMMON PERSON such as a SINGLE SPEAKER or PERFORMER as taught by related “automatic video editing” prior art Beauregard (US 2010/0183280).
PRIOR ART Beauregard (US 2010/0183280), Figures 1 and 6 shown below:

[Images: Beauregard Figure 1 and Figure 6 (Figure 6 annotated for discussion purposes)]

Giving a brief summary, Beauregard teaches an automatic video editing system that produces a final audio-video of a COMMON AREA “STAGE” with a COMMON SUBJECT, which is considered for a scenario having “ONE PERSON” (Fig. 6: 606), by intercutting video clips from multiple cameras (Fig. 6: 601-603 & 609) and audio clips from a reference microphone (Fig. 6: 604/605). The final video is a concatenation (i.e. temporally-linked sequence) of intercut video clips from the multiple cameras and audio clips from the microphone (Fig. 1: steps 106 & 108 and “multi-camera live event scenario” per para[0010, 0015-0017, 0065-0074]).
Examiner notes that the predominant examples given (i.e. Figure 6) teach that the COMMON PERSON (606) is a musical band performer and the reference audio (604/605) is the band performer’s music. It is also taught that the invention may be applied to an event where the PERSON is alternatively a SPEAKER in view of para[0074 & 0136], and thus it is interpreted that the reference audio (604/605) may simply be a person’s speech, lecture or panelist discussion.
Automatic determination of how the video/audio clips are selected and concatenated is based on an audio/video analysis to identify important events including a person’s (112) speaking-state in view of para[0132, 0136-0137]. In view of para[0136], it is recognized to select the “loudest” video clip which is indicative of a person “currently” speaking. Based on this teaching, one of ordinary skill in the art understands and would find it obvious for a person’s speaking-state to be automatically identified from a low/silent audio level (i.e. paused speech) OR high audio level (i.e. active speech) present in the audio clip.
Prior art Beauregard also teaches audio analysis methods may be used to identify salient instants “important events” in the reference audio clip (para[0125]) AND the automatically selected video clips may “start” and “end” at the salient instants (para[0126]). Additionally, the selected video clips may have “durations” set according to loudness in reference audio clip in view of para[0126].
Additionally, automatic determination of how the video/audio clips are selected and concatenated is based on a video analysis to identify important events such as the presence of a face, para[0132 & 0137], which may be used in combination with the audio analysis to produce the final audio-video.
Beauregard’s Figure 2 (timeline) shown below:

[Image: Beauregard Figure 2 (timeline)]

Note: Figure 2 has been annotated by the Examiner for discussion purposes to illustrate a simplified interpretation using only two cameras and a standalone microphone.


An example scenario for a common area with a common person is considered using the Figure 2 timeline (shown above) with the setup shown in Figure 6, to illustrate a simplified interpretation using only two cameras and a standalone microphone. The two cameras from Figure 6 will be considered as the first camera 601 generating first video input “Vid-1” and second camera 602 generating second video input “Vid-2” in Figure 2. The standalone microphone 604/605 will be considered to generate the audio input “Audio” 201/208 in Figure 2 in view of para[0039 & 0069]. The other cameras and respective video inputs i.e. Vid-3 through Vid-6 will not be used in this simplified interpretation.
Thus, the ABOVE discussed EXAMPLE SCENARIO (for a common area with a common person) should be interpreted as CONTINUOUS AUDIO while the TWO VIDEO CAMERA inputs are switched to produce the final audio-video file, as would be the case if more than two video camera inputs were used.
Furthermore, as recognized by Beauregard in view of para[0004], in the general field of “video editing” one of the primary purposes to use video editing is to remove the unwanted parts and also apply video enhancements such as transitions between shots. The intention of Beauregard’s automatic video editing invention is to produce a final audio-video that includes the desirable video shots AND avoids the poor quality or undesirable video shots as taught in para[0109]. 



Beauregard is considered to teach using “automatic video editing” using a continuous audio input and two video camera inputs (from different angles) AIMED at a COMMON AREA of a COMMON SUBJECT per  Applicant’s newly amended limitation:

“a first video input from a first video camera and a second video input from a second video camera, wherein the first video camera and the second video camera are directed towards a common area, such that the video cameras are configured to obtain video data of a common subject in the common area from different angles;”

When taking the collective knowledge and teachings in combination over disclosed prior arts (Marco, Dubin & Beauregard), one of ordinary skill in the art could have conceived of Applicant’s claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to COMBINE the teachings of Beauregard into suitable modification with the teachings of Marco in view of Dubin to produce an automatic video editing system/method comprising continuous audio input and two video camera inputs (from different angles) for a scenario aimed at a common area of a common subject for the MOTIVATED REASON of producing a high quality final audio-video with all the desirable important segments without the unwanted, unimportant or disruptive audio-video parts in the analogous art of automatic video edit processing.


Regarding the other amended limitation (per independent claims 21 and 28), both claims specify that the COMBINED AUDIOVISUAL FILE includes the “first low noise audio segment” from the audio input.
	Using claim 21 as an example (but also holds true for claim 28), the claimed features per lines 25-36 can be interpreted to have 3 scenarios in which the low noise audio event may be inclusive of one of the video clips or both video clips since the intercut switch point between the two video inputs may exist anywhere within the range of the low noise audio segment event as illustrated below.
	Scenario (1) recites: “the first audiovisual clip ends at the beginning of the first low noise audio segment and the second audiovisual clip begins at the beginning of the first low noise audio segment” which can be interpreted as depicted below.

[Image: Examiner’s depiction of scenario (1)]

Scenario (2) recites: “the first audiovisual clip ends after the beginning of the first low noise audio segment and before the end of the first low noise audio segment and the second audiovisual clip begins after the beginning of the first low noise audio segment and before the end of the first low noise audio segment” which can be interpreted as depicted below.


[Image: Examiner’s depiction of scenario (2)]



Scenario (3) recites: “the first audiovisual clip ends at the end of the first low noise audio segment and the second audiovisual clip begins at the end of the first low noise audio segment” which can be interpreted as depicted below.


[Image: Examiner’s depiction of scenario (3)]


The Examiner considers these 3 scenarios well known in the art of automatic video edit processing, which applies a transition effect at switch point between two videos for the motivated reason of producing a high quality final audio-video with all desirable important segments without the unwanted, unimportant or disruptive audio-video parts.
For example, an intercut audio-video file comprising a close-up shot (camera 1) of the subject during speaking moments AND a wide-angle shot (camera 2) of the subject with the background objects during non-speaking moments, or vice versa.
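For illustration only, the three switch-point scenarios discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Marco, Dubin, Beauregard, or the application: the names `Clip`, `cut_point` and `assemble` and the labels "Vid-1" and "Vid-2" are hypothetical, times are assumed to be seconds on a shared timeline, and the midpoint used for scenario (2) is an arbitrary choice among the points inside the low noise audio segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Clip:
    source: str   # hypothetical label of the video input, e.g. "Vid-1"
    start: float  # seconds on the shared timeline
    end: float    # seconds on the shared timeline


def cut_point(seg_start: float, seg_end: float, scenario: int) -> float:
    """Return the switch point relative to the low noise audio segment."""
    if scenario == 1:
        return seg_start                    # cut at the beginning of the segment
    if scenario == 2:
        return (seg_start + seg_end) / 2.0  # assumption: midpoint; any interior point fits
    if scenario == 3:
        return seg_end                      # cut at the end of the segment
    raise ValueError("scenario must be 1, 2 or 3")


def assemble(total: float, seg_start: float, seg_end: float, scenario: int) -> List[Clip]:
    """Switch from the first to the second video input at the chosen cut point.

    The audio is assumed to run continuously from 0 to `total`, so the
    low noise audio segment stays in the combined file in every scenario.
    """
    if not 0.0 < seg_start < seg_end < total:
        raise ValueError("segment must lie strictly inside the recording")
    cut = cut_point(seg_start, seg_end, scenario)
    return [Clip("Vid-1", 0.0, cut), Clip("Vid-2", cut, total)]
```

In this sketch the two clips always abut at the cut point, so the combined duration equals the duration of the inputs regardless of which scenario is chosen.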
When taking the collective knowledge and teachings in combination over disclosed prior arts (Marco, Dubin & Beauregard), one of ordinary skill in the art could have conceived of Applicant’s claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to COMBINE the teachings of Beauregard into suitable modification with the teachings of Marco in view of Dubin to configure an automatic video editing system/method comprising continuous audio input and two video camera inputs (from different angles) for a scenario aimed at a common area of a common subject to produce a combined audiovisual file that includes the audio input’s first low noise audio segment for the MOTIVATED REASON of producing a high quality final audio-video with all the DESIRABLE important segments without the unwanted, unimportant or disruptive audio-video parts in the analogous art of automatic video edit processing.

IN CONCLUSION, since Applicant's amendment necessitates a new ground of rejection, the Examiner makes a new rejection over the prior art combination -- Marco in view of Dubin in further view of Beauregard (US 2010/0183280). For a further explanation of the prior art combination as it corresponds to Applicant’s amendments, see the 35 USC 103 rejection detailed in this Office Action.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 21-32 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Marco (US 2018/0350405) in view of Dubin (US 2013/0216206) in view of Beauregard (US 2010/0183280).

	As per independent claim 21, Marco teaches a system (Figures 1, 2 & 5-7: Automatic audio-video editing system which smartly mixes audio & video from any number of electronic devices, para [0001 & 0004], to produce a singular time-aligned audio-video edited multimedia file) comprising: 
a first video input and a second video input, and an audio input (Figures 1, 2 & 7: Automatic audio-video editing system smartly mixes audio & video from any number electronic devices 105/120 comprising a camera 155, microphone 125/130 or combination of both a camera & microphone, para[0001, 0004, 0009-0010, 0022 & 0061]. Therefore, Marco’s teachings imply that the first and second video inputs may be from respective cameras AND the audio input may be from a standalone microphone 125 OR integrated device having both the mic 130 and camera 155); 
a time counter providing a timeline associated with the first video input, the second video input, and the audio input, wherein the timeline enables a time synchronization of the first video input, the second video input, and the audio input (Figures 5 & 6: synchronization step 505/605 and Para [0004, 0023-0024, 0028 & 0041]: The electronic devices may be synchronized to simultaneously start recording with each other, wherein one of the devices may serve as the master device 105 (having an internal clock) establishing a reference timeline in which the other slave devices 120 (having internal clocks) may be synchronized to for producing the smartly mixed singular time-aligned audio-video edited multimedia file); 
a non-transitory computer memory and a computer processor, and computer instructions stored on the memory for instructing the processor (Automatic audio-video editing system devices have a computer which executes a stored software application (see processing flowcharts in Figures 5 & 6) in view of para [0022, 0029-0030, 0034, 0041 & 0077]) to perform the steps of: 
sampling the audio input to identify a first low noise audio segment in which a decibel level is below a threshold level for a predetermined period of time AND sampling the audio input to identify a beginning of the first low noise audio segment and an end of the first low noise audio segment, and in response to identifying the first low noise audio segment, AUTOMATICALLY assembling a COMBINED AUDIOVISUAL FILE by retaining a first audiovisual clip comprising a portion of the audio input and first video input occurring before the first low noise audio segment, retaining a second audiovisual clip comprising a portion of the audio input and second video input occurring after the first low noise audio segment, and concatenating the first audiovisual clip and the second audiovisual clip to CREATE a COMBINED AUDIOVISUAL FILE (Fig. 5: steps 510-530; Fig. 6: steps 610-635; Fig.7: switch cutting of interviewee speaking 705 & 715 and stop speaking. System analyzes both the AUDIO & VIDEO to identify the audio characteristics which includes low noise audio events (para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0074]) such as no speech or low energy level/volume “below a threshold”, para[0050], which is used to produce a combined audiovisual file as in step 545 or step 650. Combined audiovisual file uses a transition effect which may segue/cut from “retained” first audio-video clip to “retained” second audio-video clip based on identified audio characteristics “no speech or low energy level/volume”, para[0009-0010 & 0067]. In other words, in the spirit of Marco’s teachings, it can be appreciated that the system analysis identifies low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect switch cuts from the first clip to the second clip to produce a high quality combined time-aligned audio-video with all desirable important segments).
Regarding the limitations: “wherein (1) the first audiovisual clip ends at the beginning of the first low noise audio segment and the second audiovisual clip begins at the beginning of the first low noise audio segment, (2) the first audiovisual clip ends after the beginning of the first low noise audio segment and before the end of the first low noise audio segment and the second audiovisual clip begins after the beginning of the first low noise audio segment and before the end of the first low noise audio segment, OR (3) the first audiovisual clip ends at the end of the first low noise audio segment and the second audiovisual clip begins at the end of the first low noise audio segment”, these features “scenarios 1-3” are interpreted as depicted in the Response to Arguments above. Furthermore, per the previous limitation discussion of identifying low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect segue/cuts from the first clip to the second clip to produce a high quality combined time-aligned audio-video with all desirable important segments over cited teachings in para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0067 & 0074], one of ordinary skill in the art could easily conceive of scenarios 2 & 3.
	Regarding the bolded language per the limitation: “sampling the audio input to identify a low noise audio segment in which a decibel level is below a threshold level for a predetermined period of time AND sampling the audio input to identify a beginning of the low noise audio segment and an end of the low noise audio segment”, these features are considered implicit/inherent characteristics of the identified low noise audio i.e. the audio has a duration, frequency, decibel/volume level and may be associated with an image such as a person’s mouth speaking.
	However, to more explicitly show that it is well known in the analogous art to identify low noise segments further associated with a duration, Examiner evidences prior art Dubin (Figures 2, 4 & 5 and para [0027-0029 & 0067]: teaches automatic video editing of plural audio & video inputs to produce a final intercut audio-video file. Figures 6 & 7 and para [0079-0080, 0090-0091, 0109, 0111 & 0178]: speech analysis 160 uses voice detection threshold 209 and identifies speaking segments 211 for making a switch cut determination 227. The voice detection/speaking identification process may further be based on duration, for example, a short pause in speech “low/no audio noise” may indicate it is part of a larger speaking segment i.e. pauses between words spoken in a sentence, and thus a switch cut should not yet be applied. This is contrasted with a longer pause “low/no audio noise” indicating that a person has stopped speaking in the first audio-video input OR that a dominant/louder speaker is coming from another/second audio-video input and thus applying a switch cut may be desirable. Para [0178]: teaches any participant sound that is less than a second in duration preferably is simply ignored. However, any such sounds that can be recognized as (or soon to be) the participant's voice, particularly if there are multiple such sounds, at least sometimes can be interpreted as an attempt by that participant to break into the conversation and, therefore, can be used as a basis for cutting away to that participant in situations where a cutaway is desired. Therefore, Dubin’s teachings in combination with Marco are also considered, as a matter of obviousness, to teach scenario 1 to achieve a “preemptive” switch cut to a person who is “about to speak”.). 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dubin into suitable modification with the teachings of Marco to identify low noise audio segments 
	Regarding Applicant’s new claim amendments:
“a first video input from a first video camera and a second video input from a second video camera, wherein the first video camera and the second video camera are directed towards a common area, such that the video cameras are configured to obtain video data of a common subject in the common area from different angles” AND “wherein the combined audiovisual file includes the first low noise audio segment from the audio input.”, these features are rejected for reasons discussed in the above “Response to Arguments” over the collective prior art teachings of Marco in view of Dubin in view of Beauregard taken in combination.
When taking the collective knowledge and teachings in combination over disclosed prior arts (Marco, Dubin & Beauregard), one of ordinary skill in the art could have conceived of Applicant’s claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to COMBINE the teachings of Beauregard into suitable modification with the teachings of Marco in view of Dubin to configure an automatic video editing system/method comprising continuous audio input and two video camera inputs (from different angles) for a scenario aimed at a common area of a common subject to produce a combined audiovisual file that includes the audio input’s first low noise audio segment for the MOTIVATED REASON of producing a high quality final audio-video with all the DESIRABLE important segments without the unwanted, unimportant or disruptive audio-video parts in the analogous art of automatic video edit processing.
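For illustration only, the claimed step of identifying a low noise audio segment (a decibel level below a threshold level for a predetermined period of time, with an identified beginning and end) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Marco, Dubin, or the application: the function name `find_low_noise_segments`, its parameters, and the use of evenly spaced decibel readings are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_low_noise_segments(
    levels_db: Sequence[float],  # assumption: one decibel reading per sample period
    sample_period_s: float,      # assumption: fixed spacing between readings, in seconds
    threshold_db: float,         # readings strictly below this count as low noise
    min_duration_s: float,       # the "predetermined period of time"
) -> List[Tuple[float, float]]:
    """Return (beginning, end) times of runs that stay below the threshold
    for at least min_duration_s."""
    segments: List[Tuple[float, float]] = []
    run_start: Optional[int] = None

    def close_run(start_idx: int, end_idx: int) -> None:
        begin = start_idx * sample_period_s
        end = end_idx * sample_period_s
        if end - begin >= min_duration_s:  # shorter pauses are ignored
            segments.append((begin, end))

    for i, level in enumerate(levels_db):
        if level < threshold_db:
            if run_start is None:
                run_start = i              # a quiet run begins here
        elif run_start is not None:
            close_run(run_start, i)        # the quiet run ended at sample i
            run_start = None
    if run_start is not None:
        close_run(run_start, len(levels_db))  # quiet run reaches end of input
    return segments
```

The minimum-duration check reflects the distinction, discussed above with respect to Dubin, between a short pause within a speaking segment and a longer pause that may justify a switch cut.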

	As per claim 22, Marco in view of Dubin in view of Beauregard teaches the system of claim 21, wherein the first video input, the second video input, and the audio input are recorded synchronously, and wherein the combined audiovisual file is a video interview of a job candidate (Marco’s invention may be recording synchronously (Figures 5 & 6: synchronization step 505/605 and para [0004, 0023-0024, 0028 & 0041]) to produce a combined video interview in general, para [0022]. Dubin’s invention may be applied to a job interview, para[0028]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dubin into suitable modification with the teachings of Marco to produce a combined audiovisual file video interview of a job candidate).
 
	As per claim 23, Marco in view of Dubin in view of Beauregard teaches the system of claim 21, further comprising computer instructions stored on the memory for instructing the processor to perform the steps of: sampling the audio input to identify multiple low noise audio segments in which the decibel level is below the threshold level for a predetermined period of time, and automatically concatenating alternating audiovisual clips that switch between the first video input and second video input after each low noise audio segment (In view of the prior art combination teachings discussed in claim 21, these additional features are considered obvious to one of ordinary skill in the art. Marco: teaches automatic processing (Figures 5 & 6) of identifying low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect segue/cuts from the first clip to the second clip to produce a high quality combined time-aligned audio-video with all desirable important segments over cited teachings in para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0067 & 0074]. It’s understood these steps may be repeatable to automatically “switch in alternating fashion” between audio-video clips in a real world scenario. This can be gleaned from Marco’s Fig.7 (shown below): alternating audio-visual clips between two different camera angles based on when an interviewee 110 is speaking 705 & 715 and stops speaking 710). 

[Image: Marco Fig. 7]

	
(In view of the prior art combination teachings discussed in claim 21, these additional features are considered obvious to one of ordinary skill in the art. Marco: Fig. 5: steps 510-530 and Fig. 6: steps 610-635, system automatically analyzes both the audio & video to identify the audio characteristics which includes low noise audio events (para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0074]) such as no speech or low energy level/volume “below a threshold”, para[0050], AND  extracting content data such as a face/mouth image data via face recognition, which are used to automatically produce a combined audiovisual file as in step 545 or step 650. Combined audiovisual file uses a transition effect which may segue/cut from “retained” first audio-video clip to “retained” second audio-video clip based on identified audio characteristics “no speech or low energy level/volume”, para[0009-0010 & 0067]. In other words, in the spirit of Marco’s teachings, it can be appreciated that the system analysis identifies low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect switch cuts from the first clip to the second clip to produce a high quality combined time-aligned audio-video with all desirable important segments).

 	As per claim 25, Marco in view of Dubin in view of Beauregard teaches the system of claim 24, wherein the switch-initiating event is a keyword extracted from the audio input via speech-to-text. (In view of the prior art combination teachings discussed in claims 21 & 24, these additional features are considered obvious by teaching audio/voice recognition and image recognition in which the important moment may be a word/sentence spoken. Marco, Figures 5 & 6 and para[0005, 0009-0010, 0042, 0045-0048, 0050, 0053, 0055, 0074]. Dubin, Figure 6: speech analysis 160).

	As per claim 26, Marco in view of Dubin in view of Beauregard teaches the system of claim 21, wherein the combined audiovisual file has a length of time equivalent to a length of time of the first video input, the second video input and the audio input (As discussed in the Response to Arguments and depicted scenarios 1-3, Marco teaches a combined audiovisual file that has a length of time the SAME as a length of time of the first video input, the second video input and the audio input). 

	As per claim 27, Marco in view of Dubin in view of Beauregard teaches the system of claim 21, wherein when the first audiovisual clip ends after the beginning of (As discussed in the Response to Arguments and depicted scenarios, these features are considered to be encompassed by scenario #2, and thus is taught over the prior art combination of Marco in view of Dubin as discussed in claim 21). 

	Independent claim 28, recites: 
“a computer-implemented method comprising: receiving first video input of an individual from a first camera, receiving second video input of the individual from a second camera, wherein the first video camera and the second video camera are directed towards a common area, such that the video cameras are configured to obtain video data of a common subject in the common area from different angles, receiving audio input of the individual from a microphone, wherein the first video input, the second video input, and the audio input are recorded synchronously, and the first video input, the second video input, and the audio input are all of a first length; sampling the audio input, the first video input, or the second video input to identify a low noise audio event, wherein the sampled audio input to identify the low noise audio event has a decibel level that is below a threshold level for a predetermined period of time; automatically assembling a combined audiovisual file by performing the steps of: 
retaining a first audiovisual clip comprising a portion of the first video input occurring before the low noise audio event; retaining a second audiovisual clip comprising a portion of the second video input occurring after the low noise audio event; and concatenating the first audiovisual clip and the second audiovisual clip to create a combined audiovisual file containing video of the individual from two camera angles; 
wherein (1) the first audiovisual clip ends at the beginning of the low noise audio segment and the second audiovisual clip begins at the beginning of the low noise audio segment, (2) the first audiovisual clip ends after the beginning of the low noise audio segment and before the end of the low noise audio segment and the second audiovisual clip begins after the beginning of the low noise audio segment and before the end of the low noise audio segment, or (3) the first audiovisual clip ends at the end of the low noise audio segment and the second audiovisual clip begins at the end of the low noise audio segment;
wherein the combined audiovisual file includes the low noise audio segment from the audio input”. 
Claim 28 is considered to comprise the same inventive features as taught claim 21, with the exception that the audio-video recording target is an “individual i.e. interviewee”. However, this additional feature is considered obvious over the prior art combination taught in claim 21 because the automatic video editing teachings can still be applied to two cameras from different angles, each having a respective field of view shot (i.e. a close-up shot and wide angle shot) with the individual in it. One of 
When taking the collective knowledge and teachings in combination over disclosed prior arts (Marco, Dubin & Beauregard), one of ordinary skill in the art could have conceived of Applicant’s claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to COMBINE the teachings of Beauregard into suitable modification with the teachings of Marco in view Dubin to configure an automatic video editing system/method comprising continuous audio input and two video camera inputs (from different angles) for a scenario aimed on a common area of a common subject to produce a combined audiovisual file that includes the audio input’s first low noise audio segment for the MOTIVATED REASON of producing a high quality final audio-video with all the DESIRABLE important segments without the unwanted, unimportant or disruptive audio-video parts in the analogous art of automatic video edit processing.
For example, the combined file may be an intercut audio-video file comprising a close-up shot of the interviewee during speaking moments AND a wide-angle shot of the interview with the background objects during non-speaking moments, or vice versa.

	As per claim 29, Marco in view of Dubin in view of Beauregard teaches the method of claim 28, wherein the combined audiovisual file is a video interview of a job candidate (Marco’s invention may produce a combined video interview in general, para [0022]. Dubin’s invention may be applied to a job interview, para[0028]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dubin into suitable modification with the teachings of Marco to produce a combined audiovisual file video interview of a job candidate). 

	As per claim 30, Marco in view of Dubin in view of Beauregard teaches the method of claim 28, further comprising the steps of: sampling the audio input to identify a plurality of low noise audio segments, retaining video clips that alternately switch between the first video input and the second video input following the low noise audio segments, and concatenating the alternating video clips to create a combined audiovisual file containing video that alternates between two camera angles (In view of the prior art combination teachings made in claim 28, these additional features are considered obvious to one of ordinary skill in the art and as a matter of design choice/user preference. Marco: teaches automatic processing (Figures 5 & 6) of identifying low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect segue/cuts from the first clip “first camera angle” to the second clip “second camera angle” to produce a high quality combined time-aligned audio-video with all desirable important segments over cited teachings in para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0067 & 0074]. It’s understood these steps may be repeatable to automatically “switch in alternating fashion” between audio-video clips in a real world scenario. This can be gleaned from Marco’s Fig.7 (shown above in claim 23): alternating audio-visual clips between two different camera angles based on when an interviewee 110 is speaking 705 & 715 and stops speaking 710). 
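
For illustration only, the steps recited in claim 30 (sampling the audio input for low noise audio segments, alternating between the two video inputs, and concatenating the alternating clips) may be sketched as follows. This is a minimal sketch and not an implementation taken from Applicant's disclosure or from any cited reference. The function names, the per-sample loudness list, the threshold, the minimum duration, and the choice to cut at the end of each low noise segment are all assumptions made for the example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]       # (start_seconds, end_seconds)
Clip = Tuple[int, float, float]     # (camera_number, start_seconds, end_seconds)


def find_low_noise_segments(levels: List[float], sample_period: float,
                            threshold: float, min_duration: float) -> List[Segment]:
    """Return runs where the sampled audio level stays below `threshold`
    for at least `min_duration` seconds (hypothetical parameters)."""
    segments: List[Segment] = []
    run_start = None
    # A trailing sentinel at the threshold closes any run still open at the end.
    for i, level in enumerate(list(levels) + [threshold]):
        if level < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * sample_period, i * sample_period
            if end - start >= min_duration:
                segments.append((start, end))
            run_start = None
    return segments


def alternate_clips(total_duration: float, segments: List[Segment]) -> List[Clip]:
    """Build an ordered clip list that switches camera at the end of each
    low noise segment (assumed cut position), alternating cameras 1 and 2."""
    clips: List[Clip] = []
    camera, clip_start = 1, 0.0
    for _, end in segments:
        clips.append((camera, clip_start, end))
        camera = 2 if camera == 1 else 1
        clip_start = end
    clips.append((camera, clip_start, total_duration))
    return clips


# Hypothetical loudness samples taken every 0.5 seconds.
levels = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.7, 0.9,
          0.05, 0.05, 0.05, 0.05, 0.6, 0.8]
segments = find_low_noise_segments(levels, 0.5, threshold=0.2, min_duration=1.0)
clips = alternate_clips(len(levels) * 0.5, segments)
```

Concatenating the clips in list order would yield video that alternates between the two camera angles while the continuous audio input is retained throughout.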

	As per claim 31, Marco in view of Dubin in view of Beauregard teaches the method of claim 30, further comprising the step of extracting content data from the first video input, the second video input, or the audio input to identify one or more switch-initiating events, wherein switching between the first video input and the second video input is only performed for low noise audio segments that follow switch-initiating events (In view of the prior art combination teachings made in claims 28 and 30, these additional features are considered obvious to one of ordinary skill in the art. Marco: Fig. 5: steps 510-530 and Fig. 6: steps 610-635, system automatically analyzes both the audio & video to identify the audio characteristics which includes low noise audio events (para [0009-0010, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0074]) such as no speech or low energy level/volume “below a threshold”, para[0050], AND extracting content data such as a face/mouth image data via face recognition, which are used to automatically produce a combined audiovisual file as in step 545 or step 650. Combined audiovisual file uses a transition effect which may segue/cut from “retained” first audio-video clip to “retained” second audio-video clip based on identified audio characteristics “no speech or low energy level/volume”, para[0009-0010 & 0067]. In other words, in the spirit of Marco’s teachings, it can be appreciated that the system analysis identifies low/no speaking noise in a first clip and high speaking noise in a second clip, wherein a transition effect switch cuts from the first clip to the second clip to produce a high quality combined time-aligned audio-video with all desirable important segments). 
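
For illustration only, the gating condition recited in claim 31 (switching only at low noise audio segments that follow a switch-initiating event) may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Applicant's disclosure or from the cited references. The function name, the representation of events as timestamps, and the choice to cut at the start of a qualifying segment are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def gated_cut_points(segments: List[Segment], event_times: List[float]) -> List[float]:
    """Return cut times for only those low noise segments that are preceded
    by at least one switch-initiating event not already consumed by an
    earlier segment (hypothetical gating rule)."""
    cuts: List[float] = []
    events = sorted(event_times)
    idx = 0
    for start, _ in sorted(segments):
        pending = False
        # Consume every event that occurred at or before this segment begins.
        while idx < len(events) and events[idx] <= start:
            pending = True
            idx += 1
        if pending:
            cuts.append(start)
    return cuts


# Hypothetical data: three low noise segments, two detected events
# (for example, keyword timestamps from speech-to-text).
cuts = gated_cut_points([(1.0, 3.0), (4.0, 6.0), (8.0, 9.0)], [0.5, 7.0])
```

In this example the second low noise segment produces no switch because no switch-initiating event occurred between the first and second segments.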

	As per claim 32, Marco in view of Dubin in view of Beauregard teaches the method of claim 31, wherein the content data is a keyword extracted using speech-to-text (In view of the prior art combination teachings discussed in claims 28 & 31, these additional features are considered obvious by teaching audio/voice recognition and image recognition in which the important moment may be a word/sentence spoken. Marco, Figures 5 & 6 and para[0005, 0009-0010, 0042, 0045-0048, 0050, 0053, 0055, 0074]. Dubin, Figure 6: speech analysis 160).

	As per claim 34, Marco in view of Dubin in view of Beauregard teaches the method of claim 28, wherein when the first audiovisual clip ends after the beginning of the low noise audio segment and before the end of the low noise audio segment and the second audiovisual clip begins after the beginning of the low noise audio segment and before the end of the low noise audio segment, the first audio visual clip ends at a midpoint of the low noise audio segment (As discussed in the Response to Arguments and depicted scenarios, these features are considered to be encompassed by scenario #2, and thus are taught over the prior art combination of Marco in view of Dubin as made in claim 28). 
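
For illustration only, the three cut-position scenarios recited in claims 21 and 28, together with the midpoint case of claim 34, may be sketched as follows. This is a minimal sketch. The function name and the `fraction` parameter are hypothetical and do not appear in the claims or in the cited references.

```python
from typing import Tuple


def cut_point(segment: Tuple[float, float], scenario: int,
              fraction: float = 0.5) -> float:
    """Return the time at which the first clip ends and the second begins.

    scenario 1: cut at the beginning of the low noise audio segment
    scenario 2: cut strictly inside the segment (fraction=0.5 is the midpoint)
    scenario 3: cut at the end of the low noise audio segment
    """
    start, end = segment
    if scenario == 1:
        return start
    if scenario == 2:
        if not 0.0 < fraction < 1.0:
            raise ValueError("fraction must lie strictly between 0 and 1")
        return start + fraction * (end - start)
    if scenario == 3:
        return end
    raise ValueError("scenario must be 1, 2 or 3")


# Hypothetical low noise segment running from 4.0 s to 6.0 s.
segment = (4.0, 6.0)
at_start = cut_point(segment, 1)
at_midpoint = cut_point(segment, 2)
at_end = cut_point(segment, 3)
```

In every scenario both clips meet at a single time, so the low noise audio segment remains in the combined file; the scenarios differ only in which clip carries it.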




Allowable Subject Matter
Claims 33, 35 and 41 are objected to as being dependent upon respective rejected base claims 21 and 28, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claims 36-40 and 42 are allowable. 
As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).
Therefore, Applicant's response will need to overcome the double patenting rejection discussed above.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”







Closest Prior Art
	The following 5 references are considered to be the closest prior art teachings in relation to Applicant’s claimed/unclaimed invention. The collective teachings in these references are considered to illustrate many teachings of Applicant’s invention, as one of ordinary skill in the art would have found obvious to use in combination before the effective filing date of the claimed invention for the motivated reason of producing a high quality final audio-video with all the desirable important segments without the unwanted, unimportant or disruptive audio-video parts in the analogous art of automatic video edit processing.

	Marco (US 2018/0350405) discloses an automatic audio-video editing system of a recorded interview (using any number of devices 105/120 each having a camera, microphone or combination of both a camera & microphone, para[0022 & 0061]) which performs analysis of audio/video segments (Fig.5: 510/515 & Fig.6: 610/615) to concatenate a final synchronized audio-video (Fig.5: 545 & Fig.6: 650) based on desired moments which may be a current person speaking. Each video may be from a different angle in view of para[0009]. The video analysis may identify image objects “face/mouth talking” used to determine which audio segment is ducked/deemphasized (i.e. muted, decreased volume or removed) and which audio segment is emphasized (i.e. kept) between a pair of first and second audio-video inputs. The video analysis is also used to determine which video segment is deemphasized via a transition effect (i.e. cut/removed, wipe, fade) and which video segment is emphasized between a pair of first and second audio-video inputs. The audio analysis may identify which audio segment is ducked/deemphasized (i.e. muted or decreased volume) and which audio segment is emphasized (i.e. kept) between a pair of first and second audio-video inputs. See para[0009-0010, 0022-0026, 0042, 0045, 0047-0048, 0050, 0053, 0055, 0064 & 0074].
	
Dubin (US 2013/0216206) discloses an automatic video editing system applicable to any recorded audio or audio-video conversation (para[0027-0032]), which may include an interview facilitated by a server (Figure 17 & para[0270-0278]), for combining audio/video tracks to produce a final composite synchronized audio-visual video based on dominant speaker(s). The system may perform audio analysis 120/160 and video analysis 122/168 to identify the dominant speaker(s) amongst a group of people (Fig.3, 11-14) to produce the final audio-visual video in view of Figures 4-7. Deemphasize processing (i.e. mute/reduce) to unwanted audio segments such as a non-speech segment, while emphasize processing (i.e. enhance, enlarge, highlight) to wanted video segments such as an identified dominant speaker is performed to produce the final audio-visual video in view of para[0066-0067, 0079-0080, 0104, 0106, 0113, 0168-0169 and 0183].
It is noted that Dubin teaches trimming excess quiet segments that start at the very beginning of the entire audio-visual OR trimming excess content segments at the very end of the entire audio-visual in para[0116-0017], which is not the same thing as trimming between any pair of audio-video inputs. It is also noted that Dubin teaches editing/cutting uncomfortable pauses due to communication latencies (i.e. technology-induced) between when participants speak in para[0129], which is not the same thing as trimming between any pair of audio-video inputs due to identifying an actual low noise audio caused by a pause in speech (i.e. human-induced) when a participant speaks or does not speak.

Beauregard (US 2010/0183280) discloses an automatic video editing system (using any number of cameras & microphones) which performs analysis of audio/video (para[0136-0137]) to concatenate a final synchronized audio-video (Fig.1) based on important moments which may be a current person speaking i.e. automatic switching between video clips may be based on cutting to the loudest person as the current speaker, para[0136]. See Figures 1-3 and 6. See para[0039, 0065-0074, 0130-0132 and 0136-0137].

Sargin (US 8,913,103) discloses an automatic video editing system (using a single camera and microphone) which performs analysis of audio/video to concatenate a final audio-video based on a detected person speaking or not. See Figures 1, 2 and 9. See col.2, lines 48-60; col.3, lines 23-26; col.6, lines 17-30; col.13, line 4 – col.14, line 27.

	Bradley (US 2015/0356512) discloses an automatic video editing system for a job interview which concatenates a final video (i.e. comprised of both video & audio) from a series of recorded questions and answers. These videos could be created using a computer video camera, done in a professional studio or other means of video capture. Furthermore, while the video may be posted or created in a single sitting, it 



Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.





Contacts
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to AKSHAY TREHAN whose telephone number is (571) 270-5252.  The examiner can normally be reached on Monday - Friday.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Twyler Haskins can be reached on (571) 272-7406.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AKSHAY TREHAN/
Examiner, Art Unit 2698

/JAMES M HANNETT/Primary Examiner, Art Unit 2698