Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:

“video encoder” in claim 1.
“Generative Adversarial Network system,” “audio synthesizer,” and “Inverse Short-Time Fourier Transform (ISTFT) function” in claim 24.


Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.




Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:

Re claim 1: Zhao et al., “The Sound of Pixels” (cited in the IDS), discloses a method for an audiovisual source separation processing, the method comprising: receiving video data showing images of a plurality of sound sources into a video encoder (see abstract; see also figure 4); encoding, in the video encoder, the received video data into video localization data comprising information associating pixels in the frames of video data with different channels of sound (see section 3.1, video analysis network; see also section 5.6, noting that pixels are associated with different channels of sound).

Zhao does not expressly disclose concurrently receiving into the video encoder optical flow data of the video data, the optical flow data indicating motions of pixels between frames of the video data; and encoding, in the video encoder, the received optical flow data into video separation data comprising information associating motion information in the frames of video data with the different channels of sound.

Lu et al., “Listen and Look: Audio–Visual Matching Assisted Speech Source Separation,” IEEE, 2018.
Lu discloses a method for an audiovisual source separation processing, the method comprising: receiving video data showing images of a plurality of sound sources into a video encoder; concurrently receiving into the video encoder optical flow data of the video data, the optical flow data indicating motions of pixels between frames of the video data (see figures 1 and 2, noting that optical flow and greyscale images are input into the audio-visual encoder; see also section II.A); encoding, in the video encoder, the received video data into video localization data (see figures 1 and 2 and section II.A, noting that optical flow and grey video data are input to the encoder); and encoding, in the video encoder, the received optical flow data into video separation data (see figures 1 and 2 and section II.A, noting that optical flow and grey video data are input to the encoder).


Lu does not expressly disclose encoding, in the video encoder, the received video data into video localization data comprising information associating pixels in the frames of video data with different channels of sound; and encoding, in the video encoder, the received optical flow data into video separation data comprising information associating motion information in the frames of video data with the different channels of sound.



Re claim 16: Zhao discloses a method for an audiovisual source separation processing, the method comprising: providing an audio receiver to receive audio associated with the video frame data, the audio receiver including a Short-Time Fourier Transform (STFT) function to convert the associated audio into spectrogram data comprising a listing of values for different frequency bins n at a time t (see section 3.1, audio analysis network); providing an input to receive a selection of a pixel of the video frame data as a selected pixel (see section 3.1, audio synthesizer network, noting that the sound of input pixels is estimated); mixing spectrogram data for the selected pixel with the associated audio in an audio synthesizer and providing an output of the audio synthesizer into an Inverse Short-Time Fourier Transform (ISTFT) function (see section 3.1, audio synthesizer network); and providing an output of the ISTFT function as output audio of the selected pixel (see section 3.1, audio synthesizer network).

Zhao does not expressly disclose “providing a Generative Adversarial Network (GAN) system comprising a plurality of Deep Neural Networks (DNNs) configured to comprise a GAN generator and a GAN discriminator, the GAN generator configured to receive video frame data and associated optical flow frame data indicating pixel motion between the video frames;”

Re claim 24: note that the allowable features of claim 24 are similar to those of claim 16.


Re claim 21: Wang et al. discloses a method, the method comprising training a Generative Adversarial Network (GAN) system comprising a plurality of Deep Neural Networks (DNNs) configured to comprise a GAN generator and a GAN discriminator (see figure 6A, noting that there is a generator network 404 and a discriminator network 406), the GAN generator configured to receive video frame data and associated optical flow frame data indicating pixel motion between the video frames (see figure 6A, noting that optical flow and training images are input into the image exposure generator network; see also paragraph 38, noting that the images are video frames), wherein the training comprises receiving a plurality of different video clips into the GAN generator (see paragraph 38, noting that video is used for training) and, using a gradient descent training process, training the GAN generator to generate candidates attempting to fool the GAN discriminator while training the GAN discriminator to correctly identify whether the candidates are real/fake or clean/mixture (see paragraph 34, noting that the generator attempts to fool the discriminator).

Wang does not disclose a method for an audiovisual source separation processing, the method comprising training a Generative Adversarial Network (GAN) system comprising a plurality of Deep Neural Networks (DNNs) configured to comprise a GAN generator and a GAN discriminator, the GAN generator configured to receive video frame data and associated optical flow frame data indicating pixel motion between the video frames.

The examiner notes that the GAN in Wang has nothing to do with audiovisual source separation and is trained for a completely different purpose. The examiner further notes that training a GAN for audiovisual source separation would require different video clips for training and a completely different type of discriminator than that used in Wang. Furthermore, the GAN in Wang would not be useful for performing audiovisual source separation and is neither capable of nor related to performing the audiovisual separation recited in the preamble.

See MPEP § 2111.02, subsection II:

During examination, statements in the preamble reciting the purpose or intended use of the claimed invention must be evaluated to determine whether or not the recited purpose or intended use results in a structural difference (or, in the case of process claims, manipulative difference) between the claimed invention and the prior art. If so, the recitation serves to limit the claim. See, e.g., In re Otto, 312 F.2d 937, 938, 136 USPQ 458, 459 (CCPA 1963) (The claims were directed to a core member for hair curlers and a process of making a core member for hair curlers. The court held that the intended use of hair curling was of no significance to the structure and process of making.); In re Sinex, 309 F.2d 488, 492, 135 USPQ 302, 305 (CCPA 1962) (statement of intended use in an apparatus claim did not distinguish over the prior art apparatus). To satisfy an intended use limitation which is limiting, a prior art structure which is capable of performing the intended use as recited in the preamble meets the claim. See, e.g., In re Schreiber, 128 F.3d 1473, 1477, 44 USPQ2d 1429, 1431 (Fed. Cir. 1997) (anticipation rejection affirmed based on Board’s factual finding that the reference dispenser (a spout disclosed as useful for purposes such as dispensing oil from an oil can) would be capable of dispensing popcorn in the manner set forth in appellant’s claim 1 (a dispensing top for dispensing popcorn in a specified manner)) and cases cited therein.


Re claim 24: Wang discloses a method of training a Generative Adversarial Network (GAN) system, the GAN system comprising a plurality of Deep Neural Networks (DNNs) configured to comprise a GAN generator and a GAN discriminator (see figure 6A, noting that there is a generator network 404 and a discriminator network 406), the GAN generator configured to receive video frame data and associated optical flow frame data indicating pixel motion between the video frames (see figure 6A, noting that optical flow and training images are input into the image exposure generator network; see also paragraph 38, noting that the images are video frames), the training method comprising: receiving a plurality of different video clips into the GAN generator (see paragraph 38, noting that the images are video frames); applying a gradient descent training process to the DNNs of the GAN generator to train the GAN generator to generate candidates attempting to fool the GAN discriminator (see paragraph 34, noting that the generator attempts to fool the discriminator; see also paragraph 139); and applying the gradient descent training process to the GAN discriminator to train the GAN discriminator to correctly identify whether the candidates are real/fake or clean/mixture (see paragraph 34, noting that the generator attempts to fool the discriminator; see also paragraph 139).

The examiner notes that, while the GAN is recited in the preamble, the specific GAN recited in the preamble is referred to in the body of the claim, and therefore the language “a Generative Adversarial Network (GAN) system used for audiovisual source separation, the GAN system comprising a plurality of Deep Neural Networks (DNNs) configured to comprise a GAN generator and a GAN discriminator, the GAN generator configured to receive video frame data and associated optical flow frame data indicating pixel motion between the video frames” is given patentable weight. As with claim 21, the GAN in Wang would not be useful for performing audiovisual source separation and is neither capable of nor related to performing the audiovisual separation.

The remaining claims depend from one or more of the above allowable claims.


Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T MOTSINGER whose telephone number is (571)270-1237. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached on (571)272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SEAN T MOTSINGER/Primary Examiner, Art Unit 2669