DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
1.	Claims 7-8 and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Claim Rejections - 35 USC § 103
2.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
3.	Claims 1 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278).
Regarding claim 1, Lehtiniemi teaches a method of cropping a portion of an audio signal captured from a plurality of spatially separated audio sources in a scene (see fig. 3a-3b, ¶ 0074, 0101-0105, 0120. The system provides for cropping audio signals which are provided in different scenes.), the method comprising: capturing the audio signal with one or more recording devices (see ¶ 0122. The audio signals are captured from microphones or other audio capturing devices.); separating the audio signal into a plurality of components associated with one or more of the plurality of audio sources; selecting a spatial region in the scene (see fig. 3a-3b, ¶ 0058-0059, 0061-0062, 0074, 0086, 0101-0102. The audio scenes are spatially separated in different locations. The overlapping audio is thus separated into two separate locations.).
Lehtiniemi does not teach determining which of the plurality of components are associated with an audio source positioned outside of the selected spatial region; and cropping the plurality of components associated with an audio source positioned outside of the selected spatial region out of the audio signal.  
Eronen teaches determining which of the plurality of components are associated with an audio source positioned outside of the selected spatial region; and cropping the plurality of components associated with an audio source positioned outside of the selected spatial region out of the audio signal (see page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.


Regarding claim 16, Lehtiniemi teaches an audio-visual system comprising: one or more recording devices configured to capture an audio signal from a plurality of spatially separated audio sources in a scene (see fig. 3a-3b, ¶ 0074, 0101-0105, 0120. The system provides for cropping audio signals which are provided in different scenes.); a user device comprising a display configured to display a spatial representation of the scene and a user interface for selecting a spatial region on the displayed spatial representation and one or more processors configured to: separate the audio signal into a plurality of components associated with one of the plurality of audio sources (see fig. 3a-3b, ¶ 0058-0059, 0061-0062, 0074, 0082, 0086, 0101-0102, 0122. The audio scenes are spatially separated in different locations. The overlapping audio is thus separated into two separate locations. The audio signals are captured from microphones or other audio capturing devices.).
Lehtiniemi does not teach determine which of the plurality of components are associated with an audio source positioned outside of the spatial region selected with the user interface; and crop the plurality of components associated with an audio source positioned outside of the selected spatial region out of the audio signal.  
Eronen teaches determine which of the plurality of components are associated with an audio source positioned outside of the spatial region selected with the user interface; and crop the plurality of components associated with an audio source positioned outside of the selected spatial region out of the audio signal (see page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.





4.	Claims 2-3, 13-15 and 17-22 are rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278).
Regarding claim 2, Lehtiniemi teaches the method of claim 1, wherein selecting the spatial region in the scene comprises: on a display of a user device, displaying a spatial representation of the scene; and with a user interface of the user device, selecting the spatial region on the displayed spatial representation (see fig. 3a-3b, ¶ 0081-0083. The display presents spatial regions in a scene, and a region can be selected by the user.).

Regarding claim 3, Lehtiniemi teaches the method of claim 2, comprising: with an image capture device, capturing image data of the scene, and constructing the spatial representation of the scene from the captured image data (see fig. 3a-3b, ¶ 0081-0083, 0118. The display presents spatial regions in a scene, and a region can be selected by the user. The image data associated with the scene are rendered in the environment.).


Regarding claim 13, Lehtiniemi does not teach the method of claim 3, comprising cropping out of the image data regions outside of the selected spatial region.  
Eronen teaches cropping out of the image data regions outside of the selected spatial region (see page 14, line 14-page 15, line 10, page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed. The audio signals are correlated with the image data, such that the image is cropped in correlation with the audio data.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.


Regarding claim 14, Lehtiniemi does not teach the method of claim 1, comprising: recognizing in the cropped audio signal a speech component, isolating the speech component, and outputting with a playback device the speech component. 
Eronen teaches recognizing in the cropped audio signal a speech component, isolating the speech component, and outputting with a playback device the speech component (see page 14, line 14-page 16, line 6, page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed. The audio signals are correlated with the image data, such that the image is cropped in correlation with the audio data. The ambient audio includes speech, which is separated in order to provide a clear sound.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.

Regarding claim 15, Lehtiniemi does not teach the method of claim 1, comprising: identifying in the cropped audio signal the component of the plurality of components having the highest volume, and outputting with a playback device the component having the highest volume. 
Eronen teaches identifying in the cropped audio signal the component of the plurality of components having the highest volume, and outputting with a playback device the component having the highest volume (see page 14, line 14-page 16, line 6, page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed. The ambient audio includes speech, which is separated in order to provide a clear sound. The audio at a volume greater than the ambient audio is provided to a user for playback.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.

Regarding claim 17, Lehtiniemi teaches the audio-visual system of claim 16, wherein the user device comprises said one or more recording devices (see ¶ 0122. The audio signals are captured from microphones or other audio capturing devices.).




Regarding claim 18, Lehtiniemi teaches the audio-visual system of claim 16, comprising one or more image capture devices configured to capture image data of the scene, and wherein the one or more processors are configured to construct the spatial representation of the scene from the captured image data (see fig. 3a-3b, ¶ 0081-0083, 0118. The display presents spatial regions in a scene, and a region can be selected by the user. The image data associated with the scene are rendered in the environment.).


Regarding claim 19, Lehtiniemi teaches the audio-visual system of claim 18, wherein the user device comprises said one or more image capture devices (see ¶ 0122. The audio signals are captured from microphones or other audio capturing devices.).


Regarding claim 20, Lehtiniemi does not teach the audio-visual system of claim 18, wherein the one or more processors are configured to crop out of the captured image data of the scene regions outside of the selected spatial region.  
Eronen teaches wherein the one or more processors are configured to crop out of the captured image data of the scene regions outside of the selected spatial region (see page 14, line 14-page 15, line 10, page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed. The audio signals are correlated with the image data, such that the image is cropped in correlation with the audio data.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.


Regarding claim 21, Lehtiniemi does not teach the audio-visual system of claim 20, comprising one or more playback devices configured to output the cropped audio signal and the cropped image data.    
Eronen teaches one or more playback devices configured to output the cropped audio signal and the cropped image data (see page 14, line 14-page 15, line 10, page 17, line 34-page 18, line 8. The directions from which the spatial audio is presented may be cropped (the audio from directions outside of the crop region is discarded). Therefore, audio that is determined to be outside of the region will be cropped and removed or compressed. The audio signals are correlated with the image data, such that the image is cropped in correlation with the audio data.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate the cropping of audio from outside regions in order to narrow the sound field to the internal region. The modification allows sounds from that region to be heard more clearly.


Regarding claim 22, Lehtiniemi does not teach the audio-visual system of claim 16, wherein said user device is mobile device.
Eronen teaches wherein said user device is mobile device (see page 14, lines 26-32. The user device is a mobile telecommunication device.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi to incorporate a mobile device. The modification allows the system to be implemented on a mobile device.


5.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278) in further view of Bar-on et al. (US 2015/0281839).
	Regarding claim 4, Lehtiniemi and Eronen do not teach the method of claim 1, wherein separating the audio signal into a plurality of components comprises performing blind source separation on the captured audio signal.
	Bar-on teaches wherein separating the audio signal into a plurality of components comprises performing blind source separation on the captured audio signal (see ¶ 0054. The audio sources are captured and a blind source separation is performed on the audio signals.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi and Eronen to incorporate blind source separation on audio signals. The modification provides separation of the plurality of audio signals captured from the microphones using blind source separation.

6.	Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278) in further view of Bar-on et al. (US 2015/0281839) in further view of Helwani et al. (US 2018/0299527).
	Regarding claim 5, Lehtiniemi, Eronen and Bar-on do not teach the method of claim 4, wherein performing said blind source separation comprises: converting the captured audio signal to time-frequency domain data comprising a plurality of frames for a plurality of times and frequencies, and constructing a multi-channel filter to operate on said time-frequency data frames to separate the plurality of components by source by calculating a set of filter coefficients corresponding to each source.
	Helwani teaches wherein performing said blind source separation comprises: converting the captured audio signal to time-frequency domain data comprising a plurality of frames for a plurality of times and frequencies, and constructing a multi-channel filter to operate on said time-frequency data frames to separate the plurality of components by source by calculating a set of filter coefficients corresponding to each source (see ¶ 0018, 0035-0037, 0061-0072, 0092-0101. The system captures the audio and converts the signals into the time-frequency domain. The system calculates the coefficients on a frame-by-frame basis.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi, Eronen and Bar-on to incorporate calculating the coefficients on a frame-by-frame basis for the captured signal sources. The modification provides separation of the plurality of audio signals captured from the microphones using blind source separation.

Regarding claim 6, Lehtiniemi, Eronen and Bar-on do not teach the method of claim 5, wherein said cropping comprises: selecting and applying the set of filter coefficients of the multi-channel filter corresponding to those sources determined to be inside of the selected spatial region. 
Helwani teaches wherein said cropping comprises: selecting and applying the set of filter coefficients of the multi-channel filter corresponding to those sources determined to be inside of the selected spatial region (see fig. 1-2, ¶ 0018, 0035-0037, 0061-0072, 0092-0101, 0122. The system captures the audio and converts the signals into the time-frequency domain. The system calculates the coefficients on a frame-by-frame basis. The sources are from a direction that can be in an inside region.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi, Eronen and Bar-on to incorporate calculating the coefficients on a frame-by-frame basis for the captured signal sources. The modification provides separation of the plurality of audio signals captured from the microphones using blind source separation.


7.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278) in further view of Ben-Bassat (US 2015/0365628).
	Regarding claim 9, Lehtiniemi and Eronen do not teach the method of claim 3, wherein the set of possible directions of arrival comprises a set of vectors in a 3D coordinate system of the spatial representation of the scene.
	Ben-Bassat teaches wherein the set of possible directions of arrival comprises a set of vectors in a 3D coordinate system of the spatial representation of the scene (see ¶ 0055. The spatial information is provided in a 3D representation of a 3D coordinate system.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi and Eronen to incorporate a 3D spatial representation. The modification provides a 3D representation in a coordinate system.
 

8.	Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278) in further view of Bar-on et al. (US 2015/0281839) in further view of Helwani et al. (US 2018/0299527) in further view of Rollow, IV et al. (US 2020/0068297).
	Regarding claim 11, Lehtiniemi, Eronen, Bar-on and Helwani do not teach the method of claim 5, comprising calculating said phase correlations using a phase transformation method.
	Rollow teaches calculating said phase correlations using a phase transformation method (see ¶ 0016, 0066-0069, 0071-0072. The system provides phase transformation for steering the beam based on the received signal of the microphone array.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi, Eronen, Bar-on and Helwani to incorporate beam steering of the capturing system by phase transformation. The modification provides tracking of the signals in the environment.
 

Regarding claim 12, Lehtiniemi, Eronen, Bar-on and Helwani do not teach the method of claim 11, wherein the phase transformation method comprises a steered response power phase transformation (SRP-PHAT) method.  
	Rollow teaches wherein the phase transformation method comprises a steered response power phase transformation (SRP-PHAT) method (see ¶ 0016, 0066-0069, 0071-0072. The system provides phase transformation for steering the beam based on the received signal of the microphone array. The transformation is performed in response to the power phase transformation.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi, Eronen, Bar-on and Helwani to incorporate beam steering of the capturing system by phase transformation. The modification provides tracking of the signals in the environment.


9.	Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Lehtiniemi et al. (US 2018/0115853) in view of Eronen (WO 2019/166278) in further view of Pell (US 2018/0176508).
	Regarding claim 23, Lehtiniemi and Eronen do not teach the audio-visual system of claim 16, wherein the audio-visual system comprises a video teleconferencing system.
	Pell teaches wherein the audio-visual system comprises a video teleconferencing system (see ¶ 0076, 0101. The video conferencing system provides for cropping audio and images.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lehtiniemi and Eronen to incorporate a video teleconferencing system. The modification allows the system to be implemented in a video teleconferencing system.


Conclusion
10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASSAD MOHAMMED whose telephone number is (571)270-7253. The examiner can normally be reached 9:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ASSAD MOHAMMED/Examiner, Art Unit 2651       

/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2651