Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This is in response to applicant's amendment, which was filed on 10/28/2021 and has been entered. Claims 1, 10, 21, 30, and 41-42 have been amended. No claims have been cancelled. No claims have been added. Claims 1-42 are still pending in this application, with claims 1, 21, and 41-42 being independent.
 
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. 

Claims 1-4, 21-24 and 41-42 are rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900).

Regarding claim 1, Grosche teaches A device configured to play one or more of a plurality of audio streams (Grosche ¶0065, “VLOs for each recording spot in the virtual free field” and ¶0066 “the one or more VLOs assigned to the respective microphone setup so that these one or more VLOs virtually reproduce the sound that was recorded by the respective microphone setup”), the device comprising: a memory configured to store the plurality of audio streams (Grosche ¶0105, “the storage medium is configured to store microphone signals and/or metadata of one or more microphone setups, the static and/or dynamic VLO parameters and/or any information necessary for performing the methods of the embodiments of the present disclosure”), each of the audio streams representative of a soundfield (Grosche figure 2a and ¶0069, “virtual listening position for (3D) audio playback within a real, recorded acoustic scene”); and one or more processors coupled to the memory (Grosche ¶0105), and configured to: present a user interface to a user (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); obtain a first indication from the user via the user interface representing a desired listening position (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); however does not explicitly teach obtain a second indication from the user via the user interface representing an audio source distance threshold, and select, based on the first indication and the second indication, at least one audio stream of the plurality of audio streams.
 
Glaser teaches a first indication representing a desired listening position (Glaser figure 12 and ¶0060, “the audio control configuration system 130 can enable other forms of position and/or orientation sensing….person orientation sensing system 170 that functions to detect location, direction, and/or orientation of one or more subjects”), obtain a second indication from a user via a user interface representing an audio source distance threshold (Glaser figure 5 and ¶0046, “the control application can display a heat map where color maps to amplitude…graphically detectable in a visual manner” and ¶0048 “user interaction that supports adding, customizing, and/or removing positional audio control inputs”), and select, based on the first indication and the second indication, at least one audio stream of the plurality of audio streams (Glaser ¶0061, “audio generator 140 functions to generate an audio output according to the positional audio control inputs.” See also figure 13, ¶0060 “dynamically increasing volume for audio sources in close proximity of a subject”).

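Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Glaser to improve the known device of Grosche to achieve the predictable result of enhancing audio processing to focus on the audio source of attention.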

Regarding claims 2 and 22, Grosche teaches wherein the memory is further configured to store location information associated with coordinates of an acoustical space in which a corresponding one of the plurality of audio streams was captured or synthesized (Grosche ¶0070-0072, “coordinate origin to the center position of the i-th recording spot”).

Regarding claims 3 and 23, Grosche teaches wherein the user interface comprises one or more of a graphical user interface (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”), a gesture-based user interface, a voice command-based user interface, a touch-based user interface.

Regarding claims 4 and 24, Grosche teaches wherein the user interface is configured to obtain user input in at least one of single touch, multi-touch, gesture, voice command, or tap (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”).

Regarding claim 21, Grosche teaches A method comprising: storing, by a memory (Grosche ¶0105, “the storage medium is configured to store microphone signals and/or metadata of one or more microphone setups, the static and/or dynamic VLO parameters and/or any information necessary for performing the methods of the embodiments of the present disclosure”), a plurality of audio streams (Grosche ¶0065, “VLOs for each recording spot in the virtual free field” and ¶0066 “the one or more VLOs assigned to the respective microphone setup so that these one or more VLOs virtually reproduce the sound that was recorded by the respective microphone setup”), each of the audio streams representative of a soundfield (Grosche figure 2a and ¶0069, “virtual listening position for (3D) audio playback within a real, recorded acoustic scene”), the memory being communicatively coupled to one or more processors (Grosche ¶0105); presenting, by the one or more processors, a user interface to a user (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); obtaining, by the one or more processors, via the user interface, a first indication representing a desired listening position (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”), however does not explicitly teach obtaining, by the one or more processors, from the user via the user interface, a second indication representing an audio source distance threshold, selecting, by the one or more processors and based on the first indication and the second indication, at least one audio stream of the plurality of audio streams.

Glaser teaches a first indication representing a desired listening position (Glaser figure 12 and ¶0060, “the audio control configuration system 130 can enable other forms of position and/or orientation sensing….person orientation sensing system 170 that functions to detect location, direction, and/or orientation of one or more subjects”), obtaining, by the one or more processors, from the user via the user interface, a second indication representing an audio source distance threshold (Glaser figure 5 and ¶0046, “the control application can display a heat map where color maps to amplitude…graphically detectable in a visual manner” and ¶0048 “user interaction that supports adding, customizing, and/or removing positional audio control inputs”), selecting, by the one or more processors and based on the first indication and the second indication, at least one audio stream of the plurality of audio streams (Glaser ¶0061, “audio generator 140 functions to generate an audio output according to the positional audio control inputs.” See also figure 13, ¶0060 “dynamically increasing volume for audio sources in close proximity of a subject”).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Glaser to improve the known device of Grosche to achieve the predictable result of enhancing audio processing to focus on the audio source of attention.

Regarding claim 41, Grosche teaches A device configured to play one or more of a plurality of audio streams (Grosche ¶0065, “VLOs for each recording spot in the virtual free field” and ¶0066 “the one or more VLOs assigned to the respective microphone setup so that these one or more VLOs virtually reproduce the sound that was recorded by the respective microphone setup”), the device comprising: means for storing the plurality of audio streams (Grosche ¶0105, “the storage medium is configured to store microphone signals and/or metadata of one or more microphone setups, the static and/or dynamic VLO parameters and/or any information necessary for performing the methods of the embodiments of the present disclosure”), each of the audio streams representative of a soundfield (Grosche figure 2a and ¶0069, “virtual listening position for (3D) audio playback within a real, recorded acoustic scene”); means for presenting a user interface to a user (Grosche figure 2a and  ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); means for obtaining, via the user interface, a first indication representing a desired listening position (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); however does not explicitly teach means for obtaining, from the user via the user interface, a second indication representing an audio source distance threshold; and means for selecting, based on the first indication and the second indication, at least one audio stream of the plurality of audio streams.

Glaser teaches a first indication representing a desired listening position (Glaser figure 12 and ¶0060, “the audio control configuration system 130 can enable other forms of position and/or orientation sensing….person orientation sensing system 170 that functions to detect location, direction, and/or orientation of one or more subjects”), means for obtaining, from the user via the user interface, a second indication representing an audio source distance threshold (Glaser figure 5 and ¶0046, “the control application can display a heat map where color maps to amplitude…graphically detectable in a visual manner” and ¶0048 “user interaction that supports adding, customizing, and/or removing positional audio control inputs”); and means for selecting, based on the first indication and the second indication, at least one audio stream of the plurality of audio streams (Glaser ¶0061, “audio generator 140 functions to generate an audio output according to the positional audio control inputs.” See also figure 13, ¶0060 “dynamically increasing volume for audio sources in close proximity of a subject”).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Glaser to improve the known device of Grosche to achieve the predictable result of enhancing audio processing to focus on the audio source of attention.

Regarding claim 42, Grosche teaches A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to: store (Grosche ¶0105, “the storage medium is configured to store microphone signals and/or metadata of one or more microphone setups, the static and/or dynamic VLO parameters and/or any information necessary for performing the methods of the embodiments of the present disclosure”) a plurality of audio streams (Grosche ¶0065, “VLOs for each recording spot in the virtual free field” and ¶0066 “the one or more VLOs assigned to the respective microphone setup so that these one or more VLOs virtually reproduce the sound that was recorded by the respective microphone setup”), each of the audio streams representative of a soundfield (Grosche figure 2a and ¶0069, “virtual listening position for (3D) audio playback within a real, recorded acoustic scene”); present a user interface to a user (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); obtain, via the user interface from the user, a first indication representing a desired listening position (Grosche figure 2a and ¶0064, “the user may be enabled to specify the virtual listening position by typing in a specific virtual listening position into the playback apparatus”); however does not explicitly teach obtain, via the user interface from the user, a second indication representing an audio source distance threshold, and select, based on the first indication and the second indication, at least one audio stream of the plurality of audio streams.

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Glaser to improve the known device of Grosche to achieve the predictable result of enhancing audio processing to focus on the audio source of attention.

Claims 5-6, 8-9, 11-12, 19-20, 25-26, 28-29 and 39-40 are rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900) in further view of Leppanen (US 2018/0349088).



Leppanen teaches a graphical user interface comprises representations associated with coordinates of the acoustical space in which the plurality of audio streams were captured or synthesized (Leppanen ¶0051 “Knowledge of the location of each distinct audio source may be obtained by using transmitters/receivers or identification tags to track the position of the audio sources, such as relative to the presence capture device, in the scene captured by the presence capture device”).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Leppanen to improve the known device of Grosche in view of Glaser to achieve the predictable result of a user-friendly way of controlling sound sources in the virtual environment.

Regarding claims 6 and 26, Grosche in view of Glaser in further view of Leppanen teaches wherein the representations are arranged in the graphical user interface with a spatial relationship representing relative positions of the coordinates of 

Regarding claims 8 and 28, Grosche in view of Glaser in further view of Leppanen teaches wherein the one or more processors is further configured to combine at least two audio streams based on the indication by at least one of mixing the at least two audio streams (Leppanen figure 1 and ¶0054) or interpolating a third audio stream based on the at least two audio streams.

Regarding claims 9 and 29, Grosche in view of Glaser in further view of Leppanen teaches wherein the one or more processors are further configured to obtain via the user interface an importance indication representing an importance assigned to an audio stream, and wherein the importance indicates a relative gain to be applied to the audio stream (Leppanen ¶0075 and figure 5, “second user input type 524 effects the change in audio level for audio mixing”).

Regarding claim 11, Grosche in view of Glaser in further view of Leppanen teaches wherein the one or more processors are further configured to obtain an indication, via the user interface, from the user that the user desires to activate a snapping mode (Leppanen ¶0098, “smartphone…digital camera”).

Regarding claim 12, Grosche in view of Glaser in further view of Leppanen teaches wherein the snapping mode is a hard snapping mode or a soft snapping mode (Leppanen ¶0098, “smartphone…digital camera”).

Regarding claims 19 and 39, Grosche in view of Glaser in further view of Leppanen teaches wherein the device comprises a mobile handset (Leppanen ¶0068, “smartphone”).

Regarding claims 20 and 40, Grosche in view of Glaser in further view of Leppanen teaches wherein the device further comprises a wireless transceiver, the wireless transceiver being coupled to the one or more processors and being configured to receive a wireless signal, wherein the wireless signal comprises at least one of Bluetooth, or Wi-Fi (Leppanen ¶0068, “smartphone”), or conforms to a fifth generation (5G) cellular protocol.

Claims 10, 13, 16, 30 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900) in further view of Eronen (EP 3343349).

Regarding claims 10 and 30, Grosche in view of Glaser does not explicitly teach wherein the one or more processors are further configured to set, based on the indication, an audio source distance threshold.

Eronen teaches wherein the one or more processors are further configured to set, based on the indication, an audio source distance threshold (Eronen ¶0006, “beyond the threshold distance”).
	
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Eronen to improve the known device of Grosche in view of Glaser to achieve the predictable result of providing realistic audible sounds in multiple positions in virtual reality.

Regarding claim 13, Grosche in view of Glaser in further view of Eronen teaches wherein the one or more processors are further configured to: determine a first audio source distance threshold, and wherein the one or more processors are configured to select the at least one audio stream of the plurality of audio streams, further based on the first audio source distance threshold (Eronen ¶0008).

Regarding claims 16 and 36, Grosche in view of Glaser in further view of Eronen teaches wherein the one or more processors are further configured to: determine that a user is moving from one location to another location; and based on the determination that the user is moving from one location to another location, select at least one different audio stream of the plurality of audio streams (Eronen figures 3, 5 and ¶0006).

Claims 17-18 and 37-38 are rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900) in further view of Leppanen (US 2018/0349088) in further view of Mindlin (US 10484811).

Regarding claims 17 and 37, Grosche in view of Glaser in further view of Leppanen teaches wherein a displayed world comprises a visual scene represented by video data captured by a camera (Leppanen ¶0005, “virtual or augmented reality view”), however does not explicitly teach wherein the device comprises an extended reality headset.

Mindlin teaches wherein the device comprises an extended reality headset (Mindlin figure 2A).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Mindlin to improve the known device of Grosche in view of Glaser in further view of Leppanen to achieve the predictable result of combining virtual and augmented reality to achieve a combination reality experience.

Regarding claims 18 and 38, Grosche in view of Glaser in further view of Leppanen in further view of Mindlin teaches wherein the device comprises an extended reality headset (Mindlin figure 2A), and wherein a displayed world comprises a virtual world (Leppanen ¶0005, “virtual or augmented reality view”).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900) in further view of Eronen (EP 3343349) in further view of Murtaza (WO 2019/072984).

Regarding claim 14, Grosche in view of Glaser in further view of Eronen does not explicitly teach wherein the one or more processors are further configured to: determine a second audio source distance threshold, and wherein the one or more processors are configured to select the at least one audio stream of the plurality of audio streams, further based on the second audio source distance threshold.

Murtaza teaches determine a second audio source distance threshold, and wherein the one or more processors are configured to select the at least one audio stream of the plurality of audio streams, further based on the second audio source distance threshold (Murtaza figure 3).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Murtaza to improve the known device of Grosche in view of Glaser in further view of Eronen to achieve the predictable result of accurately determining the position of a user to provide optimal directionality.

Claims 15 and 31-35 are rejected under 35 U.S.C. 103 as being unpatentable over Grosche (US 2019/0253826) in view of Glaser (US 2018/0088900) in further view of Eronen (EP 3343349) in further view of Murtaza (WO 2019/072984) in further view of Leppanen (US 2018/0349088).

Regarding claims 15 and 35, Grosche in view of Glaser in further view of Eronen in further view of Murtaza does not explicitly teach wherein the one or more processors are configured to combine the two audio streams by applying a function F(x) to the two audio streams.

Leppanen teaches wherein the one or more processors are configured to combine the two audio streams by applying a function F(x) to the two audio streams (Leppanen figure 1 and ¶0054).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Leppanen to improve the known device of Grosche in view of Glaser in further view of Eronen in further view of Murtaza to achieve the predictable result of a user-friendly way of controlling sound sources in the virtual environment.

Regarding claim 31, Grosche in view of Glaser in further view of Eronen in further view of Murtaza in further view of Leppanen teaches further comprising obtaining, by the one or more processors via the user interface, an indication from the user that the user desires to activate a snapping mode (Leppanen ¶0098, “smartphone…digital camera”).

Regarding claim 32, Grosche in view of Glaser in further view of Eronen in further view of Murtaza in further view of Leppanen teaches wherein the snapping mode is a hard snapping mode or a soft snapping mode (Leppanen ¶0098, “smartphone…digital camera”).

Regarding claim 33, Grosche in view of Glaser in further view of Eronen in further view of Murtaza in further view of Leppanen teaches determining, by the one or more processors, a first audio source distance threshold, wherein the selecting the at least one audio stream of the plurality of audio streams is further based on the first audio source distance threshold (Eronen ¶0008).

Regarding claim 34, Grosche in view of Glaser in further view of Eronen in further view of Murtaza in further view of Leppanen teaches determining, by the one or more processors via the user interface, a second audio source distance threshold, wherein the selecting the at least one audio stream of the plurality of audio streams is further based on the second audio source distance threshold (Eronen figure 5 and ¶0008, each angle of the circumference of circle 512 can be considered a threshold).

Response to Arguments
Applicant's arguments filed 10/28/2021 have been fully considered but they are not persuasive. Arguments relating to the cited references not teaching the amended claims are moot because the arguments do not apply to the new grounds of rejection.
Applicant argues on pages 13-14 that cited reference Leppanen does not teach the limitations “snapping mode,” “hard snapping mode,” and “soft snapping mode” because the specification discusses what the snapping mode is. Examiner respectfully disagrees. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., that the snapping mode, hard snapping mode, and soft snapping mode may be defined as a transform of spatial coordinates for audio rendering) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). In addition, the specification states that the snapping mode “may be defined” as said transform of spatial coordinates; under the broadest reasonable interpretation (BRI), the camera of a smartphone may also be considered the snapping mode because the specification does not specifically state that the snapping mode “is” a transform of spatial coordinates.
Applicant’s argument on pages 14-15 of the Remarks that cited reference Eronen does not teach “a second audio source distance threshold” is moot in view of the new grounds of rejection. Therefore, the arguments are not persuasive and the claims stand rejected.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NORMAN YU whose telephone number is (571)270-7436.  The examiner can normally be reached on Mon - Fri 11am-7pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar can be reached on 571-272-7488.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any response to this action should be mailed to:
                        Commissioner of Patents and Trademarks
                        P.O. Box 1450
                        Alexandria, VA 22313-1450
Or faxed to:
                        (571) 273-8300, for formal communications intended for entry and for informal or draft communications (please label “PROPOSED” or “DRAFT”).
Hand-delivered responses should be brought to:
                        Customer Service Window
                        Randolph Building
                        401 Dulany Street
                        Arlington, VA 22314

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NORMAN YU/Primary Examiner, Art Unit 2652