Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2018-0077318 1, filed on 07/03/2018.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/15/2021, 10/25/2021, and 05/10/2022 have been considered by the examiner.
Drawings
The drawing submitted on 12/22/2020 has been considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-3, 5, 9-13, and 15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by John et al. (US 2019/0208317 A1, based on provisional application filing date 12/28/2017).

Regarding Claims 1 and 11, John et al. teach: A device for outputting sound, the device comprising: a speaker (VR headset speaker) configured to output the sound ([0072] AR audio rendering system 865 may further dynamically adjust the mixing and playback of the audio signals, for example depending on a position and an orientation of a head of the user wearing a VR headset.); a memory storing one or more instructions ([0057] As illustrated in FIG. 7, the audio processing system 700 may include a processor 710, a memory 720, one or more acoustic sensors 730, an audio processing system 740, and an output device 750. [0058] Memory 720 (for example, non-transitory computer readable storage medium) stores, at least in part, instructions and data for execution by processor 710 and/or the audio processing system 740.); and a processor configured to execute the one or more instructions, wherein the processor is configured to execute the one or more instructions to control the speaker to, predict external sound to be received from an external environment ([0033] During the offline training, a model coefficient update process 425 is performed using features extracted from the combination of the target signal 412 and interference signal 414 to update parameters of the deep neural network 450 until the deep neural network 450 is optimized to make predictions consistent with the known content categories. In other words, the optimized deep neural network 450 can be used to produce separated audio signals that are the same as, or close to, the target signal 412 and the one or more interference signals 414 of known sound content categories.), variably adjust sound (time-varying filter) to be output from the speaker based on the predicted external sound, and output the adjusted sound ([0066] In real time, using the set of signal features 825 as input, the neural network 830 generates a set of time-varying filters 840A, 840B, 840C, and 840D for the specific time frame. 
Each filter corresponds to a pre-defined sound content category.  [0067] Each of the filters 840A-840D filters the converted audio signal 815 into a separated audio signal 845A, 845B, 845C, or 840D. Each of the separated audio signals 845A, 845B, 845C, or 840D includes an audio signal of a corresponding sound content category. [0069] In some embodiments, the output module 850 receives the separated audio signals 845A, 845B, 845C, and 845D for the corresponding sound content categories and may convert the separated audio signals 845A, 845B, 845C, and 845D from the frequency domain back to the time domain. In some embodiments, the output module 850 may output the separated audio signals 845A, 845B, 845C, and 845D to other systems or modules for further processing depending on the applications. [0070] The audio signals separated based on sound content categories (either as separate audio signal streams or channels of an audio signal stream) may be used for various applications such as reproducing a sound environment in VR or AR applications. For example, as shown in FIG. 8, output module 850 may output the separated audio signals to a VR reproduction system including a VR audio rendering system 865. [0072] For example, the VR audio rendering system 865 may mix and/or playback one or more of the separated audio signals 845A, 845B, 845C, and 845D based on the spatial information of sound sources, such that the VR audio rendering system 865 recreates a sound environment (also referred to as sound stage) that is the same as, or similar to, the actual sound environment including the original sound sources. Using the spatial information of the sound source of 845 (e.g. a person speaking), AR audio rendering system 865 may further dynamically adjust the mixing and playback of the audio signals, for example depending on a position and an orientation of a head of the user wearing a VR headset.).

Regarding Claims 2 and 12, John et al. teach: The device of claim 1, wherein the processor is further configured to variably adjust the sound to be output by separating at least a portion of the predicted external sound and the sound to be output from each other in at least one band of frequency and time (See rejection of claim 1 and [0066] In real time, using the set of signal features 825 as input, the neural network 830 generates a set of time-varying filters 840A, 840B, 840C, and 840D for the specific time frame. Each filter corresponds to a pre-defined sound content category. [0068] More particularly, in some embodiments, each of the filters 840A-840D is a real-valued (or alternatively, complex-valued) function (also referred to as masking function) of frequency for a specific time frame, where each frequency bin (e.g., a frequency range) has a value from 0 to 1. Thus, each of the filters 840A-840D filters the converted audio signal 815 in the frequency domain by multiplying the converted audio signal 815 by the masking function. A portion of the audio signal is attenuated at frequency points where the value of the masking function is less than 1. For example, a value of zero of the masking function mutes a portion of the audio signal at a corresponding frequency point. In other words, sound in frequency points where the masking function is equal to 0 is inaudible in a reconstructed output signal filtered by the masking function.).
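For illustration of the masking function quoted from [0068] above, the following is a minimal Python sketch of per-bin multiplication of one frequency-domain time frame by a mask whose values run from 0 to 1. The function name apply_mask and the sample values are hypothetical and do not appear in John et al.; the sketch shows the general technique only.

```python
def apply_mask(spectrum, mask):
    """Multiply each frequency bin of one time frame by its mask value."""
    if len(spectrum) != len(mask):
        raise ValueError("spectrum and mask must have the same number of bins")
    if any(m < 0.0 or m > 1.0 for m in mask):
        raise ValueError("mask values must lie between 0 and 1")
    # A mask value of 1 passes the bin, 0.5 halves its amplitude, 0 mutes it.
    return [s * m for s, m in zip(spectrum, mask)]


# Hypothetical three-bin frame (complex frequency-domain samples).
frame = [1 + 0j, 2 + 0j, 4 + 0j]
mask = [1.0, 0.5, 0.0]
filtered = apply_mask(frame, mask)
```

Here the first bin is unchanged, the second is attenuated by half, and the third is muted, matching the behavior described for mask values of 1, less than 1, and 0.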

Regarding Claims 3 and 13, John et al. teach:  The device of claim 2, wherein the processor is further configured to set at least one frequency band for filtering the sound to be output and control the speaker to output filtered sound based on the at least one frequency band, wherein the at least one frequency band is dynamically adjusted in a time band (See rejection of claim 2 and specifically [0066] In real time, using the set of signal features 825 as input, the neural network 830 generates a set of time-varying filters 840A, 840B, 840C, and 840D for the specific time frame. Each filter corresponds to a pre-defined sound content category. [0068] …each of the filters 840A-840D is a real-valued (or alternatively, complex-valued) function (also referred to as masking function) of frequency for a specific time frame, where each frequency bin (e.g., a frequency range) has a value from 0 to 1. Thus, each of the filters 840A-840D filters the converted audio signal 815 in the frequency domain by multiplying the converted audio signal 815 by the masking function. A portion of the audio signal is attenuated at frequency points where the value of the masking function is less than 1. [0072]. Using the spatial information of the sound source of 845 (e.g. a person speaking), AR audio rendering system 865 may further dynamically adjust the mixing and playback of the audio signals, for example depending on a position and an orientation of a head of the user wearing a VR headset.).

Regarding Claims 5 and 15, John et al. teach: The device of claim 2, wherein the sound to be output comprises music sound comprising at least one musical instrument component, wherein the processor is further configured to adjust, from among the at least one musical instrument component, at least one musical instrument component in at least one band of frequency and time (See rejection of claim 3 and [0016] To illustrate certain aspects of some embodiments, consider a scenario where a captured audio signal contains a conversation of two people talking near a jazz trio at an outdoor cafe. BSS attempts to separate all sources, including both talkers, each instrument of the jazz trio and any prominent ambient sound sources nearby. By contrast, the disclosed technology is capable of separating the speech content (e.g., from both talkers) from the music content (e.g., from the entire jazz trio) and from other ambient sounds. [0042] At step 525, the neural network of the audio separation system generates a plurality of time-varying filters in a frequency domain using the signal features as inputs of the neural network. [0043] At step 530, the audio separation system separates the audio signal into a plurality of category specific audio signals by applying the time-varying filters to the audio signal. Each of the category specific audio signals contains content of a corresponding sound content category among the plurality of sound content categories. In some embodiments, the category specific audio signals are produced by multiplying the audio signal by the time-varying real-valued functions. [0056] More particularly, in some embodiments, each of the filters 740A-740D is a real-valued (or alternatively, complex-valued) function (also referred to as masking function) of frequency for a specific time frame, where each frequency bin (e.g., a frequency range) has a value from 0 to 1. 
Thus, each of the filters 740A-740D filters the converted audio signal 715 in the frequency domain by multiplying the converted audio signal 715 by the masking function. A portion of the audio signal is attenuated at frequency points where the value of the masking function is less than 1. For example, a value of zero of the masking function mutes a portion of the audio signal at a corresponding frequency point. In other words, sound in frequency points where the masking function is equal to 0 is inaudible in a reconstructed output signal filtered by the masking function.).

Regarding Claim 9, John et al. teach: The device of claim 1, wherein the sound to be output from the speaker comprises speech sound and music sound, and wherein the processor is further configured to separate at least a portion of the predicted external sound, the speech sound, and the music sound from one another in at least one band of frequency and time (See rejection of claim 1 and [0016] To illustrate certain aspects of some embodiments, consider a scenario where a captured audio signal contains a conversation of two people talking near a jazz trio at an outdoor cafe. BSS attempts to separate all sources, including both talkers, each instrument of the jazz trio and any prominent ambient sound sources nearby. By contrast, the disclosed technology is capable of separating the speech content (e.g., from both talkers) from the music content (e.g., from the entire jazz trio) and from other ambient sounds.).

Regarding Claim 10, John et al. teach: The device of claim 9, wherein the processor is further configured to separate at least a portion of the predicted external sound, the speech sound, and the music sound from one another in at least one band of frequency and time by applying different filters to the speech sound and the music sound from each other (See rejection of claim 5).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over John et al. in view of Amir (US 2018/0350383 A1).

Regarding Claims 4 and 14, John et al. teach: The device of claim 3, wherein the processor is further configured to set the at least one frequency band for filtering the sound to be output (See rejection of claim 3 and [0040] In some embodiments, each of the time-varying filters is a time-varying real-valued function of frequency. A value of the real-valued function for a corresponding frequency represents a level of attenuation for the corresponding frequency or range of frequencies. For example, a value of 0.5 for a given frequency or frequency range would cause the signal amplitude for that frequency or frequency range to be reduced by half. [0068] …each of the filters 840A-840D is a real-valued (or alternatively, complex-valued) function (also referred to as masking function) of frequency for a specific time frame, where each frequency bin (e.g., a frequency range) has a value from 0 to 1. Thus, each of the filters 840A-840D filters the converted audio signal 815 in the frequency domain by multiplying the converted audio signal 815 by the masking function. A portion of the audio signal is attenuated at frequency points where the value of the masking function is less than 1.).
John et al. do not explicitly teach: “set the at least one frequency band for filtering the sound to be output by setting a number of the at least one frequency band and setting at least one of a width and a center frequency of the at least one frequency band”.
Amir teaches: set the at least one frequency band for filtering the sound to be output by setting a number of the at least one frequency band and setting at least one of a width and a center frequency of the at least one frequency band ([0042] Examples of the frequency response curve 200 for the filter 140 may be band-pass, having additional attenuation at frequencies above a certain level, as shown by the curve portion 220 in FIG. 2, or may be hi-pass and allow all practical frequencies above the cutoff frequency 210 to pass, as shown by the curve portion 230 in FIG. 2. [0043] The filter 140 is a dynamic filter and the cutoff frequency 210 is adjusted based upon a background noise in the environment 110. As illustrated in FIG. 2, the cutoff frequency 210 may be adjusted between a lower frequency 240 and a higher frequency 250. For some examples, the lower frequency may be about 100 Hz and the higher frequency may be about 200 Hz or about 250 Hz. In other examples, the lower frequency may be about 60 Hz or may otherwise be in a range from about 50 Hz to 120 Hz. Additionally, the higher frequency may be about 300 Hz in various examples and may otherwise be in a range from about 200 Hz to 400 Hz, or about 200 Hz to 600 Hz. Particular values for the lower frequency 240 and the higher frequency 250 may be application dependent, and may depend upon the nature of the environment 110, available materials and/or processing capability of the components used in the filter 140, and/or the purpose to which the output signal 160 will be applied. In some examples, a default value or range for the lower frequency 240 and the higher frequency 250 may be pre-configured and/or user selectable for specific applications, such as for being outside (e.g., wind noise), in a car (e.g., road noise), in a plane (e.g., engine noise), and the like. 
Note: FIG. 2 shows that setting the lower and upper cutoff frequencies sets the width of the band, and the center frequency, which is the arithmetic or geometric mean of the lower and upper cutoff frequencies, follows automatically from those settings.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify John et al. to include the teaching of Amir above in order to adjust a dynamic filter based upon estimated background noise in the environment.
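For illustration of the relationship stated in the note above, the following minimal Python sketch computes the width and the two common center-frequency definitions from a lower and an upper cutoff frequency. The function name band_parameters is hypothetical. The 100 Hz and 200 Hz inputs are example values from Amir [0043]; treating them as the two band edges is an assumption made here, following the reading in the note.

```python
import math


def band_parameters(lower_hz, upper_hz):
    """Return the width and center frequencies implied by two cutoff frequencies."""
    if not 0 < lower_hz < upper_hz:
        raise ValueError("require 0 < lower_hz < upper_hz")
    return {
        "width_hz": upper_hz - lower_hz,
        # Arithmetic mean of the two cutoff frequencies.
        "center_arithmetic_hz": (lower_hz + upper_hz) / 2,
        # Geometric mean of the two cutoff frequencies.
        "center_geometric_hz": math.sqrt(lower_hz * upper_hz),
    }


# Assumed band edges, taken from the example values in Amir [0043].
params = band_parameters(100.0, 200.0)
```

With these inputs the width is 100 Hz, the arithmetic center is 150 Hz, and the geometric center is about 141.4 Hz, so fixing the two cutoff frequencies fixes both the width and the center frequency.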
Claim(s) 6 is rejected under 35 U.S.C. 103 as being unpatentable over John et al. in view of Gunasekara et al. (US 2019/0147853 A1).
Regarding Claim 6, John et al. teach: The device of claim 1, the predicted external sound (See rejection of claim 1).
John et al. do not explicitly teach: wherein the predicted external sound comprises a first speech to be made by a user, wherein the processor is further configured to obtain a second speech made by the user and predict the first speech based on the obtained second speech.
Gunasekara et al. teaches: wherein the predicted external sound comprises a first speech (predicted utterance) to be made by a user, wherein the processor is further configured to obtain a second speech made (receiving a set of utterances corresponding to at least one cluster corresponding to a language model) by the user and predict the first speech based on the obtained second speech ([0015] The illustrative embodiments provide a method, system, and computer program product. An embodiment of a method for predicting utterances in a dialog system includes receiving a set of utterances associated with a dialog between a client device and a dialog system, mapping the utterances to vector representations of the utterances, and identifying at least one cluster to which the utterances belong from among a plurality of possible clusters. The embodiment further includes predicting a next cluster based upon a conditional probability of the next cluster following a set of a predetermined number of previous clusters using a language model. The embodiment still further includes predicting a next utterance from among a plurality of possible utterances within the predicted next cluster. [0081] With reference to FIG. 6, this figure depicts a block diagram of a runtime process flow 600 in accordance with an illustrative embodiment. In the embodiment, a conversation 602 is initiated between a user and a dialog system in which one or more utterances are received by the runtime process in order to predict a next utterance using the trained language model 516 of FIG. 5 using an utterance prediction procedure. [0084] The embodiment further includes a cluster identification component 608 configured to determine the cluster to which the utterance vector belongs. A cluster prediction component 610 utilizes the identified cluster and language model 516 to predict a next cluster based upon the probabilities within language model 516. 
An utterance prediction component 612 is configured to predict the next utterance within the dialog based upon the predicted cluster.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify John et al. to include the teaching of Gunasekara et al. above in order to predict a user's next utterance within the dialog based upon the utterances from the user's past interaction with the dialog system.
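For illustration of the cluster prediction quoted from Gunasekara et al. [0015] above, the following minimal Python sketch predicts the next cluster from the conditional probability of a cluster following a fixed number of previous clusters. It is a plain count-based n-gram model, which is an assumption made here and is not stated to be the language model of the reference. The function names and the cluster labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_cluster_model(sequences, n_prev=1):
    """Count which cluster follows each context of n_prev previous clusters."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(n_prev, len(seq)):
            context = tuple(seq[i - n_prev:i])
            counts[context][seq[i]] += 1
    return counts


def predict_next_cluster(counts, history, n_prev=1):
    """Return the most probable next cluster and its conditional probability."""
    context = tuple(history[-n_prev:])
    followers = counts.get(context)
    if not followers:
        return None, 0.0  # context never seen in training
    cluster, count = followers.most_common(1)[0]
    return cluster, count / sum(followers.values())


# Hypothetical dialogs, each already mapped to a sequence of cluster labels.
dialogs = [
    ["greet", "ask_balance", "thanks"],
    ["greet", "ask_balance", "bye"],
    ["greet", "ask_hours", "bye"],
]
model = train_cluster_model(dialogs)
next_cluster, probability = predict_next_cluster(model, ["greet"])
```

In this toy data "ask_balance" follows "greet" in two of three dialogs, so it is returned with a conditional probability of 2/3. Selecting a specific utterance within the predicted cluster, as the reference describes, is outside this sketch.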
Claim(s) 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over John et al. in view of Jackson et al. (US 10529358 B2).
Regarding Claim 7, John et al. teach: The device of claim 1, the predicted external sound (See rejection of claim 1).
John et al. do not teach: wherein the predicted external sound comprises first music sound, and wherein the processor is further configured to obtain second music sound from the external environment, identify already published music comprising the second music sound, based on the second music sound, and predict the first music sound, based on the identified music.
Jackson et al, teach:  wherein the predicted external sound comprises first music sound, and wherein the processor is further configured to obtain second music sound from the external environment, identify already published music comprising the second music sound, based on the second music sound, and predict the first music sound, based on the identified music (Col 5, lines 50-55, In one embodiment, a user is located in a noisy environment in which music is playing from loud speakers. The user wants to hear/discern a conversation-of-interest, but cannot do so because of the music. A complete solution to the problem is to directly (and substantially only) cancel the music.  Col 6, lines 42- 46, Embodiments of the current invention take advantage of knowing what noise (in the current example, background music) will be played/heard in the very near future—meaning usually a short time—from 1 to 20 milliseconds—before the noise reaches the user. Col 7, lines 3-9, This method entails sampling the music and sending the recorded sample to the cloud (using the shortest possible transmission time) and then analyzing the recorded sample, as known in the art. Following analysis, a sound file for the entire recording is retrieved—typically a complete song, which is then used by a noise cancelation device to cancel the noise (music). Col 7, lines 19-22, Stated differently, once identification of a recording has taken place, a noise cancellation device cancels the played music because the noise cancellation device has essentially accessed the sound signal in advance.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify John et al. to include the teaching of Jackson et al. above in order to access in advance the music that is to be played next and cancel it from the conversation.

Regarding Claim 8, John et al. in view of Jackson et al. teach: The device of claim 7, wherein the processor is further configured to obtain a database (cloud) comprising information about at least one piece of music, match the information about at least one piece of music comprised in the database and the obtained second music sound with each other, and identify the published music, based on a result of the matching (See rejection of claim 7).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Verbeke et al. (US 2022/0246161 A1) teach a method for modifying a sound included in an audio signal, including determining, for each sound included in a plurality of sounds included in the audio signal, one or more classifications associated with the sound.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656