DETAILED ACTION
Claim Objections
Claims 2 and 11 are objected to because each claim should begin with a capital letter and end with a period.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-7 and 9-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Pub. No. 2020/0394999 A1 to Levine et al. (hereinafter “Levine”).
Regarding claim 1, Levine discloses a method, whereby a plurality of training audio tracks is provided to a human sound mixer and responsive to the training audio tracks a plurality of individually processed training audio tracks is received from the human sound mixer (Abstract and paragraphs [0004]-[0005]; please see a sound mixer including a human dialogue), the method comprising:
inputting the training audio tracks and the individually processed training audio tracks to a machine thereby training the machine (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue);
outputting from the trained machine a plurality of audio processing operations respectively emulating human audio processing of the training audio tracks (Abstract, paragraphs [0004], [0021], [0037], [0040] and [0046]; dataset may comprise a subset of known sound mix inputs (e.g., extracted content feature data/metadata of original sound mix with domestic language dialogue track) and associated outputs (e.g., content feature data of M&E mix associated with sound mix)); and
storing in a record of a database the audio processing operations (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).

Regarding claim 2, Levine discloses the method of claim 1, further comprising:
extracting a plurality of audio features of the training audio tracks;
storing the audio features of the training audio tracks in a record of the database (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).
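
For illustration only, the two recited steps (extracting audio features of a track and storing them in a database record) can be sketched as follows. The sketch is not drawn from Levine or from the application; the feature set (RMS level, peak level, zero-crossing rate), the table name track_features, and the function names are all hypothetical assumptions chosen for brevity.

```python
import json
import math
import sqlite3


def extract_features(samples):
    """Return a few simple descriptors of one track (hypothetical feature set).

    Assumes at least two samples.
    """
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    peak = max(abs(s) for s in samples)
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {"rms": rms, "peak": peak, "zcr": crossings / (n - 1)}


def store_features(conn, track_name, features):
    """Store the features of one track as a single database record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS track_features "
        "(track TEXT PRIMARY KEY, features TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO track_features VALUES (?, ?)",
        (track_name, json.dumps(features)),
    )
    conn.commit()


def load_features(conn, track_name):
    """Read the record back, or return None if the track is unknown."""
    row = conn.execute(
        "SELECT features FROM track_features WHERE track = ?", (track_name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Hypothetical training track: a short alternating (square-like) signal.
conn = sqlite3.connect(":memory:")
square = [0.5, -0.5] * 4
store_features(conn, "training_track_1", extract_features(square))
```

An in-memory SQLite database stands in here for whatever data store an actual system would use; the point is only that one record holds the features extracted from one track.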

Regarding claim 3, Levine discloses the method of claim 2 further comprising:
inputting a plurality of original audio tracks to a trained machine; extracting a plurality of audio features from the original audio tracks (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue);
responsive to the extracted audio features of the original audio tracks, selecting a recommendation from the database for individual audio processing of the audio tracks; presenting the recommendation (Abstract and paragraphs [0038]-[0040]; receiving a sound mix including human dialogue; extracting metadata from the sound mix, where the extracted metadata categorizes the sound mix; extracting content feature data from the sound mix, the extracted content feature data including an identification of the human dialogue and instances or times the human dialogue occurs within the sound mix; automatically calculating, with a trained model, content feature data of a music and effects (M&E) sound mix using at least the extracted metadata and the extracted content feature data of the sound mix; and deriving the M&E sound mix using at least the calculated content feature data);
enabling processing of the original audio tracks according to the recommendation;
enabling mixing the processed audio tracks into a playable audio production; and enabling playing the audio production; wherein said selecting a recommendation from a database is responsive to a similarity metric between the extracted audio features of the original audio tracks and the extracted audio features of the training audio tracks (Abstract and paragraphs [0035]-[0040]; a composition playlist including a time code index specifying the order and playback times of the track files).
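
For illustration only, the limitation of selecting a recommendation responsive to a similarity metric between two sets of extracted audio features can be sketched as follows. The sketch is not drawn from Levine or from the application; cosine similarity is assumed as one possible metric, and the record layout, the operation names, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def select_recommendation(query_features, records):
    """Return the stored operations whose training features best match the query.

    records: list of (training_features, processing_operations) pairs.
    """
    best = max(records, key=lambda rec: cosine_similarity(query_features, rec[0]))
    return best[1]


# Hypothetical database contents: training-track features paired with the
# processing operations recorded for that training track.
records = [
    ([1.0, 0.0, 0.0], {"eq": "high-pass", "gain_db": -3.0}),
    ([0.0, 1.0, 0.0], {"eq": "low-shelf", "gain_db": 2.0}),
]
recommendation = select_recommendation([0.1, 0.9, 0.0], records)
```

In this example the query features lie closest to the second record, so its stored operations are returned as the recommendation.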

Regarding claim 4, Levine discloses the method of claim 1, whereby an audio mix is received from the human sound mixer of the individually processed training audio tracks, the method further comprising:
extracting an audio mix feature of the audio mix (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue);
storing the audio mix feature in a record of the database (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).

Regarding claim 5, Levine discloses the method of claim 4, further comprising:
receiving a target feature of the audio mix; further responsive to the audio mix feature of the audio mix stored in the database and the target feature of the audio mix, said selecting a recommendation for audio processing of the audio tracks (paragraph [0068]; component 440 may use the extracted metadata to select the type of machine learned model that is applied to the extracted content feature data of input sound mix 101).

Regarding claim 6, Levine discloses the method of claim 3, further comprising:
inputting a target audio feature for processing the original audio tracks; further responsive to the target audio feature, said selecting a recommendation for audio processing of the original audio tracks (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue).

Regarding claim 7, Levine discloses the method of claim 3, further comprising:
inputting a tag describing an attribute of the audio tracks or of the playable audio production; and said selecting a recommendation further responsive to the tag (paragraph [0035]; sound mix may comprise digital audio track files that are assembled into a DCP including the audio track files (including foreign dialogue), image track files, and a composition playlist including a time code index specifying the order and playback times of the track files).


Regarding claim 9, Levine discloses the method of claim 3, further comprising:
processing the original audio tracks according to the recommendation, producing thereby individually processed audio tracks; extracting an audio feature of the individually processed audio tracks; and refining said recommendation responsive to the extracted audio feature of the individually processed audio tracks and a target audio feature input (Abstract and paragraphs [0038]-[0040]; receiving a sound mix including human dialogue; extracting metadata from the sound mix, where the extracted metadata categorizes the sound mix; extracting content feature data from the sound mix, the extracted content feature data including an identification of the human dialogue and instances or times the human dialogue occurs within the sound mix; automatically calculating, with a trained model, content feature data of a music and effects (M&E) sound mix using at least the extracted metadata and the extracted content feature data of the sound mix; and deriving the M&E sound mix using at least the calculated content feature data).

Regarding claim 10, Levine discloses a system, whereby a plurality of training audio tracks is provided to a human sound mixer and responsive to the training audio tracks a plurality of individually processed training audio tracks is received from the human sound mixer (Abstract and paragraphs [0004]-[0005]; please see a sound mixer including a human dialogue), the system comprising:
a machine configured to:
input the training audio tracks and the individually processed training audio tracks to produce a trained machine (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue);
output a plurality of audio processing operations respectively emulating human audio processing of the training audio tracks (Abstract, paragraphs [0004], [0021], [0037], [0040] and [0046]; dataset may comprise a subset of known sound mix inputs (e.g., extracted content feature data/metadata of original sound mix with domestic language dialogue track) and associated outputs (e.g., content feature data of M&E mix associated with sound mix)); and
a database configured to store in a record the audio processing operations (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).

Regarding claim 11, Levine discloses the system of claim 10, further comprising:
a processor configured to extract audio features of the training audio tracks; wherein the audio features of the training audio tracks are storable in a record of the database (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).

Regarding claim 12, Levine discloses the system of claim 11:
wherein the trained machine is configured to input a plurality of original audio tracks,
wherein a processor is configured to extract a plurality of audio features from the original audio tracks; wherein responsive to the extracted audio features of the original audio tracks, a recommendation is selected from a database for individual audio processing of the audio tracks (Abstract, paragraphs [0005] and [0045]; extracted metadata and content feature data may be stored/cataloged in a database/library or other data store 310 as an extracted sound mix and M&E mix dataset 315 that may be used to train and test one or more machine learning models used to derive M&E mixes from original sound mixes including domestic dialogue);
wherein the recommendation is selected from the database responsive to a similarity metric between the extracted audio features of the original audio tracks and the extracted audio features of the training audio tracks (Abstract and paragraphs [0035]-[0040]; a composition playlist including a time code index specifying the order and playback times of the track files).

Regarding claim 13, Levine discloses the system of claim 13, wherein the recommendation is selected based on a similarity between a target audio feature and at least one of the extracted audio features (Abstract and paragraphs [0035]-[0040]; a composition playlist including a time code index specifying the order and playback times of the track files).

Regarding claim 14, Levine discloses the system of claim 10,
whereby an audio mix is received from the human sound mixer of the individually processed audio tracks, wherein an audio mix feature is received or extracted from the received audio mix and the audio mix feature is stored in a record of the database (Abstract, paragraphs [0022], [0037] and [0087]; record of original sound mixes and associated M&E mixes for movies 260-1 to 260-N (individually referred to as a movie 260) that may be stored in one or more databases 210).

Regarding claim 15, Levine discloses the system of claim 14, wherein the recommendation is selected based on a similarity between a target feature of the audio mix and the audio mix feature of the audio mix stored in the database (paragraphs [0038] and [0042]; a presence of music that matches an established reference library, vocal song music cues, a stem file including music data).

Regarding claim 16, Levine discloses the system of claim 12, further comprising a user interface including:
a visual representation of the original audio tracks; a presentation of the recommendation; a mechanism for processing individually the audio tracks according to the recommendation into a playable audio production; and an option for playing the audio production (paragraph [0035]; digital audio track files that are assembled into a DCP including the audio track files (including foreign dialogue), image track files, and a composition playlist including a time code index specifying the order and playback times of the track files).

Regarding claim 17, Levine discloses the system of claim 16, wherein the user interface further includes:
a mechanism for receiving a target audio feature for processing at least one of the original audio tracks; and a mechanism for selecting a recommendation responsive to the target audio feature (Abstract and paragraph [0068]; component 440 may use the extracted metadata to select the type of machine learned model that is applied to the extracted content feature data of input sound mix 101).

Regarding claim 18, Levine discloses the system of claim 16, wherein the user interface further includes:
a mechanism for receiving a target feature of the audio mix; a mechanism for selecting a recommendation responsive to the audio mix feature of the audio mix stored in the database and the target feature of the audio mix, wherein the recommendation includes a selection for audio processing of the audio tracks and for mixing the processed audio tracks; and  a mechanism for presenting the recommendation and enabling the audio processing of the audio tracks and mixing the audio tracks according to the recommendation (Abstract and paragraphs [0038]-[0040]; receiving a sound mix including human dialogue; extracting metadata from the sound mix, where the extracted metadata categorizes the sound mix; extracting content feature data from the sound mix, the extracted content feature data including an identification of the human dialogue and instances or times the human dialogue occurs within the sound mix; automatically calculating, with a trained model, content feature data of a music and effects (M&E) sound mix using at least the extracted metadata and the extracted content feature data of the sound mix; and deriving the M&E sound mix using at least the calculated content feature data).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. 2020/0394999 A1 to Levine et al. (hereinafter “Levine”) in view of U.S. Pub. No. 2016/0371051 A1 to ROWE et al. (hereinafter “ROWE”).
Regarding claim 8, Levine discloses the method of claim 3 but does not explicitly teach the following further step, performed prior to inputting the original audio tracks to the trained machine:
pre-processing the original audio tracks by a short time Fourier transform (STFT) or by converting into Mel Frequency Cepstral Coefficients (MFCC).
ROWE discloses pre-processing the original audio tracks by a short time Fourier transform (STFT) (paragraphs [0033] and [0038]; audio signal analyzer 130 may be configured to perform a Fast Fourier Transform (FFT)).
At the time of the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to modify Levine’s teaching with a feature of pre-processing the original audio tracks by a short time Fourier transform (STFT) or by converting into Mel Frequency Cepstral Coefficients (MFCC) as taught by ROWE in order to transform the audio signal from the time domain into a frequency domain (paragraph [0033]).
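
For illustration only, the recited STFT pre-processing can be sketched as follows: the signal is cut into overlapping frames, each frame is windowed, and a Fourier transform is taken of each frame. The sketch is not drawn from Levine, ROWE, or the application. The frame size, hop size, Hann window, and function name are hypothetical assumptions, and a direct discrete Fourier transform is used for readability where a practical implementation would typically use an FFT.

```python
import cmath
import math


def stft(samples, frame_size=8, hop=4):
    """Return one list of complex DFT bins per Hann-windowed frame."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Direct DFT, keeping the non-negative-frequency bins 0..frame_size/2.
        bins = [
            sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(bins)
    return frames


# Hypothetical input: a sinusoid with exactly two cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spectrogram = stft(signal)
magnitudes = [[abs(b) for b in frame] for frame in spectrogram]
```

Each row of magnitudes is the frequency-domain view of one time frame; for this test signal the largest magnitude in every frame falls in bin 2, matching the two cycles per frame of the input.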
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AKELAW A TESHALE whose telephone number is (571)270-5302. The examiner can normally be reached from 9 am to 6 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

AKELAW TESHALE
Primary Examiner
Art Unit 2653



/AKELAW TESHALE/Primary Examiner, Art Unit 2653