Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-6 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claims 1 and 4 recite decoding of encoded multi-channel input audio data to obtain decoded multi-channel audio data and the receipt of signaling data to determine if loudness processing should be performed on the decoded multi-channel audio data. The specification does not discuss multi-channel data beyond assertions that metadata includes “stereo mix” and “stereo parameters,” as well as loudness values (see at least ¶ 31, 55, etc. of the instant PGPub: 20210280200). The specification does not discuss multi-channel audio, or indeed audio channels generally, beyond asserting that the metadata may contain information relevant to “one or more dialogue channels” (see at least ¶ 74, etc. of the instant PGPub: 20210280200). As such the recited multi-channel audio data must be reasonably considered new matter. Claims 2, 3, 5, and 6 are rejected at least for dependency from claims 1 and 4. Furthermore, the parent and provisional applications of the instant application contain no greater reference to the recited multi-channel input and decoding and as such may qualify as prior art. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over Schreiner: 20100014692 (hereinafter Sch) and further in view of Johnston: 8908874 (hereinafter John).

Regarding claim 1
Sch teaches:
An audio decoding method, comprising: 
receiving an encoded bitstream, generated by an audio encoder (Sch: ¶ 13, 59, 87-91: a set of audio objects downmixed to a multichannel signal and transmitted to a user such as using a Dolby, and/or MPEG, etc., type codec, wherein the signal comprises a multichannel audio data and further comprises audio metadata directive of loudness and dynamic range control thereon),
the encoded bitstream including encoded multi-channel input audio data and processing state metadata including a loudness value (Sch: ¶ 49-53, 59, 76, 87-90, 130-152; Fig 10: multichannel audio signal further comprising a plurality of metadata parameters for gain, compression, etc.; such as well-known Dolby metadata functionalities of Dialog Normalization, and Dynamic Range Control); 
decoding the encoded multi-channel input audio data (Sch: ¶ 130-152; Fig 10: a decoder is implemented for output of the received downmixed multichannel audio in concert with received metadata parameters) to obtain decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: multi-channel audio decoded in concert with metadata parameters directive of audio characteristics is output from dry/wet controller);
receiving signaling data indicating whether loudness processing should be performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes instead of processing the objects, channels, etc. in concert with the metadata; further, in the absence of both processing metadata and user directed processing data the relevant processing is not performed; that is, metadata parameters transmitted direct adjusting of the audio components relevant to the parameters and/or user signaling data directs loudness processing thereof) or not performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: e.g. by employment or instantiation of other modes which do not require loudness, normalization, etc. processing, such as a mode for location of channels in diverse positions);
when said signaling data indicates that loudness processing should be performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: the presence of mode data specifying loudness processing, or a dry/wet signal is used to output the metadata processing of the multichannel audio or in the absence of user determined mode data the presence of loudness or dynamic range control metadata relevant to a particular channel, object, etc.; components with no transmitted metadata remain unchanged): 
obtaining the loudness value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.); and 
normalizing loudness of said decoded multi-channel audio data according to the loudness value, to provide output audio data (Sch: ¶ 13, 59, 87-91, 117, 118, 160; Fig 7, 10, 11: system normalizes dialog, loudness, etc. gain of a plurality of channels based on read metadata such as the well-known audio metadata directive of loudness and dynamic range control thereon such as by using an object matrix bearing an audio signal power value suitable for each of a plurality of objects including normalization or under direction of a dialnorm parameter).
To be sure, the Sch specification largely deals with object type processing rather than multiple channel processing; however, Sch teaches that the discussed metadata is well known to operate upon multiple channel data as well (see at least Sch: ¶ 13, 59, 87-90, 91). Furthermore, Sch strongly suggests but does not explicitly teach signaling data explicitly directive of whether loudness processing is or is not performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed). As such Examiner takes official notice that it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the metadata and signaling discussed by Sch for direction of loudness control and normalization on the Sch disclosed multi-channel audio data, such as when decoding a Dolby or MPEG type encoded audio signal. The average skilled practitioner would have been motivated to do so for the purpose of allowing data driven or user driven metadata control over output of multichannel audio signals and would have expected only predictable results therefrom.

As such, Sch strongly suggests but does not explicitly teach signaling data directive of whether loudness processing is not performed on the decoded audio data (Sch: ¶ 130-152; Fig 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed).

In a related field of endeavor John teaches a system for streaming and reproduction of multi-channel audio wherein the system receives and decodes an encoded bitstream (John: Abstract; Fig 13), the decoding including decoding audio input data and decoding metadata (John: Col 23:38-24:12; Fig 13; Claim 26: metadata demultiplexed from an audio stream is read to determine processing parameters of the audio) and wherein the metadata comprises a ‘direct rendering’ flag which when asserted instructs the decoder to output a particular audio channel, stream, etc. without performing processing on said channel, stream, etc. (John: Col 9:30-10:16; Claim 26). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to incorporate a direct rendering type of metadata such as that described by John within the Sch taught media decoding system. The average skilled practitioner would have been motivated to do so for the purpose of allowing an upstream user to explicitly disallow any particular downstream processing of an audio channel, stream, etc. which the upstream user cares to specify. That is, an upstream user would seek a manner in which to explicitly disallow processing on the part of a downstream user such as any of the John: Table 1 processing (e.g. Reverberation; Panning and/or Volume control of channel coefficients, etc.), or indeed any well-known processing (e.g. Equalization, Dynamics control, etc.) enabled by the particular coding system and would expect predictable results therefrom.
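
For illustration only, the conditional flow addressed above (signaling data gating whether a metadata-borne loudness value is applied to decoded multi-channel audio) may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Sch, John, or the instant specification: the names ProcessingStateMetadata, loudness_db, loudness_processing_enabled and target_db, as well as the -31 dB default target, are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessingStateMetadata:
    # Hypothetical field: measured program loudness in dB, carried with the bitstream.
    loudness_db: Optional[float]


def normalize_loudness(
    channels: List[List[float]],
    metadata: ProcessingStateMetadata,
    loudness_processing_enabled: bool,
    target_db: float = -31.0,  # assumed target level, for illustration only
) -> List[List[float]]:
    """Apply one gain to every decoded channel, but only when signaling allows it."""
    # Signaling says "do not process" (or no loudness value is present):
    # pass the decoded channels through unchanged.
    if not loudness_processing_enabled or metadata.loudness_db is None:
        return [list(ch) for ch in channels]

    # Gain in dB that moves the measured loudness to the target,
    # converted to a linear amplitude factor.
    gain_db = target_db - metadata.loudness_db
    gain = 10.0 ** (gain_db / 20.0)
    return [[sample * gain for sample in ch] for ch in channels]


if __name__ == "__main__":
    decoded = [[1.0, -1.0], [0.5, 0.25]]  # two decoded channels
    meta = ProcessingStateMetadata(loudness_db=-25.0)
    print(normalize_loudness(decoded, meta, loudness_processing_enabled=True))
    print(normalize_loudness(decoded, meta, loudness_processing_enabled=False))
```

In this sketch a single flag plays the role of the signaling data; when it is not asserted the decoded channels are output without loudness processing, which is comparable in effect to the direct rendering behavior attributed to John above.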

Regarding claim 2
Sch in view of John teaches or suggests:
An audio decoding system and method, wherein said processing state metadata further indicates whether dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes or processing the objects in concert with the metadata, the metadata including dialogue normalization, dynamic range control, volume control, etc.); and said method further comprising
when said processing state metadata indicates that dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: such as when the appropriate metadata parameters are present): 
obtain a dynamic range value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.; presence of a metadata value is considered to indicate that processing should be performed in concert therewith);
normalize dynamic range of said decoded multi-channel audio data according to the dynamic range value (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system normalizes dialog gain of a plurality of channels using an object matrix bearing an audio signal power value suitable for each of a plurality of objects, channels, etc.). The claim is considered obvious over Sch as modified by John as addressed in the base claim as it would have been obvious to apply the further teaching of Sch to the modified device of Sch and John.

Regarding claim 3
Sch in view of John teaches or suggests:
An audio decoding system and method, wherein said loudness value is computed on dialogue portions of the audio data (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system operates to normalize dialog (dialnorm) gain of an audio output signal). The claim is considered obvious over Sch as modified by John as addressed in the base claim as it would have been obvious to apply the further teaching of Sch to the modified device of Sch and John.


Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Schreiner: 20100014692 (hereinafter Sch) and further in view of Johnston: 8908874 (hereinafter John) and further in view of Ishikawa: 20110182432 (hereinafter Ish).
Regarding claim 4
Sch teaches:
An audio decoding system and method, comprising: a decoder configured to: receive an encoded bitstream generated by an audio encoder (Sch: ¶ 13, 59, 87-90, 91: a set of audio objects downmixed to a multichannel signal and transmitted to a user such as using a Dolby, and/or MPEG, etc., type codec, wherein the signal comprises a multichannel audio data and further comprises audio metadata directive of loudness and dynamic range control thereon), 
the encoded bitstream including encoded multi-channel input audio data and processing state metadata including a loudness value (Sch: ¶ 49-53, 59, 76, 87-90, 130-152; Fig 10: multichannel audio signal further comprising a plurality of metadata parameters for gain, compression, etc.; such as well-known Dolby metadata functionalities of Dialog Normalization, and Dynamic Range Control);
decode said encoded multi-channel input audio data (Sch: ¶ 130-152; Fig 10: a decoder is implemented for output of the received downmixed multichannel audio in concert with received metadata parameters) to obtain decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: multi-channel audio decoded in concert with metadata parameters directive of audio characteristics is output from dry/wet controller); and
a processing unit (Sch: 104: a processor operates to decode an audio signal in concert with audio metadata parameters) configured to: 
receive signaling data indicating whether loudness processing should be performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes instead of processing the objects, channels, etc. in concert with the metadata; further, in the absence of both processing metadata and user directed processing data the relevant processing is not performed; that is, metadata parameters transmitted direct adjusting of the audio components relevant to the parameters and/or user signaling data directs loudness processing thereof) or not performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: e.g. by employment or instantiation of other modes which do not require loudness, normalization, etc. processing, such as a mode for location of channels in diverse positions), and
when said signaling data indicates that loudness processing should be performed on the decoded audio data (Sch: ¶ 130-152; Fig 10: the presence of mode data specifying loudness processing, or a dry/wet signal is used to output the metadata processing of the multichannel audio or in the absence of user determined mode data the presence of loudness or dynamic range control metadata relevant to a particular channel, object, etc.):
obtain the loudness value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.); and
normalize loudness of said decoded multi-channel audio data according to the loudness value, to provide output audio data (Sch: ¶ 117, 118; Fig 7: system normalizes dialog gain of a plurality of channels using an object matrix bearing an audio signal power value suitable for each of a plurality of objects).

To be sure, the Sch specification largely deals with object type processing rather than multiple channel processing; however, Sch teaches that the discussed metadata is well known to operate upon multiple channel data as well (see at least Sch: ¶ 13, 59, 87-90, 91). Furthermore, Sch strongly suggests but does not explicitly teach signaling data explicitly directive of whether loudness processing is or is not performed on the decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed). As such Examiner takes official notice that it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the metadata and signaling discussed by Sch for direction of loudness control and normalization on the Sch disclosed multi-channel audio data, such as when decoding the channels of a Dolby or MPEG type encoded audio signal. The average skilled practitioner would have been motivated to do so for the purpose of allowing data driven or user driven metadata control over output of multichannel audio signals and would have expected only predictable results therefrom.

As such, Sch strongly suggests but does not explicitly teach signaling data directive of whether loudness processing is not performed on the decoded audio data (Sch: ¶ 130-152; Fig 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed).

In a related field of endeavor John teaches a system for streaming and reproduction of multi-channel audio wherein the system receives and decodes an encoded bitstream (John: Abstract; Fig 13), the decoding including decoding audio input data and decoding metadata (John: Col 23:38-24:12; Fig 13; Claim 26: metadata demultiplexed from an audio stream is read to determine processing parameters of the audio) and wherein the metadata comprises a ‘direct rendering’ flag which when asserted instructs the decoder to output a particular audio channel, stream, etc. without performing processing on said channel, stream, etc. (John: Col 9:30-10:16; Claim 26). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to incorporate a direct rendering type of metadata such as that described by John within the Sch taught media decoding system. The average skilled practitioner would have been motivated to do so for the purpose of allowing an upstream user to explicitly disallow any particular downstream processing of an audio channel, stream, etc. which the upstream user cares to specify. That is, an upstream user would seek a manner in which to explicitly disallow processing on the part of a downstream user such as any of the John: Table 1 processing (e.g. Reverberation; Panning and/or Volume control of channel coefficients, etc.), or indeed any well-known processing (e.g. Equalization, Dynamics control, etc.) enabled by the particular coding system and would expect predictable results therefrom.

Sch in view of John does not explicitly teach the system, method performed on a media processing unit comprising a post-processing unit operative of recited decoding tasks.

In a related field of endeavor consider Ish, which teaches the benefit of post processing for adjusting at least gain parameters of an output audio signal at particular frequency values in concert with audio metadata in the form of post processing parameter matrices (Ish: ¶ 184-201; Fig 11, etc.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to perform the Sch taught operations using a post processor such as that of Ish. The average skilled practitioner would have been motivated to do so for the purpose of implementing well known post processor audio management operations such as volume control, delay, channel mapping, equalization, dynamic range control, dialog normalization, sample rate conversion, surround effects, matrix decoding, etc. and would have expected predictable results therefrom. Further, Ish is considered merely exemplary of the variety of manners in which a postprocessor responds to extant metadata parameters to direct output gain of a plurality of audio channels, objects, etc.; please see the art cited in the conclusion for further examples.

Regarding claim 5
Sch in view of John in view of Ish teaches or suggests:
An audio decoding system and method, wherein said processing state metadata further indicates whether dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes or processing the objects in concert with the metadata, the metadata including dialogue normalization, dynamic range control, volume control, etc.); and wherein said post-processing unit is further configured to:
when said processing state metadata indicates that dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: such as when the appropriate metadata parameters are present): 
obtain a dynamic range value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10; the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.);
normalize dynamic range of said decoded multi-channel audio data according to the dynamic range value (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system normalizes dialog gain of a plurality of channels using an object matrix bearing an audio signal power value suitable for each of a plurality of objects). The claim is considered obvious over Sch, John and Ish as addressed in the base claim as it would have been obvious to apply the further teaching of Sch to the modified device of Sch, John and Ish.

Regarding claim 6
Sch in view of John in view of Ish teaches or suggests:
An audio decoding system and method, wherein said loudness value is computed on dialogue portions of the audio data (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system operates to normalize dialog (dialnorm) gain of an audio output signal). The claim is considered obvious over Sch, John and Ish as addressed in the base claim as it would have been obvious to apply the further teaching of Sch to the modified device of Sch, John and Ish.

Response to Arguments

Applicant’s amended claim in concert with arguments, see claims and remarks, filed 2/28/22, with respect to the rejection(s) of claim(s) 1-3 under 35 USC 102 over Schreiner and Johnston; and claim(s) 4-6 under 35 USC 103 over Schreiner, Johnston and Ishikawa have been fully considered and are not persuasive. Particularly, Applicant puts forth the argument that loudness normalization is performed on the entire input audio data rather than on individual audio objects, as shown by Schreiner at block 25 in FIG. 11, which performs loudness normalization on the summed audio objects. Note, however, that there is no signaling disclosed or suggested by Schreiner that indicates whether block 25 is performed or not performed on the summed audio objects.
Examiner considers this argument irrelevant to the rejection presented supra inasmuch as the argued distinction is not recited in the claimed subject matter. Further, as discussed supra, Schreiner discusses that the presence within a stream of particular signaling data is considered directive of whether loudness processing should be performed on the decoded audio data (see at least Sch: ¶ 13, 59, 87-90, 91, 130-135).
Applicant further alleges that none of the metadata listed in Table 1 of Johnston is related to loudness processing and that the ordinary skilled practitioner in the art would not combine diffusion metadata audio object signaling for loudness processing of spatial audio objects at least because multi-channel audio and audio objects are two different audio technologies. Nor is there any motivation or suggestion by Schreiner or Johnston to combine multi-channel diffusion metadata and object signaling for loudness processing, or how such combination would be accomplished from a technical perspective.
Examiner respectfully disagrees. Johnston is not cited to clarify the nature of the processing, merely to show that a direct rendering flag is well-known; as such it would be obvious to utilize, combine or otherwise include a direct rendering flag with the Schreiner taught processing (see at least Sch: ¶ 13, 59, 87-90, 91, 130-135) for at least the purpose of allowing an upstream user to explicitly disallow any particular downstream processing such as the gain, compression, level processing taught by Sch and/or the Table 1 processing discussed by Johnston. Applicant’s arguments with regard to claim 1 are not considered persuasive, and the arguments regarding claim 4 are similarly not persuasive; as such no claims are currently in condition for allowance.

Conclusion


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL C MCCORD/
Primary Examiner, Art Unit 2654