Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 112
Applicant’s amendments to Claims 1-6 filed 6/27/22 suffice to obviate the 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, rejections thereof.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 7, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Schreiner: 20100014692, hereinafter Sch, and further in view of Wiser: 20050240395.

Regarding claim 1
Sch teaches:
An audio decoding method, comprising: 
receiving an encoded bitstream, generated by an audio encoder (Sch: ¶ 13, 59, 87-91: a set of audio objects downmixed to a multichannel signal and transmitted to a user such as using a Dolby, and/or MPEG, etc., type codec, wherein the signal comprises a multichannel audio data and further comprises audio metadata directive of loudness and dynamic range control thereon),
the encoded bitstream including encoded multi-channel input audio data and processing state metadata including a loudness value (Sch: ¶ 49-53, 59, 76, 87-90, 125-152; Fig 9, 10: multichannel audio signal further comprising a plurality of metadata parameters for gain, compression, etc.; such as well-known Dolby metadata functionalities of Dialog Normalization, and Dynamic Range Control i.e. a matrix of values indicative of loudness processing such as that of fig 9); 
decoding the encoded multi-channel input audio data (Sch: ¶ 125-152; Fig 9, 10: a decoder is implemented for output of the received downmixed multichannel audio in concert with received metadata parameters) to obtain decoded multi-channel audio data (Sch: ¶ 130-152; Fig 10: multi-channel audio decoded in concert with metadata parameters directive of audio output characteristics);
receiving signaling data indicating how particular processing on the decoded multi-channel audio data directs the system to render or not render particular channels, objects, etc. (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: system receives rendering data directive of the implementation of user directed processing modes instead of processing the objects, channels, etc. in concert with the metadata; rendering matrix values direct whether or not to perform particular processing upon a particular object, channel, etc. and thereby control loudness values of the objects, channels, etc. from particular speakers, i.e. in fig 9 a first object, channel, etc. is rendered with a maximum loudness in the left speaker and no loudness in the right speaker, whereas object, channel 6 bears metadata directive of the system to perform no rendering by indicating no loudness in either speaker; as such, transmitted metadata parameters direct adjusting of the audio components relevant to the parameters for at least the delivery of objects, channels, etc. based on particular speaker loudness values);
when said signaling data indicates to perform the loudness processing on the decoded multi-channel audio data (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: the presence of mode data and/or rendering data specifying loudness processing is used to output the audio signal in concert with metadata processing):
obtaining the loudness value from the processing state metadata (Sch: ¶ 44-55; 125-152; Fig 9, 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.); and 
normalizing loudness of said decoded multi-channel audio data according to the loudness value, to provide output audio data (Sch: ¶ 13, 59, 87-91, 117-152, 160; Fig 7, 9-11: system normalizes dialog, loudness, etc. gain of a plurality of channels based on read metadata such as the well-known audio metadata directive of loudness and dynamic range control thereon such as by using an object matrix bearing an audio signal power value suitable to direct output for each/any of a plurality of objects including loudness, normalization, range, etc. parameters).

To be sure, the Sch specification largely deals with object type processing rather than multiple channel processing; however, Sch teaches that the discussed metadata is well known to operate upon multiple channel data as well (see at least Sch: ¶ 13, 59, 87-90, 91, 105-111, 125-152; Fig 4, 9, 10). Furthermore, Sch strongly suggests but does not explicitly teach signaling data explicitly directive of whether loudness processing is or is not performed on the decoded multi-channel audio data (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed).

As such, Sch strongly suggests but does not explicitly teach signaling data directive of whether loudness processing is not performed on the decoded audio data such that when said signaling data indicates not to perform the loudness processing on the decoded multi-channel audio data: disabling performing the loudness processing.

In a related field of endeavor Wiser teaches a system and method for decoding of an audio stream (Wiser: Abstract; ¶ 5) in concert with signaling data in the form of a processing profile (Wiser: ¶ 45, 50-52; Fig 4, 6: metadata directive of particular audio processing, parameters thereof including gain, loudness parameters as well as dynamic range parameters); said parameters including a bypass parameter, wherein the bypass parameter indicates whether processing according to data stored in the profile is to be performed or bypassed altogether (Wiser: ¶ 45, 50-52; Fig 4, 6).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Wiser taught bypass metadata in concert with the Sch taught decoding system and method. The average skilled practitioner would have been motivated to do so for at least the purpose of conforming audio output processing to capabilities of a particular output device(s), to the preferences of a particular user, allowing a downstream user to disable processing directed by an upstream user or device, etc. and would have expected only predictable results therefrom.
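
For illustration only, the combined flow addressed above (obtain a loudness value from metadata, normalize the decoded channels accordingly, and skip that step when a bypass-type signal so directs) can be sketched as follows. The sketch is not taken from Sch, Wiser, or the application. The class names, the function name, the field names, and the -31 dB target level are all hypothetical assumptions chosen for the example, and the gain computation is a generic decibel-to-linear conversion rather than any reference's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessingStateMetadata:
    # Hypothetical field: loudness value carried with the encoded audio, in dB.
    loudness_db: float


@dataclass
class SignalingData:
    # Hypothetical bypass-style flag: True means perform loudness processing.
    perform_loudness_processing: bool


def normalize_loudness(
    channels: List[List[float]],
    metadata: ProcessingStateMetadata,
    signaling: SignalingData,
    target_db: float = -31.0,  # assumed target level, for illustration only
) -> List[List[float]]:
    """Return output audio, loudness-normalized unless signaling says to bypass."""
    if not signaling.perform_loudness_processing:
        # Bypass: loudness processing is disabled; samples pass through unchanged.
        return [list(ch) for ch in channels]

    # Gain (in dB) that moves the signaled loudness to the target level.
    gain_db = target_db - metadata.loudness_db
    # Convert the dB gain to a linear amplitude factor.
    gain = 10.0 ** (gain_db / 20.0)
    # Apply the same gain to every sample of every decoded channel.
    return [[sample * gain for sample in ch] for ch in channels]
```

Under these assumptions, a signaled loudness of -25 dB and a target of -31 dB yields a gain of -6 dB applied uniformly to all channels, whereas a signaling flag of False returns the decoded samples unmodified.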

Regarding claim 2
Sch in view of Wiser teaches or suggests:
An audio decoding system and method, wherein said processing state metadata further indicates whether or not to perform dynamic range processing (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes or processing the objects in concert with the metadata, the metadata including dialogue normalization, dynamic range control, volume control, etc.); (Wiser: ¶ 45, 50-52; Fig 4, 6: metadata directive of particular audio processing, parameters thereof including gain, loudness parameters as well as dynamic range parameters); and said method further comprising
when said processing state metadata indicates to perform the dynamic range processing (Sch: ¶ 130-152; Fig 10: such as when the appropriate metadata parameters are present):
obtain a dynamic range value from the processing state metadata (Sch: ¶ 44-55; 125-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.; presence of a metadata value is considered to indicate that processing should be performed in concert therewith); (Wiser: ¶ 45, 50-52; Fig 4, 6);
normalize dynamic range of said decoded multi-channel audio data according to the dynamic range value (Sch: ¶ 44-55; 117, 118, 125-152; Fig 7, 9, 10: system normalizes dialog gain of a plurality of channels using an object matrix bearing an audio signal power value suitable for each of a plurality of objects, channels, etc.); (Wiser: ¶ 45, 50-52; Fig 4, 6). The claim is considered obvious over Sch as modified by Wiser as addressed in the base claim as it would have been obvious to apply the further teaching of Sch and/or Wiser to the modified device of Sch and Wiser.

Regarding claim 3
Sch in view of Wiser teaches or suggests:
An audio decoding system and method, wherein said loudness value is computed on dialogue portions of the audio data (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system operates to normalize dialog (dialnorm) gain of an audio output signal). The claim is considered obvious over Sch as modified by Wiser as addressed in the base claim as it would have been obvious to apply the further teaching of Sch and/or Wiser to the modified device of Sch and Wiser.

Regarding claim 7
Sch in view of Wiser teaches or suggests:
An audio decoding system and method, wherein the encoded bitstream includes signaling data (Sch: ¶ 2, 7-9, 105-111, 125-152; Fig 4, 9, 10: system receives rendering metadata directive of output processing along with audio streams), wherein the processing state metadata and the signaling data are embedded in one or more reserved fields of the metadata, or hidden with the multi-channel input audio data (Wiser: ¶ 45, 50-52; Fig 4, 6: bypass information comprises a reserved bypass metadata field). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to include the Wiser taught bypass information, and the field thereof, within the processing metadata transmitted by the Sch system and method. The average skilled practitioner would have been motivated to do so for at least the purpose of conforming audio output processing to capabilities of a particular output device(s), to the preferences of a particular user, allowing a downstream user to disable processing directed by an upstream user or device, etc. and would have expected only predictable results therefrom.

Regarding claim 8
Sch in view of Wiser teaches or suggests:
An audio decoding system and method wherein the signaling data includes a subset of processing state metadata, or a summary of processing state metadata (Wiser: ¶ 45, 50-52; Fig 4, 6: bypass metadata comprises a subset of the processing state metadata). The claim is considered obvious over Sch as modified by Wiser as addressed in the base claim as it would have been obvious to apply the further teaching of Sch and/or Wiser to the modified device of Sch and Wiser.

Claims 4-6 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Schreiner: 20100014692, hereinafter Sch, and further in view of Wiser: 20050240395 and further in view of Ishikawa: 20110182432, hereinafter Ish.
Regarding claim 4
Sch teaches:
An audio decoding system and method, comprising: a decoder configured to: receive an encoded bitstream generated by an audio encoder (Sch: ¶ 13, 59, 87-90, 91: a set of audio objects downmixed to a multichannel signal and transmitted to a user such as using a Dolby, and/or MPEG, etc., type codec, wherein the signal comprises a multichannel audio data and further comprises audio metadata directive of loudness and dynamic range control thereon), 
the encoded bitstream including encoded multi-channel input audio data and processing state metadata including a loudness value (Sch: ¶ 49-53, 59, 76, 87-90, 125-152; Fig 9, 10: multichannel audio signal further comprising a plurality of metadata parameters for gain, compression, etc.; such as well-known Dolby metadata functionalities of Dialog Normalization, and Dynamic Range Control);
decode said encoded multi-channel input audio data (Sch: ¶ 125-152; Fig 9, 10: a decoder is implemented for output of the received downmixed multichannel audio in concert with received metadata parameters) to obtain decoded multi-channel audio data (Sch: ¶ 125-152; Fig 9, 10: multi-channel audio decoded in concert with metadata parameters directive of audio output characteristics); and
a processing unit (Sch: 104: a processor operates to decode an audio signal in concert with audio metadata parameters) configured to: 
receive signaling data indicating how particular processing on the decoded multi-channel audio data directs the system to render or not render particular channels, objects, etc. (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: system receives rendering data directive of the implementation of user directed processing modes instead of processing the objects, channels, etc. in concert with the metadata; rendering matrix values direct whether or not to perform particular processing upon a particular object, channel, etc. and thereby control loudness values of the objects, channels, etc. from particular speakers, i.e. in fig 9 a first object, channel, etc. is rendered with a maximum loudness in the left speaker and no loudness in the right speaker, whereas object, channel 6 bears metadata directive of the system to perform no rendering by indicating no loudness in either speaker; as such, transmitted metadata parameters direct adjusting of the audio components relevant to the parameters for at least the delivery of objects, channels, etc. based on particular speaker loudness values), and
when said signaling data indicates to perform the loudness processing on the decoded audio data (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: the presence of mode data and/or rendering data specifying loudness processing is used to output the audio signal in concert with metadata processing):
obtain the loudness value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.); and
normalize loudness of said decoded multi-channel audio data according to the loudness value, to provide output audio data (Sch: ¶ 13, 59, 87-91, 117-152, 160; Fig 7, 9-11: system normalizes dialog, loudness, etc. gain of a plurality of channels based on read metadata such as the well-known audio metadata directive of loudness and dynamic range control thereon such as by using an object matrix bearing an audio signal power value suitable to direct output for each/any of a plurality of objects including loudness, normalization, range, etc. parameters).

To be sure, the Sch specification largely deals with object type processing rather than multiple channel processing; however, Sch teaches that the discussed metadata is well known to operate upon multiple channel data as well (see at least Sch: ¶ 13, 59, 87-90, 91, 105-111, 125-152; Fig 4, 9, 10). Furthermore, Sch strongly suggests but does not explicitly teach signaling data explicitly directive of whether loudness processing is or is not performed on the decoded multi-channel audio data (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed).

As such, Sch strongly suggests but does not explicitly teach signaling data directive of whether loudness processing is not performed on the decoded audio data such that when said signaling data indicates not to perform the loudness processing on the decoded multi-channel audio data: disabling performing the loudness processing (Sch: ¶ 130-152; Fig 10: absence of processing metadata and user directed processing data causes the relevant processing to not be performed).

In a related field of endeavor Wiser teaches a system and method for decoding of an audio stream (Wiser: Abstract; ¶ 5) in concert with signaling data in the form of a processing profile (Wiser: ¶ 45, 50-52; Fig 4, 6: metadata directive of particular audio processing, parameters thereof including gain, loudness parameters as well as dynamic range parameters); said parameters including a bypass parameter, wherein the bypass parameter indicates whether processing according to data stored in the profile is to be performed or bypassed altogether (Wiser: ¶ 45, 50-52; Fig 4, 6).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Wiser taught bypass metadata in concert with the Sch taught decoding system and method. The average skilled practitioner would have been motivated to do so for at least the purpose of conforming audio output processing to capabilities of a particular output device(s), to the preferences of a particular user, allowing a downstream user to disable processing directed by an upstream user or device, etc. and would have expected only predictable results therefrom.

Sch in view of Wiser does not explicitly teach the system, method performed on a media processing unit comprising a post-processing unit operative of recited decoding tasks.

In a related field of endeavor Ish teaches the benefit of post processing for adjusting at least gain parameters of an output audio signal at particular frequency values in concert with audio metadata in the form of post processing parameter matrices (Ish: ¶ 184-201; Fig 11, etc.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to perform each/any of the Sch in view of Wiser taught operations using a post processor such as that of Ish. The average skilled practitioner would have been motivated to do so for the purpose of implementing well known post processor audio management operations such as volume control, delay, channel mapping, equalization, dynamic range control, dialog normalization, sample rate conversion, surround effects, matrix decoding, etc. and would have expected predictable results therefrom. Further, Ish is considered merely exemplary of the variety of manners in which a postprocessor responds to extant metadata parameters to direct output gain of a plurality of audio channels, objects, etc.; please see the art cited in the conclusion for further examples.

Regarding claim 5
Sch in view of Wiser in view of Ish teaches or suggests:
An audio decoding system and method, wherein said processing state metadata further indicates whether dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: system optionally receives user mode control data which determines a user decision to implement user directed processing modes or processing the objects in concert with the metadata, the metadata including dialogue normalization, dynamic range control, volume control, etc.); and wherein said post-processing unit is further configured to:
when said processing state metadata indicates that dynamic range processing should be performed (Sch: ¶ 130-152; Fig 10: such as when the appropriate metadata parameters are present):
obtain a dynamic range value from the processing state metadata (Sch: ¶ 44-55; 130-152; Fig 10: the system reads well known or extended metadata including metadata borne gain values, dialogue normalization values, dynamic range control values, etc.); (Wiser: ¶ 45, 50-52; Fig 4, 6);
normalize dynamic range of said decoded multi-channel audio data according to the dynamic range value (Sch: ¶ 44-55; 117, 118, 130-152; Fig 7, 10: system normalizes dialog gain of a plurality of channels using an object matrix bearing an audio signal power value suitable for each of a plurality of objects); (Wiser: ¶ 45, 50-52; Fig 4, 6). The claim is considered obvious over Sch, Wiser and Ish as addressed in the base claim as it would have been obvious to apply the further teaching of Sch and/or Wiser to the modified device of Sch, Wiser and Ish.

Regarding claim 6
Sch in view of Wiser in view of Ish teaches or suggests:
An audio decoding system and method, wherein said loudness value is computed on dialogue portions of the audio data (Sch: ¶ 44-55; 117, 118, 125-152; Fig 4, 7, 9, 10: system operates to normalize dialog (dialnorm) gain of an audio output signal). The claim is considered obvious over Sch, Wiser and Ish as addressed in the base claim as it would have been obvious to apply the further teaching of Sch and/or Wiser to the modified device of Sch, Wiser and Ish.

Regarding claim 9
Sch in view of Wiser in view of Ish teaches or suggests:
An audio decoding system and method wherein the decoder is further configured to generate, based at least in part on the processing state metadata embedded imperceptibly in the audio signal (Sch: ¶ 105-111, 125-152; Fig 4, 9, 10); (Wiser: ¶ 45, 50-52; Fig 4, 6), but does not explicitly teach the signaling data using a reversible or irreversible data hiding technique. Examiner takes official notice that the transmission of metadata, such as in a header, footer, watermark, etc., was well known to comprise a reversible and/or irreversible data hiding technique and as such would have comprised an obvious inclusion. The average skilled practitioner would have been motivated to do so for the purpose of transmitting metadata without impacting the quality of the audio payload and would have expected only predictable results therefrom.

Response to Arguments

Applicant’s amended claims in concert with arguments, see claims and remarks, filed 6/27/22, with respect to the rejection(s) of claim(s) 1-3 under 35 USC 102 over Schreiner and Johnston; and claim(s) 4-6 under 35 USC 103 over Schreiner, Johnston and Ishikawa have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Schreiner and Wiser, and in view of Schreiner, Wiser and Ishikawa.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
20120131325 restriction metadata for dynamic range processing
20080080722 signaling for bypassing of processing directive metadata

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 730-630 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at (571)272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL C MCCORD/Primary Examiner, Art Unit 2654