DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Instant Application vs. Patent No. 11,087,779

Instant Application
(Claim 1)
1. An information processing method comprising: preparing, for content that includes audio of a plurality of channels, a feature amount of the audio; and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio, wherein the feature amount includes a plurality of elements depending on frequency characteristics in the respective channels.
(Claim 2)
2. The information processing method according to claim 1, wherein: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types, and the playback control data is set according to the identified scene type.

Patent No. 11,087,779
(Claim 8)
8. A method for identifying a scene type, comprising: preparing, for content that includes video and audio, a feature amount of the audio in the content; and identifying the scene type of the content based on the feature amount of the audio; and wherein the feature amount is a vector including elements corresponding to respective channels representative of the audio in the content, and each of the elements is a numerical value depending on frequency characteristics of a channel that corresponds to the element from among the channels.
(Claims 9 and 11 below include the claimed limitations of Claim 2 of the Instant Application)
(Claim 9)
9. The method according to claim 8, wherein identifying a scene type includes identifying the scene type of the content from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types.
(Claim 10)
10. The method according to claim 9, wherein the identification model is a statistical model that identifies a single scene type from the feature amount of the audio in the content.
(Claim 11)
11. The method according to claim 8, further comprising setting playback control data for controlling playback of the content based on the identified scene type.
(Claim 12)
12. The method according to claim 11, wherein the playback control data includes audio control data configured to control a sound field formed by the audio in the content.
(Claim 13)
13. The method according to claim 8, further comprising notifying a terminal apparatus of the identified scene type.
(Claim 14)
14. The method according to claim 8, wherein the scene type is a classification of a scene represented by the content.

Instant Application
(Claim 3)
3. An information processing method comprising: preparing, for content that includes audio of a plurality of channels, a feature amount of the audio; and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio, wherein the feature amount includes a plurality of elements depending on signal intensities in the respective channels.
(Claim 4)
4. The information processing method according to claim 3, wherein: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types, and the playback control data is set according to the identified scene type.

Patent No. 11,087,779
(Claim 8)
8. A method for identifying a scene type, comprising: preparing, for content that includes video and audio, a feature amount of the audio in the content; and identifying the scene type of the content based on the feature amount of the audio; and wherein the feature amount is a vector including elements corresponding to respective channels representative of the audio in the content, and each of the elements is a numerical value depending on frequency characteristics of a channel that corresponds to the element from among the channels.
(Claims 9 and 11 below include the claimed limitations of Claim 4 of the Instant Application)
(Claim 9)
9. The method according to claim 8, wherein identifying a scene type includes identifying the scene type of the content from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types.
(Claim 10)
10. The method according to claim 9, wherein the identification model is a statistical model that identifies a single scene type from the feature amount of the audio in the content.
(Claim 11)
11. The method according to claim 8, further comprising setting playback control data for controlling playback of the content based on the identified scene type.
(Claim 12)
12. The method according to claim 11, wherein the playback control data includes audio control data configured to control a sound field formed by the audio in the content.
(Claim 13)
13. The method according to claim 8, further comprising notifying a terminal apparatus of the identified scene type.
(Claim 14)
14. The method according to claim 8, wherein the scene type is a classification of a scene represented by the content.

Instant Application
(Claim 5)
5. An information processing method comprising: preparing, for content that includes audio, a feature amount of the audio; and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio, wherein: the feature amount includes a plurality of elements corresponding to different pitch classes, and each of the plurality of elements is set to have a value acquired by adding or averaging, across octaves, signal intensities of a band component corresponding to the corresponding pitch class in the audio signal.
(Claim 6)
6. The information processing method according to claim 5, wherein: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types, and the playback control data is set according to the identified scene type.

Patent No. 11,087,779
(Claim 8)
8. A method for identifying a scene type, comprising: preparing, for content that includes video and audio, a feature amount of the audio in the content; and identifying the scene type of the content based on the feature amount of the audio; and wherein the feature amount is a vector including elements corresponding to respective channels representative of the audio in the content, and each of the elements is a numerical value depending on frequency characteristics of a channel that corresponds to the element from among the channels.
(Claims 9 and 11 below include the claimed limitations of Claim 6 of the Instant Application)
(Claim 9)
9. The method according to claim 8, wherein identifying a scene type includes identifying the scene type of the content from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types.
(Claim 10)
10. The method according to claim 9, wherein the identification model is a statistical model that identifies a single scene type from the feature amount of the audio in the content.
(Claim 11)
11. The method according to claim 8, further comprising setting playback control data for controlling playback of the content based on the identified scene type.
(Claim 12)
12. The method according to claim 11, wherein the playback control data includes audio control data configured to control a sound field formed by the audio in the content.
(Claim 13)
13. The method according to claim 8, further comprising notifying a terminal apparatus of the identified scene type.
(Claim 14)
14. The method according to claim 8, wherein the scene type is a classification of a scene represented by the content.



Claims 1 and 2 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 8, 9 and 11 of U.S. Patent No. 11,087,779, and further in view of Suzuki US Pub. No. 2012/0033933. 
Re claim 1, the conflicting claims are not patentably distinct from each other because every limitation of claim 1 of the Instant Application is found in claim 8 of Patent No. 11,087,779, except the following limitation: “and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio.”
However, Suzuki explicitly teaches “setting playback control data for controlling playback of the content in accordance with the feature amount of the audio” (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Therefore, taking the combined teachings of Patent No. 11,087,779 and Suzuki as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (playback control) into the system of Patent No. 11,087,779 as taught by Suzuki.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Patent No. 11,087,779 as taught by Suzuki for the benefit of having an audio feature amount extracting unit 351 that extracts the audio feature amount of each frame of the content of interest supplied from the contents selecting unit 71 and supplies it to the audio maximum likelihood state sequence estimating unit 352, a digest contents generating unit 348 that uses the highlight scene frame extracted from the frames of the content of interest to generate a digest content of the content of interest and supplies it to the playback control unit 349, and a playback control unit 349 that performs playback control for playing the digest content from the digest contents generating unit 348, in order to reduce processing time when performing playback control for playing the digest content (see figs. 45-46, ¶s 823, 839-840).
Re claim 2, the conflicting claims are not patentably distinct from each other because the limitations of claim 2 of the Instant Application are recited in claims 9 and 11 of Patent No. 11,087,779.
Claims 3 and 4 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 8, 9 and 11 of U.S. Patent No. 11,087,779, and further in view of Suzuki US Pub. No. 2012/0033933, and further in view of Wei US Patent No. 8,780,209. 
Re claim 3, the conflicting claims are not patentably distinct from each other because every limitation of claim 3 of the Instant Application is found in claim 8 of Patent No. 11,087,779, except the following limitations: “and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio, wherein the feature amount includes a plurality of elements depending on signal intensities in the respective channels.”
However, Suzuki explicitly teaches “setting playback control data for controlling playback of the content in accordance with the feature amount of the audio” (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Therefore, taking the combined teachings of Patent No. 11,087,779 and Suzuki as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (playback control) into the system of Patent No. 11,087,779 as taught by Suzuki.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Patent No. 11,087,779 as taught by Suzuki for the benefit of having an audio feature amount extracting unit 351 that extracts the audio feature amount of each frame of the content of interest supplied from the contents selecting unit 71 and supplies it to the audio maximum likelihood state sequence estimating unit 352, a digest contents generating unit 348 that uses the highlight scene frame extracted from the frames of the content of interest to generate a digest content of the content of interest and supplies it to the playback control unit 349, and a playback control unit 349 that performs playback control for playing the digest content from the digest contents generating unit 348, in order to reduce processing time when performing playback control for playing the digest content (see figs. 45-46, ¶s 823, 839-840).
Further, Wei explicitly teaches “wherein the feature amount includes a plurality of elements depending on signal intensities in the respective channels” (see col. 8 lines 13-24, col. 15 lines 48-67, and col. 16 lines 1-13; i.e., each of the feature extraction modules 102a, 102b, 102c, 102d extracts at least one characteristic feature of the input and output media signals 110, 112, 118 and 120 to produce a corresponding extracted feature signal 126a, 126b, 126c and 126d, as described in fig. 1, col. 7 line 67 to col. 8 line 4; furthermore, the match confidence signal generator 849 receives the first and second extracted feature signals 826a and 826b generated by the first and second feature extraction modules 802a and 802b and generates a match confidence signal 856 representing the likelihood or probability that the first and second input media signals 810 and 818 “match” (i.e., represent the same content); the match confidence signal generator 849 includes a cross correlation module 850, which performs cross correlation on the first and second extracted feature signals 826a and 826b and outputs a cross-correlation signal 854, and a strength and consistency analyzer 852, which analyzes the cross-correlation signal 854 and outputs the match confidence signal 856, as described in fig. 8, col. 16 lines 14-34).
Therefore, taking the combined teachings of Patent No. 11,087,779 and Wei as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (per-channel signal intensities) into the system of Patent No. 11,087,779 as taught by Wei.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Patent No. 11,087,779 as taught by Wei for the benefit of having a match confidence signal generator 849 that includes a cross correlation module 850, which performs cross correlation on the first and second extracted feature signals 826a and 826b generated by the first and second feature extraction modules 802a and 802b and outputs a cross-correlation signal 854, and a strength and consistency analyzer 852, which analyzes the cross-correlation signal 854 and outputs the match confidence signal 856, in order to improve efficiency when analyzing the cross-correlation signal 854 and outputting the match confidence signal 856 (see fig. 8, col. 16 lines 14-34).
Re claim 4, the conflicting claims are not patentably distinct from each other because the limitations of claim 4 of the Instant Application are recited in claims 9 and 11 of Patent No. 11,087,779.
Claims 5 and 6 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 8, 9 and 11 of U.S. Patent No. 11,087,779, and further in view of Suzuki US Pub. No. 2012/0033933, and further in view of Neuhauser et al. US Pub. No. 2014/0180673.
Re claim 5, the conflicting claims are not patentably distinct from each other because every limitation of claim 5 of the Instant Application is found in claim 8 of Patent No. 11,087,779, except the following limitations: “and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio, wherein: the feature amount includes a plurality of elements corresponding to different pitch classes, and each of the plurality of elements is set to have a value acquired by adding or averaging, across octaves, signal intensities of a band component corresponding to the corresponding pitch class in the audio signal.”
However, Suzuki explicitly teaches “setting playback control data for controlling playback of the content in accordance with the feature amount of the audio” (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Therefore, taking the combined teachings of Patent No. 11,087,779 and Suzuki as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (playback control) into the system of Patent No. 11,087,779 as taught by Suzuki.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Patent No. 11,087,779 as taught by Suzuki for the benefit of having an audio feature amount extracting unit 351 that extracts the audio feature amount of each frame of the content of interest supplied from the contents selecting unit 71 and supplies it to the audio maximum likelihood state sequence estimating unit 352, a digest contents generating unit 348 that uses the highlight scene frame extracted from the frames of the content of interest to generate a digest content of the content of interest and supplies it to the playback control unit 349, and a playback control unit 349 that performs playback control for playing the digest content from the digest contents generating unit 348, in order to reduce processing time when performing playback control for playing the digest content (see figs. 45-46, ¶s 823, 839-840).
Further, Neuhauser explicitly teaches “wherein: the feature amount includes a plurality of elements corresponding to different pitch classes, and each of the plurality of elements is set to have a value acquired by adding or averaging, across octaves, signal intensities of a band component corresponding to the corresponding pitch class in the audio signal” (see fig. 3, ¶s 41-43, 48, 51; i.e., each of the lower-level tags may represent different features as a class that may be extracted from different aspects of audio (e.g., temporal, spectral, harmonic, rhythmic), which may be correlated and cross-correlated, as shown in fig. 3B, ¶ 49; furthermore, a plurality of features are extracted along with a value distance/tolerance, where each feature value is expressed as a tolerable range for later comparison, and each extracted audio feature is separately measured and collected as ranges (410A-420A) for template 400; depending on the feature extracted, ranges may be combined, weighted, averaged and/or normalized for unit variance, as described in fig. 4, ¶ 53; moreover, pitch 414 may be derived from a histogram of pitches in an audio signal 414A, which may include periods and amplitudes of prominent peaks on a full semitone scale and/or an octave-independent scale, as described in fig. 4, ¶ 54).
Therefore, taking the combined teachings of Patent No. 11,087,779 and Neuhauser as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (pitch classes) into the system of Patent No. 11,087,779 as taught by Neuhauser.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Patent No. 11,087,779 as taught by Neuhauser for the benefit of having a harmonic feature extraction 205 that may be performed to extract features from the sinusoidal harmonic modeling of an audio signal, where harmonic modeling may be particularly advantageous for semantic analysis because natural/musical sounds are themselves harmonic, consisting of a series of frequencies at multiple ratios of the lowest frequency, or fundamental frequency f0; a plurality of pitch features (e.g., salient pitch, chromagram center) and tonality features (e.g., key clarity, mode, harmonic change) are extracted, and the perceived fundamental frequency of a time frame (e.g., 50 ms, 50% overlap) may be calculated using a multi-pitch detection algorithm by decomposing an audio waveform into a plurality of frequency bands (e.g., one below and one above 1 kHz), computing an autocorrelation function of the envelope in each subband, and producing pitch estimates by selecting the peaks from the sum of the plurality of autocorrelation functions, in order to reduce processing time when extracting features from the sinusoidal harmonic modeling of an audio signal (see fig. 2, ¶ 38).
Re claim 6, the conflicting claims are not patentably distinct from each other because the limitations of claim 6 of the Instant Application are recited in claims 9 and 11 of Patent No. 11,087,779.
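For orientation only, the claim 5 limitation (elements corresponding to different pitch classes, each obtained by adding or averaging, across octaves, the signal intensities of the band component for that pitch class) describes what is commonly called a chroma or pitch-class profile. The Python sketch below illustrates that computation under stated assumptions: equal temperament referenced to A4 = 440 Hz, each FFT bin mapped to its nearest semitone, and squared FFT magnitude used as "signal intensity." The function name `pitch_class_profile` and its parameters are hypothetical and are not taken from the Instant Application, Patent No. 11,087,779, or Neuhauser.

```python
import numpy as np


def pitch_class_profile(signal, sample_rate, mode="sum", a4_hz=440.0, fmin=27.5):
    """Hypothetical sketch of a 12-element pitch-class feature vector.

    Assumptions: equal temperament (A4 = a4_hz), nearest-semitone mapping of
    FFT bins, squared magnitude as intensity. Intensities are first collected
    per (octave, pitch class) band, then added or averaged across octaves.
    """
    if mode not in ("sum", "average"):
        raise ValueError("mode must be 'sum' or 'average'")
    intensity = np.abs(np.fft.rfft(signal)) ** 2           # per-bin signal intensity
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    keep = freqs >= fmin                                     # drop DC and sub-audio bins
    freqs, intensity = freqs[keep], intensity[keep]

    # Nearest semitone as a MIDI-style note number (69 = A4).
    note = np.rint(69 + 12 * np.log2(freqs / a4_hz)).astype(int)
    pitch_class = note % 12                                  # 0 = C, ..., 9 = A
    octave = note // 12
    octave -= octave.min()                                   # index octaves from 0

    # Band component intensity for each (octave, pitch class) pair.
    bands = np.zeros((octave.max() + 1, 12))
    np.add.at(bands, (octave, pitch_class), intensity)

    # Combine the same pitch class across octaves.
    return bands.sum(axis=0) if mode == "sum" else bands.mean(axis=0)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    print(int(np.argmax(pitch_class_profile(tone, sr))))
```

Because a 440 Hz tone and its 880 Hz octave share pitch class A (index 9), their intensities fold into a single element in either mode; averaging only rescales each element by the number of octave bands considered.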
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Wei (US 8,780,209 B2)(hereinafter Wei), and further in view of Suzuki (US 2012/0033933 A1)(hereinafter Suzuki).
Re claim 1, Wei discloses an information processing method comprising: preparing, for content that includes audio of a plurality of channels, a feature amount of the audio (see col. 8 lines 13-24 for preparing, for content that includes audio of a plurality of channels, a feature amount of the audio (i.e. each of the feature extraction modules 102a, 102b, 102c, 102d extracts at least one characteristic feature of the input and output media signals 110, 112, 118 and 120 to produce a corresponding extracted feature signal 126a, 126b, 126c and 126d as described in fig. 1 col. 7 line 67, col. 8 lines 1-4)); wherein the feature amount includes a plurality of elements depending on frequency characteristics in the respective channels (see col. 8 lines 13-24 for the feature amount includes a plurality of elements depending on frequency characteristics in the respective channels (i.e. each of the feature extraction modules 102a, 102b, 102c, 102d extracts at least one characteristic feature of the input and output media signals 110, 112, 118 and 120 to produce a corresponding extracted feature signal 126a, 126b, 126c and 126d as described in fig. 1 col. 7 line 67, col. 8 lines 1-4))
Wei fails to explicitly teach and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio. However, the reference of Suzuki explicitly teaches and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio type (see figs. 43-44, 46 ¶s 785, 823, 839 for setting playback control data for controlling playback of the content in accordance with the feature amount of the audio (i.e. playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348 as described in fig. 45 paragraph 840))
Therefore, taking the combined teachings of Wei and Suzuki as a whole, it would have been obvious before the effective filing date of the claimed invention to incorporate this feature (playback) into the system of Wei as taught by Suzuki.
One will be motivated to incorporate the above feature into the system of Wei as taught by Suzuki for the benefit of having an audio feature amount extracting unit 351 that extracts the audio feature amount of each frame of the content of interest supplied from the contents selecting unit 71, supplies to the audio maximum likelihood state sequence estimating unit 352, wherein a digest contents generating unit 348 uses the highlight scene frame extracted from the frames of the content of interest to generate a digest content of the content of interest, supplies to the playback control unit 349, and a playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348 in order to ease the processing time when performing playback control for playing the digest content (see figs. 45-46 ¶s 823, 839-840)
Re claim 2, the combination of Wei and Suzuki as discussed in claim 1 above discloses all the claimed limitations but fails to explicitly teach wherein: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types, and the playback control data is set according to the identified scene type. However, the reference of Suzuki explicitly teaches wherein: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types (see figs. 37-38, 40 ¶s 701-705 for a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types (i.e. primitive feature amount extracting unit 241 extracts primitive feature amount that is primitive feature amount for generating audio feature amount suitable for classifying audio into scenes (e.g., "music", "non-music", "noise", "human voice", "human voice+music", "music", etc.) which is used for an audio classification (sound classification) field as described in fig. 36 paragraph 696)), and the playback control data is set according to the identified scene type (see figs. 43-44, 46 ¶s 785, 823, 839 for the playback control data is set according to the identified scene type (i.e. playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348 as described in fig. 45 paragraph 840))
Therefore, taking the combined teachings of Wei and Suzuki as a whole, it would have been obvious before the effective filing date of the claimed invention to incorporate this feature (playback) into the system of Wei as taught by Suzuki.
Per claim 2, Wei and Suzuki are combined for the same motivation as set forth in claim 1 above. 
Re claim 3, Wei discloses an information processing method comprising: preparing, for content that includes audio of a plurality of channels, a feature amount of the audio (see col. 8 lines 13-24; i.e., each of the feature extraction modules 102a, 102b, 102c, 102d extracts at least one characteristic feature of the input and output media signals 110, 112, 118 and 120 to produce a corresponding extracted feature signal 126a, 126b, 126c and 126d, as described in fig. 1, col. 7 line 67 to col. 8 line 4); wherein the feature amount includes a plurality of elements depending on signal intensities in the respective channels (see col. 8 lines 13-24, col. 15 lines 48-67, and col. 16 lines 1-13; i.e., each of the feature extraction modules 102a, 102b, 102c, 102d extracts at least one characteristic feature of the input and output media signals 110, 112, 118 and 120 to produce a corresponding extracted feature signal 126a, 126b, 126c and 126d, as described in fig. 1, col. 7 line 67 to col. 8 line 4; furthermore, the match confidence signal generator 849 receives the first and second extracted feature signals 826a and 826b generated by the first and second feature extraction modules 802a and 802b and generates a match confidence signal 856, which represents the likelihood or probability that the first and second input media signals 810 and 818 "match," that is, represent the same content; the match confidence signal generator 849 includes a cross correlation module 850 and a strength and consistency analyzer 852; the cross correlation module 850 performs cross-correlation on the first and second extracted feature signals 826a and 826b and outputs a cross-correlation signal 854; and the strength and consistency analyzer 852 analyzes the cross-correlation signal 854 and outputs the match confidence signal 856, as described in fig. 8, col. 16 lines 14-34).
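For illustration only, one simple reading of the claim 3 phrase "a plurality of elements depending on signal intensities in the respective channels" is a feature amount holding one intensity value per channel. The sketch below assumes that reading: it computes a per-channel RMS intensity and normalizes the elements to sum to 1. The function name `channel_intensity_feature`, the RMS measure, and the normalization are assumptions and are not taken from Wei or Suzuki.

```python
import math


def channel_intensity_feature(frames):
    """Build a feature amount with one element per channel.

    `frames` is a non-empty list of multichannel samples, e.g. [(left, right), ...].
    Each element is the RMS signal intensity of the corresponding channel,
    normalized so that the elements sum to 1 (an illustrative choice).
    """
    num_channels = len(frames[0])
    rms = []
    for ch in range(num_channels):
        energy = sum(sample[ch] ** 2 for sample in frames) / len(frames)
        rms.append(math.sqrt(energy))
    total = sum(rms)
    # Guard against silence: return all-zero elements rather than dividing by zero.
    return [value / total if total > 0 else 0.0 for value in rms]


# Stereo example: the left channel is twice as loud as the right channel.
stereo = [(0.5, 0.25), (-0.5, -0.25), (0.5, 0.25), (-0.5, -0.25)]
print(channel_intensity_feature(stereo))
```

With the left channel at RMS 0.5 and the right at RMS 0.25, the elements come out as roughly 2/3 and 1/3, so the feature amount changes whenever the relative channel intensities change.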
Wei fails to explicitly teach setting playback control data for controlling playback of the content in accordance with the feature amount of the audio. However, Suzuki explicitly teaches setting playback control data for controlling playback of the content in accordance with the feature amount of the audio (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Therefore, taking the combined teachings of Wei and Suzuki as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (playback) into the system of Wei as taught by Suzuki.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Wei as taught by Suzuki for the benefit of having an audio feature amount extracting unit 351 that extracts the audio feature amount of each frame of the content of interest supplied from the contents selecting unit 71 and supplies it to the audio maximum likelihood state sequence estimating unit 352; a digest contents generating unit 348 that uses the highlight scene frames extracted from the frames of the content of interest to generate a digest content of the content of interest and supplies it to the playback control unit 349; and a playback control unit 349 that performs playback control for playing the digest content from the digest contents generating unit 348, in order to reduce the processing time when performing playback control for playing the digest content (see figs. 45-46, ¶s 823, 839-840).
Re claim 4, the combination of Wei and Suzuki as discussed in claim 3 above discloses all the limitations of claim 3, but Wei fails to explicitly teach that: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types, and the playback control data is set according to the identified scene type. However, Suzuki explicitly teaches that: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types (see figs. 37-38 and 40, ¶s 701-705; i.e., primitive feature amount extracting unit 241 extracts primitive feature amounts for generating audio feature amounts suitable for classifying audio into scenes (e.g., "music", "non-music", "noise", "human voice", "human voice+music", etc.), as used in the audio classification (sound classification) field described in fig. 36, ¶ 696), and the playback control data is set according to the identified scene type (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Therefore, taking the combined teachings of Wei and Suzuki as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (scene-type-based playback control) into the system of Wei as taught by Suzuki.
Per claim 4, Wei and Suzuki are combined for the same motivation as set forth in claim 3 above.
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Suzuki (US 2012/0033933 A1) (hereinafter Suzuki) in view of Neuhauser et al. (US 2014/0180673 A1) (hereinafter Neuhauser).
Re claim 5, Suzuki discloses an information processing method comprising: preparing, for content that includes audio, a feature amount of the audio (see figs. 35 and 40, ¶s 694, 729, 732-733; i.e., audio feature amount extracting unit 221 extracts a feature amount regarding the audio of the content for learning in a manner correlated with each frame of the image, as described in fig. 36, ¶ 693); and setting playback control data for controlling playback of the content in accordance with the feature amount of the audio (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Suzuki fails to explicitly teach that: the feature amount includes a plurality of elements corresponding to different pitch classes, and each of the plurality of elements is set to have a value acquired by adding or averaging, across octaves, signal intensities of a band component corresponding to the corresponding pitch class in the audio signal. However, Neuhauser explicitly teaches that: the feature amount includes a plurality of elements corresponding to different pitch classes, and each of the plurality of elements is set to have a value acquired by adding or averaging, across octaves, signal intensities of a band component corresponding to the corresponding pitch class in the audio signal (see fig. 3, ¶s 41-43, 48, 51; i.e., each of these lower-level tags may represent different features as a class that may be extracted from different aspects of audio (e.g., temporal, spectral, harmonic, rhythmic), which may be correlated and cross-correlated, as shown in fig. 3B, ¶ 49; furthermore, Neuhauser describes a plurality of extracted features along with a value distance/tolerance, where each feature value is expressed as a tolerable range for later comparison; each extracted audio feature is separately measured and collected as ranges (410A-420A) for template 400, and, depending on the feature extracted, ranges may be combined, weighted, averaged and/or normalized for unit variance, as described in fig. 4, ¶ 53; moreover, pitch 414 may be derived from a histogram of pitches in an audio signal 414A, which may include periods and amplitudes of prominent peaks on a full semitone scale and/or octave-independent scale, as described in fig. 4, ¶ 54).
Therefore, taking the combined teachings of Suzuki and Neuhauser as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature (pitch) into the system of Suzuki as taught by Neuhauser.
One of ordinary skill in the art would have been motivated to incorporate the above feature into the system of Suzuki as taught by Neuhauser for the benefit of having a harmonic feature extraction 205 that may be performed to extract features from the sinusoidal harmonic modeling of an audio signal, wherein harmonic modeling may be particularly advantageous for semantic analysis because natural/musical sounds are themselves harmonic, consisting of a series of frequencies at multiple ratios of the lowest frequency, or fundamental frequency f0; wherein a plurality of pitch features (e.g., salient pitch, chromagram center) and tonality features (e.g., key clarity, mode, harmonic change) are extracted; and wherein the perceived fundamental frequency of a time frame (e.g., 50 ms, 50% overlap) may be calculated using a multi-pitch detection algorithm by decomposing an audio waveform into a plurality of frequency bands (e.g., one below and one above 1 kHz), computing an autocorrelation function of the envelope in each subband, and producing pitch estimates by selecting the peaks from the sum of the plurality of autocorrelation functions, in order to reduce the processing time when extracting features from the sinusoidal harmonic modeling of an audio signal (see fig. 2, ¶ 38).
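For illustration only, the claim 5 limitation (one element per pitch class, each set by adding or averaging, across octaves, the intensities of the band components belonging to that pitch class) corresponds to what is commonly called a chroma or pitch-class profile. The sketch below is a minimal version under stated assumptions: band components arrive as hypothetical (center frequency in Hz, intensity) pairs, pitches follow twelve-tone equal temperament with A4 = 440 Hz (MIDI note 69), and the name `pitch_class_feature` is hypothetical. It is not taken from Suzuki or Neuhauser.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_feature(band_components, reference_a4=440.0, average=False):
    """Fold band intensities into 12 pitch-class elements across octaves.

    `band_components` is a list of (center_frequency_hz, intensity) pairs.
    Each band is assigned to the nearest equal-tempered pitch class
    (A4 = `reference_a4` Hz, MIDI note 69), and intensities of bands that
    share a pitch class in different octaves are summed, or averaged if
    `average` is True.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for freq, intensity in band_components:
        if freq <= 0:
            continue  # no pitch class for DC or invalid bands
        midi_note = round(69 + 12 * math.log2(freq / reference_a4))
        pc = midi_note % 12  # 0 = C, 9 = A
        sums[pc] += intensity
        counts[pc] += 1
    if average:
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return sums


# A2, A3, A4 (the same pitch class in three octaves) plus one E4 band.
bands = [(110.0, 1.0), (220.0, 2.0), (440.0, 3.0), (329.63, 0.5)]
feature = pitch_class_feature(bands)
print(dict(zip(PITCH_CLASSES, feature))["A"])
```

The three A bands fold into a single "A" element (1.0 + 2.0 + 3.0 = 6.0 when adding, 2.0 when averaging), which is the "across octaves" behavior recited in the claim.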
Re claim 6, the combination of Suzuki and Neuhauser as discussed in claim 5 above discloses all the limitations of claim 5, and Suzuki further teaches that: a scene type of the content is identified from the feature amount of the audio in the content by use of an identification model representative of relations between feature amounts of audio and scene types (see figs. 37-38 and 40, ¶s 701-705; i.e., primitive feature amount extracting unit 241 extracts primitive feature amounts for generating audio feature amounts suitable for classifying audio into scenes (e.g., "music", "non-music", "noise", "human voice", "human voice+music", etc.), as used in the audio classification (sound classification) field described in fig. 36, ¶ 696), and the playback control data is set according to the identified scene type (see figs. 43-44 and 46, ¶s 785, 823, 839; i.e., playback control unit 349 performs playback control for playing the digest content from the digest contents generating unit 348, as described in fig. 45, ¶ 840).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSE M MESA whose telephone number is (571)270-1706.  The examiner can normally be reached on Monday-Friday 8:30AM-6:00PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran, can be reached at 571-272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

10/27/2022
/JOSE M. MESA/
Examiner
Art Unit 2484

 
/THAI Q TRAN/Supervisory Patent Examiner, Art Unit 2484