DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Hsieh
Claims 1-4, 8, 11 and 19 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Hsieh et al. (US Pub. No. 2020/0327160; hereinafter Hsieh).
As per claim 1, Hsieh teaches a metadata server, comprising: circuitry configured to: receive a first segment from a plurality of segments of first media content(“the access component 110 accesses video content. In some embodiments, the access component 110 accesses the video content on a database, a networked resource, a server, or any other suitable repository of video content or data” in 
determine context information associated with the first segment of the first media content based on a characteristic of at least one frame of a plurality of frames in the first segment(“the recognition component 130 identifies one or more characteristics for at least a portion of frames of the plurality of frames” in Para.[0028], “the recognition component 130 determines time frames for each characteristic of the one or more characteristics within the portion of the frames” in Para.[0029]); 
generate first metadata associated with the first segment based on the determined context information, wherein the first metadata includes timing information corresponding to the determined context information to control a first set of electrical devices(“the association component 150 assigns at least one frame keyword to each time frame within the portion of the frames. In some embodiments, the association component 150 assigns the at least one frame keyword to each time frame by generating an association between each frame keyword and each time frame identified for that frame keyword. The association component 150 may generate the associations or assignments by generating metadata to be associated with the video content” in Para.[0034], “an index store 300 may comprise a content title 302, information for one or more images 304, information for frame sets 306. The information for the one or more images 304 may include times 308 at which the image occurs in video content, features 310 of the image 304, location information 312, and other information 314. The information for frame sets 306 may include a start time 316, an end time 318, frame set dimensions 320, and frame set descriptions 322” in Para.[0035]); and 
transmit the received first segment of the first media content and the generated first metadata to a media device associated with the first set of electrical devices(“the search component 180 generates a partial search result indicating at least one time frame within the video content. The search component 180 may generate the partial search result by selecting one or more frames of the video 
As per claim 2, Hsieh teaches wherein the circuitry is further configured to: transmit, to a media server, a first request which includes identification information of the first media content; receive, from the media server, the first media content including the first segment, based on the first request; generate the first metadata associated with the first segment based on the received first media content; and transmit the received first segment of the first media content and the generated first metadata to the media device(“the search component 180 generates a partial search result indicating at least one time frame within the video content. The search component 180 may generate the partial search result by selecting one or more frames of the video content associated with the at least one time frame of the time series frame index identified in operation 610 and compared in operation 620. In some embodiments, the partial search results are presented via a user interface. For example, where a single search type is identified and is associated with the time series frame index, the partial search results may be a result set presented on a display device in response to receiving the search query” in Para.[0053]).
As per claim 3, Hsieh teaches wherein the circuitry is further configured to apply one or more machine learning models on the characteristic of the at least one frame of the plurality of frames in the first segment to determine the context information associated with the first segment(“the recognition component 130 identifies one or more characteristics for at least a portion of frames of the plurality of frames. In some embodiments, the segmentation component 120 includes modules configured to use the image encoding framework described above. The modules of the segmentation component 120 may 
As per claim 4, Hsieh teaches wherein the circuitry is further configured to: retrieve the plurality of segments of the first media content; generate a plurality of metadata based on the context information associated with the corresponding segment of the plurality of segments, wherein each of the plurality of metadata is associated with the corresponding segment of the plurality of segments(“the association component 150 assigns at least one frame keyword to each time frame within the portion of the frames. In some embodiments, the association component 150 assigns the at least one frame keyword to each time frame by generating an association between each frame keyword and each time frame identified for that frame keyword. The association component 150 may generate the associations or assignments by generating metadata to be associated with the video content” in Para.[0034], “the recognition component 130 determines time frames for each characteristic of the one or more characteristics within the portion of the frames. In some instances, the recognition component 130 determines time frames during or as a sub-operation of one or more other operations, such as operation 220. In some embodiments, the recognition component 130 determines time frames for each characteristic by grouping subsets of frames including a same characteristic. The recognition component 130 may access metadata associated with each frame or frames within a grouped subset of frames to determine a range of time (e.g., a time frame or initial and terminal time codes) for a specified characteristic within the video content. In some embodiments, the recognition component 130 determines time frames for each characteristic by determining the frames in which the characteristic is present. The recognition component 130 may then compare frame information with the video content 
generate second media content from the first media content based on the generated plurality of metadata(“the search component 180 generates a partial search result indicating at least one time frame within the video content. The search component 180 may generate the partial search result by selecting one or more frames of the video content associated with the at least one time frame of the time series frame index identified in operation 610 and compared in operation 620. In some embodiments, the partial search results are presented via a user interface. For example, where a single search type is identified and is associated with the time series frame index, the partial search results may be a result set presented on a display device in response to receiving the search query” in Para.[0053]).
As per claim 8, Hsieh teaches wherein the first media content corresponds to video content or audio-video (AV) content, and wherein the characteristic of the at least one frame in the first segment of the video content or the AV content corresponds to at least one of: an object recognized in the at least one frame, a person recognized in the at least one frame, an emotional state of at least one object in the at least one frame, background information of the at least one frame, an ambient lighting condition in the at least one frame, motion information of at least one object in the at least one frame, a gesture associated with at least one object in the at least one frame, or genre information associated with the at least one frame(“the generation component 140 may generate frame keywords using a language model. The language model may be part of or associated with the image model, such as embodiments where the image model is at least a part of the recognition component 130. The language model may translate 
As per claim 11, Hsieh teaches wherein the first metadata includes the timing information and the determined context information of the first segment of the first media content(“an index store 300 may comprise a content title 302, information for one or more images 304, information for frame sets 306. The information for the one or more images 304 may include times 308 at which the image occurs in video content, features 310 of the image 304, location information 312, and other information 314. The information for frame sets 306 may include a start time 316, an end time 318, frame set dimensions 320, and frame set descriptions 322” in Para.[0035]).
As per claim 19, the limitations of claim 19 have been discussed in the rejection of claim 1, and claim 19 is rejected under the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Hsieh in view of Dalbee
Claims 5, 7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Hsieh et al. (US Pub. No. 2020/0327160; hereinafter Hsieh) in view of Dalbee et al. (US Pub. No. 2020/0204867; hereinafter Dalbee).
As per claim 5, Hsieh teaches all of the limitations of claim 1.
Hsieh is silent about wherein the first media content corresponds to audio content, and wherein the context information of the audio content comprises at least one of: a song, a musical tone, a monologue, a dialogue, a laughter sound, a distress sound, a pleasant sound, an unpleasant sound, an ambient noise, a background sound, a loud sound, or defined sound pattern associated with a real-time object.
Dalbee teaches wherein the first media content corresponds to audio content, and wherein the context information of the audio content comprises at least one of: a song, a musical tone, a monologue, a dialogue, a laughter sound, a distress sound, a pleasant sound, an unpleasant sound, an ambient noise, a background sound, a loud sound, or defined sound pattern associated with a real-time object(“While the content is playing on a device, content data is analyzed, and a number of signatures are identified. In some embodiments, audio data is analyzed to identify audio signatures (voice or song recognition is an example where audio signatures can be used as identifiers), and each audio signature is associated, based on audio and/or video characteristics, with a particular subject within the content” in Para.[0004]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hsieh with the above teachings of Dalbee in order to enhance an end user's experience of media content.
As per claim 7, Hsieh teaches all of the limitations of claim 1.
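Hsieh is silent about wherein the first media content corresponds to audio content, and wherein the characteristic of the at least one frame in the first segment of the audio content comprises at least one of: a loudness parameter, a pitch parameter, a tone parameter, a rate-of-speech parameter, a voice quality parameter, a phonetic parameter, an intonation parameter, an intensity of overtones, a voice modulation parameter, a pronunciation parameter, a prosody parameter, a timbre parameter, or one or more psychoacoustic parameters.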
Dalbee teaches wherein the first media content corresponds to audio content, and wherein the characteristic of the at least one frame in the first segment of the audio content comprises at least one of: a loudness parameter, a pitch parameter, a tone parameter, a rate-of-speech parameter, a voice quality parameter, a phonetic parameter, an intonation parameter, an intensity of overtones, a voice modulation parameter, a pronunciation parameter, a prosody parameter, a timbre parameter, or one or more psychoacoustic parameters(“While the content is playing on a device, content data is analyzed, and a number of signatures are identified. In some embodiments, audio data is analyzed to identify audio signatures (voice or song recognition is an example where audio signatures can be used as identifiers), and each audio signature is associated, based on audio and/or video characteristics, with a particular subject within the content” in Para.[0004]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hsieh with the above teachings of Dalbee in order to enhance an end user's experience of media content.
As per claim 9, Hsieh teaches all of the limitations of claim 1.
Hsieh is silent about wherein the circuitry is further configured to: identify a type of the first segment of the first media content, wherein the type of the first segment comprises one of an audio type segment, an audio-video (AV) type segment, or a gaming type segment; and generate the first metadata associated with the first segment based on the identified type of the first segment.
Dalbee teaches wherein the circuitry is further configured to: identify a type of the first segment of the first media content, wherein the type of the first segment comprises one of an audio type 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hsieh with the above teachings of Dalbee in order to enhance an end user's experience of media content.

Hsieh in view of Kocks
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Hsieh et al. (US Pub. No. 2020/0327160; hereinafter Hsieh) in view of Kocks et al. (US Pub. No. 2013/0343597; hereinafter Kocks).
As per claim 6, Hsieh teaches all of the limitations of claim 1.
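Hsieh is silent about wherein the first media content corresponds to video content or audio-video (AV) content, and wherein the context information of the video content or the AV content comprises at least one of an action scene, a comedy scene, a romantic scene, a suspense scene, a horror scene, a drama scene, a poetry scene, a party scene, or a dance scene.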
Kocks teaches wherein the first media content corresponds to video content or audio-video (AV) content, and wherein the context information of the video content or the AV content comprises at least one of an action scene, a comedy scene, a romantic scene, a suspense scene, a horror scene, a drama scene, a poetry scene, a party scene, or a dance scene(“the video clip may be associated with metadata that indicates a degree of similarity between the particular video clip and other video clips. In such embodiments, the degree of similarity may be determined based on a number of videos that describe a common event (e.g., a broadcast television program, a cable program, or a movie), that include a common scene or type of scene (e.g., sports, action, or comedy scene), or that are associated with a common individual (e.g., an actor, a politician, or a musician). The degree of similarity is not, however, limited to such exemplary indicia, and in further embodiments, the degree of similarity between video content may be based on any additional or alternate element of metadata, including, but not limited to, a broadcast channel, a country of origin, a video category, or a time slot” in Para.[0155]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hsieh with the above teachings of Kocks in order to enhance an end user's experience of media content.

Hsieh in view of Nurmi
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hsieh et al. (US Pub. No. 2020/0327160; hereinafter Hsieh) in view of Nurmi et al. (US Pub. No. 2012/0151413; hereinafter Nurmi).
As per claim 10, Hsieh teaches all of the limitations of claim 1.
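Hsieh is silent about wherein the first set of electrical devices comprises at least one of: an aroma dispenser, an electrical furniture, a lighting device, a sound reproduction device, an electrical curtain, an electrical toy, an electrical wind-chime, an electrical vase, a digital photo-frame, or an internet of things (IOT) device.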
Nurmi teaches wherein the first set of electrical devices comprises at least one of: an aroma dispenser, an electrical furniture, a lighting device, a sound reproduction device, an electrical curtain, an electrical toy, an electrical wind-chime, an electrical vase, a digital photo-frame, or an internet of things (IOT) device(“The mobile terminal 10 may also comprise a user interface including an output device such as a conventional earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, all of which are coupled to the controller 20” in Para.[0022]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hsieh with the above teachings of Nurmi in order to enhance an end user's experience of media content.
Allowable Subject Matter
Claims 12-18 and 20 are allowed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNGHYOUN PARK, whose telephone number is (571)270-1333. The examiner can normally be reached Monday through Thursday, 6:00 am to 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI Q TRAN, can be reached at (571)272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/SUNGHYOUN PARK/Examiner, Art Unit 2484