DETAILED ACTION
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-7, 12, and 14-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cheng, US 2017/0065889.

In Reference to Claims 1, 14, and 15
	Cheng teaches a method, non-transitory computer readable storage medium and a data processing apparatus (Fig. 1 and Par. 17 and 26-28) adapted to generate a recording for a content, comprising input circuitry configured to obtain image data and corresponding audio data for the content (Fig. 2 and Par. 47-49 which teaches obtaining sound data from a game video. Fig. 4 and Par. 55-57 which teaches receiving visual features of the game video. And see Par. 53 which teaches that the system can use both an audio feature extraction system and video feature extraction system on the same game video), the image data comprising a plurality of image frames (Par. 47); a plurality of machine learning models comprising one or more first machine learning models and one or more second machine learning models (Par. 38 which teaches using a machine learning model for audio concept detection, and Par. 64 which teaches using a machine learning model for video concept detection), each first machine learning model configured to obtain at least a part of the image data and trained to output respective indicator data for the plurality of image frames, each second machine learning model configured to obtain at least a part of the audio data and trained to output the respective indicator data for the plurality of image frames corresponding to the at least a part of the audio data (Par. 38 and 64 which teach that the respective machine learning models have been trained to output data identifying detected video game concepts in the game video), wherein the respective indicator data associated with an image frame is indicative of an interest score for the image frame (Par. 38 and 64 which teaches a “detection confidence value” for the detected concept in the corresponding video segment. See Par. 65 which teaches that video segments can constitute a key frame or key frames. See also Par. 40 and 68 which teaches detected game concepts are ranked in terms of interest); authoring circuitry configured to select one or more groups of image frames from the plurality of image frames in dependence upon respective indicator data associated with the plurality of image frames; and recording circuitry configured to generate the recording for the content, the recording comprising one or more of the selected groups of image frames (Par. 40-41 and 70-71 which teaches selecting and fusing game video segments containing detected game concepts of interest into a highlight video. Par. 53 which teaches that the game system can utilize both the audio and video detection in constructing the highlight video).
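
	By way of illustration only, the arrangement recited in Claims 1, 14, and 15 (per-frame indicator data from image-based and audio-based models, selection of groups of frames, and generation of a recording) may be sketched as follows. The sketch is not taken from Cheng or from the application; the averaging of scores, the fixed threshold, and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def fuse_scores(per_model_scores: List[List[float]]) -> List[float]:
    """Combine per-frame interest scores from several models.

    Assumption: a simple mean across models; neither the claims nor
    Cheng require this particular fusion rule.
    """
    n_frames = len(per_model_scores[0])
    if any(len(s) != n_frames for s in per_model_scores):
        raise ValueError("every model must score every frame")
    n_models = len(per_model_scores)
    return [sum(s[i] for s in per_model_scores) / n_models
            for i in range(n_frames)]


def select_groups(scores: Sequence[float],
                  threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for runs of
    consecutive frames whose score is at or above the threshold."""
    groups: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new group begins here
        elif start is not None:
            groups.append((start, i - 1))  # the group ended on the previous frame
            start = None
    if start is not None:
        groups.append((start, len(scores) - 1))
    return groups


def generate_recording(frames: Sequence[object],
                       groups: List[Tuple[int, int]]) -> List[object]:
    """Concatenate the selected groups of frames into one recording."""
    recording: List[object] = []
    for start, end in groups:
        recording.extend(frames[start:end + 1])
    return recording


# Hypothetical scores: one image-based model and one audio-based model.
image_scores = [0.1, 0.9, 0.8, 0.2, 0.7]
audio_scores = [0.3, 0.7, 0.6, 0.0, 0.9]
fused = fuse_scores([image_scores, audio_scores])
groups = select_groups(fused, threshold=0.5)
recording = generate_recording(list("abcde"), groups)
```

	In this sketch each model scores every frame, which corresponds to the claimed “respective indicator data for the plurality of image frames”; how those scores are combined is left open by the claim language.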

	In Reference to Claim 2
	Cheng teaches where each first machine learning model is trained with image data to learn a relationship between a property of the image data and the interest score (Par. 64 and 69).

	In Reference to Claim 3
	Cheng teaches where each first machine learning model is trained to learn the relationship for a different property of the image data (Given that there are “one or more” models, where there is only one machine learning model then that machine learning model is by definition trained to learn a unique and therefore “different” property. See also Par. 65 which teaches that multiple different types of concept classifiers, i.e. machine learning models, can be used to detect different concepts).

	In Reference to Claim 4
	Cheng teaches wherein the one or more first machine learning models comprises one or more from the list consisting of: a machine learning model trained to output the respective indicator data in dependence upon a luminance associated with an image frame; a machine learning model trained to output the respective indicator data in dependence upon a degree of motion for an image frame with respect to a next image frame; and a machine learning model trained to output the respective indicator data in dependence upon a status of a visual metric associated with an image frame, the visual metric indicative of a status of a character in the content or a progression of the content (Par. 58-59 and 64-65 which teaches detecting low level features including motion information and utilizing the machine learning models to identify game concepts such as scenes or characters where, as broadly claimed, examiner considers a depicted game scene to constitute a visual metric which is indicative of a progression of the content, i.e. level 1, level 2, etc.).

	In Reference to Claim 5
	Cheng teaches where each second machine learning model is trained with audio data to learn a relationship between a property of the audio data and the interest score (Par. 38-40).

	In Reference to Claim 6
	Cheng teaches wherein each second machine learning model is trained to learn the relationship for a different property of the audio data (Par. 38 which teaches detection of properties of the audio data. Given that there are “one or more” models, where there is only one machine learning model then that machine learning model is by definition trained to learn a unique and therefore “different” property).

	In Reference to Claim 7
	Cheng teaches where the one or more second machine learning models comprises a machine learning model trained to output the respective indicator data in dependence upon a type of sound (Par. 37 which teaches various sound elements, features, or types which can be detected).

	In Reference to Claim 12
	Cheng teaches wherein the authoring circuitry is configured to select the one or more groups of image frames according to an order of priority that is dependent upon the interest score (Par. 40 and 64 which teaches that segments of higher interest are ranked higher and highlights can be selected based on rank).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng, US 2017/0065889, in view of Oz et al., US 2020/0311433.

In Reference to Claim 8
Cheng teaches an apparatus as described above in reference to Claim 5. Further Cheng teaches where the one or more second machine learning models comprises a machine learning model trained to output the respective indicator data in dependence upon the music in the audio data (Par. 37 “portions of music”). However, Cheng does not teach where it is dependent on the intensity of the audio where intensity relates to a degree of one or more selected from the list consisting of: loudness; speed; and rate of occurrence of different musical notes.
Oz et al. teaches a system for detecting game events for highlights (Abstract) where rises in audio level are used for detecting highlight events (Par. 41-42 which teaches measuring audio level and associating important events with rises in audio level which examiner considers “loudness”).
It would be desirable to modify the system of Cheng to include detection of highlight segments based on loudness of audio, including loudness of music audio in the game video, in order to identify game highlight events or scenes that are associated with increases in the audio level as taught by Oz et al., as rising sound volume is often associated with excitement.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing of the invention to modify the system of Cheng to include detection of highlight segments based on loudness of audio, including loudness of music audio in the game video, in order to identify game highlight events or scenes that are associated with increases in the audio level as taught by Oz et al.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng, US 2017/0065889, in view of Dury et al., US 2017/0006322.

In Reference to Claim 10
Cheng teaches an apparatus as described above in reference to Claim 1. Further Cheng teaches where the authoring circuitry is configured to identify a highlight sequence consisting of a plurality of image frames (Par. 65). However, Cheng does not teach wherein the authoring circuitry is configured to assign a start tag to one or more of the image frames and an end tag to one or more of the image frames in the plurality of image frames, the start tag representing an image frame at a start of a group of image frames and the end tag representing an image frame at an end of a group of image frames.
Dury et al. teaches a system for identifying video segments to include in a highlight video which includes wherein the authoring circuitry is configured to assign a start tag to one or more of the image frames and an end tag to one or more of the image frames in the plurality of image frames, the start tag representing an image frame at a start of a group of image frames and the end tag representing an image frame at an end of a group of image frames (Par. 148 “An event type for the tag 466. As an example, in some embodiments tag event types may include "start" tags and "stop" tags that indicate the start of an interesting or notable event and the end of the respective event; thus, two tags (a start and stop tag) would define an event in the respective broadcast 442 as indicated by the respective participant.”).
It would be desirable to modify the apparatus of Cheng to include start and end tags to identify the beginning and ending of clips to include in highlights as taught by Dury et al. in order to assist the user in adjusting a time frame of an identified highlight to the user's liking via the editing functionality as described in Cheng Fig. 6 and Par. 86.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing of the invention to modify the apparatus of Cheng to include start and end tags to identify the beginning and ending of clips to include in highlights as taught by Dury et al.
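
By way of illustration only, the start and end tagging discussed in reference to Claim 10 may be sketched as follows. The sketch is not taken from Cheng, Dury et al., or the application; the representation of tags as a mapping from frame index to a list of labels, and both function names, are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def assign_tags(groups: List[Tuple[int, int]]) -> Dict[int, List[str]]:
    """Tag the first frame of each group "start" and the last frame "end".

    A single-frame group carries both tags on the same frame index.
    """
    tags: Dict[int, List[str]] = {}
    for start, end in groups:
        tags.setdefault(start, []).append("start")
        tags.setdefault(end, []).append("end")
    return tags


def groups_from_tags(tags: Dict[int, List[str]]) -> List[Tuple[int, int]]:
    """Recover inclusive (start, end) groups by pairing each start tag
    with the next end tag in frame order."""
    groups: List[Tuple[int, int]] = []
    open_start = None
    for index in sorted(tags):
        for tag in tags[index]:
            if tag == "start":
                open_start = index
            elif tag == "end" and open_start is not None:
                groups.append((open_start, index))
                open_start = None
    return groups


# Hypothetical groups of frame indices selected for a highlight.
tags = assign_tags([(1, 2), (4, 4)])
recovered = groups_from_tags(tags)
```

The round trip shows that a pair of start and end tags is sufficient to delimit a group of image frames, which is the sense in which two tags “define an event” in the passage of Dury et al. quoted above.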

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng, US 2017/0065889, in view of Van Welzen et al., US 2021/0034906.

In Reference to Claim 13
Cheng teaches an apparatus as described above in reference to Claim 1, including first and second machine learning models. However, Cheng does not explicitly teach where the models are a regression model or binary classification model.
Van Welzen et al. teaches a gaming system which identifies things of interest in a game video (Abstract) where the system can use various machine learning techniques including regression (Par. 36).
It would be desirable to modify the apparatus of Cheng to utilize regression based machine learning as taught by Van Welzen et al. in order to implement the machine learning based game identification systems of Cheng using a well-known machine learning technique.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing of the invention to modify the apparatus of Cheng to utilize regression based machine learning as taught by Van Welzen et al.

Allowable Subject Matter
Claims 9 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
Applicant's arguments filed 07/07/2022 have been fully considered but they are not persuasive. Regarding Cheng, examiner appreciates the differences between the disclosed invention and the prior art described by the applicant in their arguments dated 07/07/2022. However, applicant has failed to point out any claim language which is not disclosed by the prior art. Rather, applicant makes general arguments with regard to the mode of operation of the disclosed invention and the prior art. At best applicant cites “their own respective indicator data,” however even this exact language is not present in the claims. Rather, the claims require that the first model and the second model output respective indicator data and that in “dependence upon respective indicator data” one or more groups of image frames are selected. Although the feature extractor described in Fig. 2 and Par. 53 may first require going through the sound extraction, it still will “obtain at least a part of the image data” and output “respective image data.” Applicant is encouraged to amend the claims in order to clarify the distinction between the intended functionality of their disclosed invention and the cited prior art, and to better claim the completely independent operation of the first and second machine learning models.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL V LARSEN whose telephone number is (571)270-3219. The examiner can normally be reached Monday through Friday, 10:00 am - 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dmitry Suhol, can be reached on (571) 272-4430. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/C.V.L/Examiner, Art Unit 3715
/THOMAS J HONG/Primary Examiner, Art Unit 3715