Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on April 5, 2022 has been entered.

Response to Amendment
Applicant’s response to the last Office action, filed April 5, 2022, has been entered and made of record. Claims 1-5, 8-9, 14-15, 17, 19, 24-26, 28, and 33-34 have been amended. Claims 1-34 are pending in this application.

Response to Arguments
Applicant’s arguments with respect to claims 1-34 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter. See 37 CFR 1.75(d)(1) and MPEP 608.01(o). Correction of the following is required:
The specification is objected to because it lacks support for the following limitations:
-- “identifying an element repeating across at least two video frames in the portion of the video frames”, recited in claims 1, 17, and 26.
-- “locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates”, recited in claims 1, 17, and 26.
-- “determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates”, recited in claims 1, 17, and 26.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-34 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claims 1, 17, and 26 recite the following limitations: “identifying an element repeating across at least two video frames in the portion of the video frames”; “locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates”; and “determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates”. These limitations find no support in the specification.
For the purpose of prior art consideration, the claims in question will be construed as best understood.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-10, 12-15, 17, 19-20, 22-26, 28-29, and 31-34 are rejected under 35 U.S.C. 103 as being unpatentable over Merler et al, (US-PGPUB 2019/0289372) in view of Li et al, (US-PGPUB 2016/0261894); and further in view of Liu et al, (US-PGPUB 2018/0157321); and further in view of Ishtiaq, (US Patent 9,888,279); and further in view of Saito et al, (US-PGPUB 2013/0050502); and further in view of Chen et al, (US-PGPUB 2016/0014482).

In regard to claim 1, Merler et al discloses a method for identifying one or more highlights of a video stream of an event, the method comprising:
identifying at least a portion of video frames of a video stream, (see at least: Par. 0029, segment selector 160 selects one or more segments of the media content 115); and
storing at least one of: the subset of the video stream corresponding to a highlight; an identifier that identifies the highlight within the video stream; and metadata pertaining particularly to the highlight, (see at least: Par. 0031, the highlight storage 190 receives the highlight clip 195 generated by the segment selector 160, [i.e., implicitly storing the subset of the video stream corresponding to a highlight]).
Merler et al does not expressly disclose at a processor, comparing the portion of the video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of text templates, a plurality of face templates, and a plurality of logo templates, wherein comparing the portion of the video frames with the plurality of templates comprises: identifying an element repeating across at least two video frames in the portion of the video frames; determining a type of element associated with the element repeating across the at least two video frames in the portion of the video frames, wherein the type of element is selected from a first group comprising a text element, a face element, and a logo element; locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates, and comparing the element repeating across the at least two video frames with the set of templates; determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates; and based on the determining, extracting the subset of the video stream from the video stream.
Li et al discloses that a marker frame processor 104 compares the logo template against frames of video 114. The comparison may compare the accumulated logo template images 408 against every frame of video 114 or a portion of frames of video 114, [i.e., comparing the portion of video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of logo templates], (see at least: Fig. 5, step 502, and Par. 0038).
Merler and Li et al are combinable because they are both concerned with video programming. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Merler to include the marker frame processor 104, as taught by Li et al, in order to compare the accumulated logo template images 408 against every frame of video 114 or a portion of frames of video 114, (Li et al, Par. 0038).
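For illustration only, the kind of comparison described above (a logo template compared against every frame, or against a portion of the frames) can be sketched as follows. The similarity measure (one minus the normalized mean absolute pixel difference), the function names, the `threshold` value, and the `step` parameter are assumptions made for this sketch; they are not taken from Li et al or from the claims.

```python
from typing import List, Sequence

# A frame is a grid of grayscale pixel values in the range 0..255.
Frame = Sequence[Sequence[int]]


def similarity(frame: Frame, template: Frame) -> float:
    """Return 1.0 for identical images, falling toward 0.0 as pixels differ."""
    total = 0
    count = 0
    for frame_row, template_row in zip(frame, template):
        for f, t in zip(frame_row, template_row):
            total += abs(f - t)
            count += 1
    if count == 0:
        raise ValueError("empty image")
    return 1.0 - total / (255.0 * count)


def frames_matching_template(
    frames: List[Frame],
    template: Frame,
    threshold: float = 0.9,  # hypothetical cut-off
    step: int = 1,           # step > 1 checks only a portion of the frames
) -> List[int]:
    """Return indices of the checked frames that resemble the template."""
    return [
        i
        for i in range(0, len(frames), step)
        if similarity(frames[i], template) >= threshold
    ]


logo = [[0, 255], [255, 0]]
video = [logo, [[255, 0], [0, 255]], [[0, 250], [255, 0]]]
matches = frames_matching_template(video, logo)
```

Here frame 0 is identical to the template, frame 1 is its inverse, and frame 2 differs by a few intensity levels in one pixel, so only frames 0 and 2 are reported.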
The combined teaching of Merler and Li et al as a whole does not expressly disclose comparing the portion of the video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of face templates; wherein comparing the portion of the video frames with the plurality of templates comprises: identifying an element repeating across at least two video frames in the portion of the video frames; determining a type of element associated with the element repeating across the at least two video frames in the portion of the video frames, wherein the type of element is selected from a first group comprising a text element, a face element, and a logo element; locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates; and comparing the type of element repeating across the at least two video frames with the set of templates; and determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates.
Liu et al discloses comparing facial recognition templates contained in the entries of a meeting attendee list from the Meeting Metadata 1016, [i.e., plurality of templates stored in a template database], to facial recognition templates generated from the faces it detects in video frames it receives from the Video Camera 1002, [i.e., portion of video frames of the video stream that included faces detected in video frames], (see at least: Par. 0071).
Merler, Li et al, and Liu et al are combinable because they are all concerned with video recognition. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler and Li et al to include the Face Detection/Facial Recognition Logic 1018, as taught by Liu et al, in order to perform facial recognition operations on the faces detected in video frames received from the Video Camera 1002, (Liu et al, Par. 0071).
The combined teaching of Merler, Li et al, and Liu et al as a whole does not expressly disclose wherein comparing the portion of the video frames with the plurality of templates comprises: identifying an element repeating across at least two video frames in the portion of the video frames; determining a type of element associated with the element repeating across the at least two video frames in the portion of the video frames, wherein the type of element is selected from a first group comprising a text element, a face element, and a logo element; locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates; and comparing the type of element repeating across the at least two video frames with the set of templates; and determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates.

Ishtiaq discloses at a processor, comparing the portion of the video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of text templates, (see at least: col. 17, lines 51-67, comparing the search query to the significant words stored with the text record, and if the significant words contain any words comprised by the search query, the text record is identified as a matching text record, [i.e., comparing the portion of the video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of text templates. Note that the significant words stored with the text record implicitly represent a plurality of text templates]), wherein comparing the portion of the video frames with the plurality of templates comprises:
determining a type of element associated with the element across at least two video frames in the portion of the video frames, wherein the type of element is selected from a first group comprising a text element, a face element, and a logo element, (see at least: col. 4, lines 33-40, the EPG source 103 can provide EPG data (e.g., start/stop times, duration, synopsis, channel designations, descriptions, categories, etc.) for that particular video asset. For example, if the EPG data indicates that a particular program is a financial news broadcast, then the video data analyzer 111 can determine that the specific financial newscast, or a corresponding type of financial newscast on the specified channel, typically includes on-screen logos and text in the bottom right-hand corner of the screen as well as scrolling text with information about stock prices along the bottom edge of the screen, and the same EPG data may also indicate to the video data analyzer that the faces of various news broadcasters will also be depicted in frames of the visual data of the newscast, [i.e., determining a type of element associated with the element, “on-screen logos and text and/or the face associated with the various faces of the news broadcasters”, across the at least two video frames, “across the plurality of frames of Newscast, (at least two frames)”, wherein the type of element is selected from a first group comprising a text element, a face element, and a logo element, “e.g., logos, text, and/or face”]).
Ishtiaq further discloses locating a set of templates related to the type of element, the set of templates selected from a second group comprising the plurality of text templates, the plurality of face templates, and the plurality of logo templates, (see at least: Fig. 5, step 504, col. 17, lines 6-12, the segment searcher 137 can search textual output data (or any visual, audio, and/or textual features) in analyzed content databases 117 and/or 141 for matches to the keyword, [i.e., locating a set of templates related to the type of element selected from at least one of the plurality of text templates, the plurality of face templates, and the plurality of logo templates]), and comparing the type of element with the set of templates, (see at least: Fig. 5, step 506, col. 17, lines 13-20, and col. 56-59, in the searching, the search query is compared to the significant words stored with the text record, and if the significant words contain any words comprised by the search query, the text record is identified as a matching text record, [i.e., comparing the element with the set of templates]).
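For illustration only, the matching rule quoted above (a text record matches when its stored significant words contain any word of the search query) can be sketched as follows. The record identifiers, the dictionary layout, and the case-insensitive whitespace tokenization are assumptions made for this sketch; the cited passage does not specify them.

```python
from typing import Dict, List


def matching_text_records(query: str, records: Dict[str, List[str]]) -> List[str]:
    """Return ids of records whose significant words share a word with the query."""
    # Assumed tokenization: split on whitespace and ignore letter case.
    query_words = {word.lower() for word in query.split()}
    return [
        record_id
        for record_id, significant_words in records.items()
        if query_words & {word.lower() for word in significant_words}
    ]


# Hypothetical text records keyed by a hypothetical segment id.
records = {
    "seg-1": ["stock", "prices"],
    "seg-2": ["weather"],
    "seg-3": ["Home", "run"],
}
hits = matching_text_records("home run highlights", records)
```

In this example only the record "seg-3" shares a word with the query, so it is the only record identified as matching.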
Ishtiaq further discloses determining that a subset of the video stream is a highlight; and based on the determining, extracting the subset of the video stream from the video stream, (col. 14, line 54 through col. 15, line 8, a client device 120 may rank the segments containing the visual, audio, and textual features of the video content, and determine the importance of each of the segments to the user, [i.e., determining that a subset of the video stream is a highlight based on ranking the segments]. For instance, if the visual, audio, and textual features of the video content relate to baseball content, the user may decide to watch only segments containing home-runs or plays with high emotion. In this case, video content segment services module 125 would extract and fuse visual, audio, and textual features of the video content that correspond to high emotion to generate the ranking, and subsequently, the client device 120 would display to the user indications about the location of the high ranked segments, or would build a summary video containing just the high ranked segments, [i.e., determining that a subset of the video stream is a highlight based on the high ranked segments, “subset of the video stream”, and based on the determining, “the high ranked segments”, extracting the subset of the video stream from the video stream, “build a summary video containing just the high ranked segments”]).
Merler, Li et al, Liu et al, and Ishtiaq are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, and Liu et al to use the segment searcher 137, as taught by Ishtiaq, in order to create segments of video content using the matching text records, (Ishtiaq, col. 8, lines 17-22).
The combined teaching of Merler, Li et al, Liu et al, and Ishtiaq as a whole does not expressly disclose comparing the portion of the video frames of the video stream with a plurality of templates stored in a template database, wherein the plurality of templates comprises a plurality of face templates, and a plurality of logo templates; wherein comparing the portion of the video frames with the plurality of templates comprises: identifying an element repeating across at least two video frames in the portion of the video frames; and determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates.
Saito et al discloses wherein comparing the portion of the video frames with the plurality of templates comprises: identifying an element repeating across at least two video frames in the portion of the video frames, (see at least: Par. 0059-0065, each of the cameras 1 takes time-series images such as moving images, including images of the face of a person present in the monitor area as a moving image targeted for tracking, and the face detecting unit 26 performs processing to detect all faces (one or more faces) present in the input images. Saito et al further discloses a face tracking unit 27, which performs processing to track the face of a person as a moving object, integrates and optimally matches information such as the coordinates or size of the face of the person detected from the input images, and integrally manages and outputs, as a tracking result, the result of the matching of the identical persons throughout frames, [i.e., detecting one or more face elements repeating across at least two video frames in the portion of the video frames, “time-series images such as moving images”]).
Merler, Li et al, Liu et al, Ishtiaq, and Saito et al are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler and Ishtiaq to use the face detecting unit 26 and the face tracking unit 27, as taught by Saito et al, in order to obtain a highly accurate face detection result, (Saito, see Par. 0163, last line).
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, and Saito et al as a whole does not expressly disclose determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates.
Chen discloses determining that a subset of the video stream is a highlight based on the subset of the video stream exceeding a threshold level of similarity to the set of templates, (see at least: Par. 0186-0187, similar video clips can be determined using techniques including (but not limited to) applying a threshold to similarity measurements to determine similarity based upon the similarity measurement. Video clips that are scored below the threshold value can be dropped from the video summary sequence. [That is, the video clips, “subset of the video stream”, are a video summary, “highlight”, based on the video clips, “subset of the video stream”, that are scored greater than the threshold, “similarity threshold”]).
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen et al are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, and Saito et al to apply the threshold to similarity measurements, as taught by Chen, in order to determine similarity based upon the similarity measurement, and thereby generate the video summary sequence based on the video clips that meet the threshold value, (Chen, Par. 0187).
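For illustration only, the thresholding step discussed above (clips scored below a threshold are dropped, and the remaining clips form the summary) can be sketched as follows. The clip identifiers, the score values, and the treatment of a score exactly equal to the threshold are assumptions made for this sketch; the strict comparison follows the claim wording "exceeding a threshold level of similarity".

```python
from typing import List, Tuple


def select_highlights(
    clip_scores: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Keep, in original order, the clips whose similarity score exceeds the threshold."""
    # Assumption: a score equal to the threshold does not "exceed" it and is dropped.
    return [clip_id for clip_id, score in clip_scores if score > threshold]


# Hypothetical clips paired with hypothetical similarity scores.
scored_clips = [("clip-a", 0.95), ("clip-b", 0.40), ("clip-c", 0.80)]
summary = select_highlights(scored_clips, threshold=0.80)
```

With a threshold of 0.80, only "clip-a" is retained; "clip-c" sits exactly on the threshold and is dropped under the assumption stated above.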

In regard to claim 5, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
Furthermore, Merler discloses calculating an excitement level corresponding to the highlight, based on how exciting the highlight is expected to be for one or more users, (Merler, Par. 0033, the sports highlight system 100 uses two audio markers to determine the position and excitement level of a potential highlight clip. Further, Par. 0055 discloses that the user interface 170 receives a query specifying the set of criteria 510 for generating the highlight clip, and the segment selector 160 selects one or more segments based on the received criteria. If the user interface 170 specifies a specific excitement level, say greater than 80, then the segment selector 160 would select segment 504, whose excitement score is 90); wherein storing the highlight, the identifier, and the metadata comprises storing, in association with the highlight, the excitement level, (Merler, see at least: Par. 0027, segment storage 140 may also store the excitement measures of each segment. The information stored for a segment may also include metadata extracted from the media content 115 regarding the segment. [i.e., the segment represents the one or more highlights]).

In regard to claim 6, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 5.
Furthermore, Merler discloses calculating the excitement level comprises using only objective criteria that do not require access to data specific to any particular user, (Merler, Par. 0055, the user interface 170 receives a query specifying the set of criteria 510 for generating the highlight clip, and the segment selector 160 selects one or more segments based on the received criteria, [i.e., using only objective criteria that do not require access to data specific to any particular user]).
In regard to claim 7, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 5.
Furthermore, Merler discloses wherein calculating the excitement level comprises utilizing user data specific to a particular user, such that the excitement level is specific to the particular user, (Merler, Par. 0055, if the user interface 170 specifies a specific excitement level, say greater than 80, then the segment selector 160 would select segment 504, whose excitement score is 90, [i.e., the excitement level is specific to the particular user]).

In regard to claim 8, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 5.
Furthermore, Merler discloses receiving a user selection of a category pertaining to the one or more highlights, the category selected from the group consisting of: the excitement level or a range of the excitement levels; a league of sports teams involved in an event captured in the highlight; a player involved in the event captured in the highlight; and a season in which the event captured in the highlight occurred, (see at least: Par. 0029, the segment selector 160 may select segments featuring a particular player selected by the user through the user interface 170, [i.e., a player involved in an event captured in the highlight]. Further, Par. 0055 discloses that if the user interface 170 receives a query for "player X," then the segment selector would generate the highlight clip 195 based on segments 501 and 503, whose metadata indicates "player X", [i.e., a player involved in an event captured in the highlight]); and
transmitting to an output device the highlight based on the user selection, (see at least: Par. 0059, the highlight system 100 detects faces within a temporal window of when a TV graphic with a player name is found. The assumption is that in the video images after the name of a player is displayed, the player's face will be visible multiple times in the video feed, [i.e., outputting only the one or more highlights matching the user selection]).
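For illustration only, the criteria-based selection cited from Merler, Par. 0055, can be sketched as follows. Only the facts that segment 504 has an excitement score of 90 and that the metadata of segments 501 and 503 indicates "player X" come from the passages quoted above; the remaining scores, the "player Y" label, the data layout, and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    segment_id: int
    excitement: float
    players: List[str] = field(default_factory=list)


def select_segments(
    segments: List[Segment],
    min_excitement: Optional[float] = None,
    player: Optional[str] = None,
) -> List[int]:
    """Return ids of segments that satisfy every criterion that was supplied."""
    selected = []
    for segment in segments:
        # "Greater than" is read strictly, matching "say greater than 80".
        if min_excitement is not None and segment.excitement <= min_excitement:
            continue
        if player is not None and player not in segment.players:
            continue
        selected.append(segment.segment_id)
    return selected


segments = [
    Segment(501, 60, ["player X"]),  # score is hypothetical
    Segment(502, 70, ["player Y"]),  # score and player are hypothetical
    Segment(503, 75, ["player X"]),  # score is hypothetical
    Segment(504, 90, ["player Y"]),  # player is hypothetical
]
by_excitement = select_segments(segments, min_excitement=80)
by_player = select_segments(segments, player="player X")
```

Under these assumed values, a query for an excitement level greater than 80 returns segment 504, and a query for "player X" returns segments 501 and 503, which mirrors the two examples in the cited paragraph.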

In regard to claim 9, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
Furthermore, Merler discloses storing the highlight, (Merler, see at least: Par. 0031, the highlight storage 190 receives the highlight clip 195 generated by the segment selector 160, [i.e., implicitly storing the subset of the video stream corresponding to a highlight]); and
the metadata comprises storing the metadata comprising data from one or more other sources besides the video stream, (see at least: Par. 0027-0028, extracting metadata to be stored as part of the information associated with the segment in the segment storage 140. Further, Par. 0005, the metadata is based on contextual cues from environment, statistics, location, [i.e., data from one or more other sources besides the video stream]. Further, Par. 0052 discloses the metadata extractor 150, which may extract information such as names of players, stages of the sports event, statistics regarding players, location, time, and other contextual information, [i.e., this information is implicitly extracted from other sources besides the video stream, e.g., media sources such as websites, electronic sports magazines, etc., which represent the data from other sources]).
On the other hand, Ishtiaq discloses storing the identifier, (see at least: Fig. 1B, col. 8, lines 17-22, the video content segment services 125 produces segments that represent highlights of a video asset. Further, col. 4, lines 15-18, content server 115 stores the video data and the features for the visual data, audio data, and textual data in an analyzed content database 117, [i.e., the analyzed content database 117 implicitly stores the identified segments that represent highlights of a video asset, “storing the identifier”]).

In regard to claim 10, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
Furthermore, Merler discloses wherein: comparing the portion of the video stream with the plurality of templates comprises using computer vision techniques to analyze the video frames of the video stream to detect one or more desired content delineators, (Merler, see at least: Par. 0026, segment proposal module 130 uses the recognized audible and visible cues to identify segments of the media content 115 that may be included in the highlight clip 195, [i.e., using computer vision techniques to analyze video frames of the video stream to detect one or more desired content delineators]); and each content delineator indicates at least one of a start and an end of a video sequence, (Merler, Par. 0026, the segment proposal module proposes segments by identifying a start and an end of each segment based on on-screen overlay information, scene change, visual event recognition, sensor-based event recognition, and audio-based excitement measures, [i.e., each content delineator indicates at least one of a start and an end of a video sequence]).

In regard to claim 12, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 10.
Furthermore, Merler discloses wherein using the computer vision techniques to analyze the video frames of the video stream comprises identifying the video frames depicting a replay, (Par. 0032, each classifier performs recognition by analyzing the audio or video content at different points of the media content 200. Further, Par. 0037, discloses that the classifier can be trained using a machine learning classifier such as linear support vector machine (SVM) based on replay video of similar sports events (e.g., a highlight system for golf tournaments uses replay video of golf tournaments), [i.e., identifying the video frames depicting a replay]).

In regard to claim 13, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 10.
Furthermore, Merler discloses using the computer vision techniques to analyze the video frames of the video stream comprises identifying the video frames showing one or more faces belonging to one or more individuals selected from the group consisting of: players; coaches; anchors; fans; and commentators, (Merler, see at least: Par. 0056, 0059, the extracted metadata (e.g., the name of an action or player) is used to collect training examples for training an action recognition classifier or a facial recognition classifier (e.g., for recognizing a visible action of a player or recognizing a face of a player), [i.e., identifying the video frames showing one or more faces belonging to one or more individuals, “players”]).

In regard to claim 14, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
Furthermore, Ishtiaq discloses wherein using the computer vision techniques to analyze the video frames of the video stream comprises identifying the video frames showing text that matches a text template of the plurality of the set of templates, (Ishtiaq, see at least: col. 17, lines 51-67).

The prior art of record, Wan et al, (US-PGPUB 2014/0111542), also discloses identifying video frames showing text that matches a text template of the plurality of the set of templates, (Wan et al, see at least: Par. 0106).

In regard to claim 15, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 10.
Furthermore, Li et al discloses wherein using the computer vision techniques to analyze the video frames of the video stream comprises identifying the video frames showing a logo that matches a logo template of the plurality of logo templates, (Li et al, see at least: Fig. 5, step 502, and Par. 0038, a marker frame processor 104 compares the logo template against frames of video 114. The comparison of the logo template may compare the accumulated logo template images 408 against every frame of video 114 or a portion of frames of video 114, [i.e., implicitly identifying the video frames showing a logo that matches a logo template of the plurality of logo templates, based on comparison]). 

Regarding claim 17, claim 17 recites substantially similar limitations as set forth in claim 1. As such, claim 17 is rejected for at least similar rationale.
The Examiner further acknowledges the following additional limitation: “a non-transitory computer-readable medium for identifying one or more highlights of a video stream of an event, comprising instructions stored thereon, that when executed by a processor”. However, Merler discloses the “non-transitory computer-readable medium for identifying one or more highlights of a video stream of an event, comprising instructions stored thereon, that when executed by a processor”, (Merler, Par. 0071, “computer readable storage medium (or media) having computer readable program instructions”).

Regarding claim 19, claim 19 recites substantially similar limitations as set forth in claim 5. As such, claim 19 is rejected for at least similar rationale.

Regarding claim 20, claim 20 recites substantially similar limitations as set forth in claim 10. As such, claim 20 is rejected for at least similar rationale.

Regarding claim 22, claim 22 recites substantially similar limitations as set forth in claim 12. As such, claim 22 is rejected for at least similar rationale.

Regarding claim 23, claim 23 recites substantially similar limitations as set forth in claim 13. As such, claim 23 is rejected for at least similar rationale.

Regarding claim 24, claim 24 recites substantially similar limitations as set forth in claim 14. As such, claim 24 is rejected for at least similar rationale.

Regarding claim 25, claim 25 recites substantially similar limitations as set forth in claim 15. As such, claim 25 is rejected for at least similar rationale.

Regarding claim 26, claim 26 recites substantially similar limitations as set forth in claim 1. As such, claim 26 is rejected for at least similar rationale.
The Examiner further acknowledges the following additional limitation: “a system for identifying one or more highlights of a video stream of an event”. However, Merler discloses the “system for identifying one or more highlights of a video stream of an event” (Merler, see at least: Par. 0071, “the present application may be a system”).

Regarding claim 28, claim 28 recites substantially similar limitations as set forth in claim 5. As such, claim 28 is rejected for at least similar rationale.

Regarding claim 29, claim 29 recites substantially similar limitations as set forth in claim 10. As such, claim 29 is rejected for at least similar rationale.

Regarding claim 31, claim 31 recites substantially similar limitations as set forth in claim 12. As such, claim 31 is rejected for at least similar rationale.

Regarding claim 32, claim 32 recites substantially similar limitations as set forth in claim 13. As such, claim 32 is rejected for at least similar rationale.

Regarding claim 33, claim 33 recites substantially similar limitations as set forth in claim 14. As such, claim 33 is rejected for at least similar rationale.

Regarding claim 34, claim 34 recites substantially similar limitations as set forth in claim 15. As such, claim 34 is rejected for at least similar rationale.

Claims 2, 18, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen, as applied to claim 1 above; and further in view of Lee, (US-PGPUB 2016/0261929).

In regard to claim 2, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole does not expressly disclose causing an output device to output the video stream concurrently with comparison of the portion of the video stream with the plurality of templates.
However, Lee discloses causing an output device to output the video stream concurrently with comparison of the portion of the video stream with the plurality of templates (see at least: Par. 0099, output the content and the summary content at the same time).
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, Chen, and Lee are combinable because they are all concerned with determining highlights. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen to use the controller 130, as taught by Lee, in order to control the outputter 120 to output the content and the summary content at the same time in accordance with a user command (Lee, Par. 0099).

Regarding claim 18, claim 18 recites substantially similar limitations as set forth in claim 2. As such, claim 18 is rejected for at least similar rationale.

Regarding claim 27, claim 27 recites substantially similar limitations as set forth in claim 2. As such, claim 27 is rejected for at least similar rationale.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen, as applied to claim 1 above; and further in view of Sohn, (US-PGPUB 2017/0164055).
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole does not expressly disclose storing the highlight reel comprising the plurality of highlights arranged to play in a predefined order.
However, Sohn discloses storing a highlight reel comprising the plurality of highlights, arranged to play in a predefined order (see at least: Par. 0031, the broadcast service providing server 110 may directly create and store the highlight image 113 of content, and may provide the highlight images of a plurality of contents to the electronic device 120 through at least one channel. Further, Par. 0035 discloses that the electronic device 120 may repeatedly play the highlight images in the order, [i.e., arranged to play in a predefined order]).
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, Chen, and Sohn are combinable because they are all concerned with determining highlights. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen to use the electronic device 120, as taught by Sohn, in order to repeatedly play the highlight images in the order (Sohn, see Par. 0035).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen, as applied to claim 1 above; and further in view of Lindley et al, (US-PGPUB 2009/0282336).
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 1.
Furthermore, Merler discloses that the highlight is a portion of a plurality of highlights, (see at least: Par. 0045, identifying segments as potential highlights); and transmitting to an output device a sequence of the plurality of highlights, (see at least: Par. 0031, the computing device 800 may transmit the stored highlight clip 195 through a communications interface to a network or an external storage device).
On the other hand, Ishtiaq discloses an output device comprising a mobile device (Ishtiaq, col. 15, lines 35-38, user interface engine 121 contains a display; e.g., when the user device is a tablet computer or smartphone).
However, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole does not expressly disclose matching times for each highlight of the plurality of the highlights in the video stream; and the metadata.
Lindley discloses that if the category is "highlights from a Hawaii trip," the process 300 selects media content items that have event metadata matching the Hawaii trip and that have a "highlight" metadata tag (see at least: Par. 0052). Further, Par. 0045 discloses that the metadata can include various types of information, such as time metadata and location metadata, [which implicitly enables the matching times for each of the time and location of the highlights in the video stream; and the metadata].
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, Chen, and Lindley are combinable because they are all concerned with determining highlights. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen to include the matching of highlights with metadata times and location, as taught by Lindley, in order to select the media content items whose metadata sufficiently match the selected category or categories (Lindley, Par. 0052).

Claims 11, 21, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen, as applied to claims 1, 17, and 26 above; and further in view of Chien, (US-PGPUB 2017/0347014).

In regard to claim 11, the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 10.
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole does not expressly disclose wherein using the computer vision techniques to analyze the video frames of the video stream comprises using at least one of boundary detection techniques and PSNR computation to determine whether sufficient similarity exists between an image extracted from one or more of the video frames and a template image of the plurality of templates.
Chien discloses using the peak signal-to-noise ratio as an objective standard of evaluating the similarity of two patterns, (Par. 0027).
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, Chen, and Chien are combinable because they are all concerned with determining highlights. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen to use the peak signal-to-noise ratio, as taught by Chien, in order to evaluate the similarity of two patterns (Chien, Par. 0027).
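For reference, the peak signal-to-noise ratio named in the claim is conventionally computed as PSNR = 10 * log10(MAX^2 / MSE), where MSE is the mean squared pixel error between the two images and MAX is the largest possible pixel value. The following is a minimal sketch of that standard formula applied to an extracted image and a template image; it is not drawn from Chien or from the application. The function names, the list-of-rows grayscale image representation, and the 30 dB threshold are assumptions chosen for illustration only.

```python
import math


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio, in dB, between two equally sized images.

    Images are grayscale, given as lists of pixel rows.
    """
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    pixel_count = sum(len(row) for row in image_a)
    if pixel_count == 0:
        raise ValueError("images must not be empty")
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def is_sufficiently_similar(extracted, template, threshold_db=30.0):
    """Hypothetical similarity test: PSNR at or above an assumed threshold."""
    return psnr(extracted, template) >= threshold_db
```

A higher PSNR indicates a closer match, so a threshold test of this kind treats identical images (infinite PSNR) as similar and strongly differing images as dissimilar.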

Regarding claim 21, claim 21 recites substantially similar limitations as set forth in claim 11. As such, claim 21 is rejected for at least similar rationale.

Regarding claim 30, claim 30 recites substantially similar limitations as set forth in claim 21. As such, claim 30 is rejected for at least similar rationale.

The following prior art of record also discloses the limitation “boundary detection techniques”:
US-PGPUB 2015/0054975, (Par. 0011)
US-PGPUB 2014/0219524, (Par. 0195).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen, as applied to claim 15 above; and further in view of Chattopadhyay et al, (US-PGPUB 2014/0016864).
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole discloses the limitations of claim 15.
The combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen as a whole does not expressly disclose wherein the logo template is selected from the group consisting of: a network logo associated with a television, cable, or other broadcast network that provides the video stream; and a team logo associated with a sports team involved in the event.
However, Chattopadhyay discloses wherein the logo template is selected from the group consisting of: a network logo associated with a television, cable, or other broadcast network that provides the video stream; and a team logo associated with a sports team involved in the event (Chattopadhyay, see at least: Par. 0029, based on the logo ID, a match channel/match channel ID is determined from the channel file relative to the television network service, [i.e., a network logo associated with a television, cable, or other broadcast network that provides the video stream]).
Merler, Li et al, Liu et al, Ishtiaq, Saito et al, Chen, and Chattopadhyay are combinable because they are all concerned with determining highlights. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Merler, Li et al, Liu et al, Ishtiaq, Saito et al, and Chen to use the logo recognition system 110, as taught by Chattopadhyay, in order to recognize the logo in the feed video (Chattopadhyay, Par. 0067).

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)270-1670. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AMARA ABDI/
Primary Examiner, Art Unit 2668
05/06/2022