DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-38 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of prior U.S. Patent No. 9,646,387 issued from parent Application No. 14/514,602. Although the claims at issue are not identical, they are not patentably distinct from each other because they are both directed towards methods and systems for determining a range of video data associated with a motion of objects during an event and identifying that range of data for transmission and playback.
Claims 1-38 are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 1-25 of prior U.S. Patent No. 10,657,653 issued from parent Application No. 15/480,694. Although the claims at issue are not identical, they are not patentably distinct from each other because 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 12 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Naikal et al (US PGPub US 2014/0333775 A1, filed on May 9, 2014 with a foreign priority date of May 10, 2013), hereinafter referred to as “Naikal”.

Consider Independent Claims 1, 12 and 20.
Naikal teaches 
-; 1. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: / 12. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: / 20. One or more non-transitory computer readable media storing instructions that, when executed, cause: receiving an indication of an occurrence of an event within content;   (Naikal: abstract, [0022]-[0026], Figure 1)
-; 1. receive an indication of an occurrence of an event within content; determine, based on the indication of the occurrence of the event within the content: / 12. determine, based on a description of a content item, a portion of the content item to be processed to determine a time of an occurrence of an event; / 20. determining, based on the indication of the occurrence of the event within the content:  (Naikal: [0022] FIG. 1, [0029] The event processor 104 in the processing station 160 optionally requests full video data from one or more of the cameras 108A-108N during operation. For example, in response to identification of an event, the processor 104 requests video data from one or more of the cameras 108A-108N and the video output device 168 displays the video for an operator to review. The operator optionally generates additional requests for video from one or more of the other cameras 108A-108N. Thus, in one mode of operation a subset of the cameras 108A-108N transmit full video data to the processor 104, while other cameras only transmit the feature data and feature update data. As described above, the memory 120 in each of the cameras 108A-108N include an internal data storage device that is configured to buffer video data for a predetermined time period to enable the processor 104 to request additional video data that are stored in the camera.)
-; 1. an expected motion of objects associated with the event; and a portion, of the content, in which the event is expected to occur; / 12. conduct, based on an expected motion of objects, image analysis on the portion of the content item to detect a motion of objects associated with the event; / 20. an expected motion of objects associated with the event; and a portion, of the content, in which the event is expected to occur; (Naikal: [0029] For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. [0030], [0031] The training process includes a series of trials where a humans or other object perform motions that correspond to events of interest, and the motions are recorded as video from multiple viewing angles. A manual annotation process includes one or more annotators who select a limited number of key-frames from each of the video sequences to assist in generating a trained model for the human or object movements that occur in each event of interest. In one embodiment, the process of manual selection for key-frames during training includes an easy to use interface. )
-; 1. determine, after receiving the indication of the occurrence of the event within the content, and based on comparing the expected motion of objects with a motion of objects in the portion, a subset, of the portion, in which the event occurs; / 12. determine, based on the image analysis, information indicating the time, in the content item, of the occurrence of the event; / 20. determining, after receiving the indication of the occurrence of the event within the content, and based on comparing the expected motion of objects with a motion of objects in the portion, a subset, of the portion, in which the event occurs;  (Naikal: [0032] For example, in one embodiment a digital processing device receives key-frames of video data from multiple video sequences of a particular event of interest in the training data. In one configuration, the multiple video sequences include videos taken from different positions and angles of a single person or object performing a single motion in an event of interest. The multiple video sequences also include recordings of multiple people or objects that perform the motion in an event of interest during multiple trials to improve the breadth and accuracy of the training data. Each trial is performed by the subject while he or she faces a different direction and at different locations in the field of view of the cameras. [0033], [0049]-[0052], [0053] In some embodiments, the event processor 104 identifies key-frames and changes of the feature descriptors for an object between key-frames using a deformable key frame model. In FIG. 3, the event processor 104 generates a score that corresponds to the likelihood that each graph each graph generates a score)
-; 1. and send, to a second computing device, the subset of the portion. / 12. and send information indicating the time of the occurrence of the event.  / 20. and sending, to a second computing device, the subset of the portion. (Naikal:[0029], The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-13, 16-21 and 24-38 are rejected under 35 U.S.C. 103 as being unpatentable over Naikal et al (US PGPub US 2014/0333775 A1, filed on May 9, 2014 with a foreign priority date of May 10, 2013), hereinafter referred to as “Naikal”, in view of Kim et al (US PGPub US 2015/0246891, filed on September 25, 2014 with a foreign priority date of March 5, 2014), hereinafter referred to as “Kim”.
Consider Independent Claims 1, 12 and 20.
Naikal teaches 
-; 1. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: / 12. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by (Naikal: abstract, [0022]-[0026], Figure 1)
-; 1. receive an indication of an occurrence of an event within content; determine, based on the indication of the occurrence of the event within the content: / 12. determine, based on a description of a content item, a portion of the content item to be processed to determine a time of an occurrence of an event; / 20. determining, based on the indication of the occurrence of the event within the content:  (Naikal: [0022] FIG. 1, [0029] The event processor 104 in the processing station 160 optionally requests full video data from one or more of the cameras 108A-108N during operation. For example, in response to identification of an event, the processor 104 requests video data from one or more of the cameras 108A-108N and the video output device 168 displays the video for an operator to review. The operator optionally generates additional requests for video from one or more of the other cameras 108A-108N. Thus, in one mode of operation a subset of the cameras 108A-108N transmit full video data to the processor 104, while other cameras only transmit the feature data and feature update data. As described above, the memory 120 in each of the cameras 108A-108N include an internal data storage device that is configured to buffer video data for a predetermined time period to enable the processor 104 to request additional video data that are stored in the camera.)
-; 1. an expected motion of objects associated with the event; and a portion, of the content, in which the event is expected to occur; / 12. conduct, based on an expected motion of objects, image analysis on the portion of the content item to detect a motion of objects associated with the event; / 20. an expected motion of objects associated with the event; and a portion, of the content, in which the event is expected to occur; (Naikal: [0029] For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. [0030], [0031] The training process includes a series of trials where a humans or other object perform motions that correspond to events of interest, and the motions are recorded as video from multiple viewing angles. A manual annotation process includes one or more annotators who select a limited number of key-frames from each of the video sequences to assist in generating a trained model for the human or object movements that occur in each event of interest. In one embodiment, the process of manual selection for key-frames during training includes an easy to use interface. )
-; 1. determine, after receiving the indication of the occurrence of the event within the content, and based on comparing the expected motion of objects with a motion of objects in the portion, a subset, of the portion, in which the event occurs; / 12. determine, based on the image analysis, information indicating the time, in the content item, of the occurrence of the event; / 20. determining, after receiving the indication of the occurrence of the event within the content, and based on comparing the expected motion of objects with a motion of objects in the portion, a subset, of the portion, in which the event occurs;  (Naikal: [0032] For example, in one embodiment a digital processing device receives key-frames of video data from multiple video sequences of a particular event of interest in the training data. In one configuration, the multiple video sequences include videos taken from different positions and angles of a single person or object performing a single motion in an event of interest. The multiple video sequences also include recordings of multiple people or objects that perform the motion in an event of interest during multiple trials to improve the breadth and accuracy of the training data. Each trial is performed by the subject while he or she faces a different direction and at different locations in the field of view of the cameras. [0033], [0049]-[0052], [0053] In some embodiments, the event processor 104 identifies key-frames and changes of the feature descriptors for an object between key-frames using a deformable key frame model. In FIG. 3, the event processor 104 generates a score that corresponds to the likelihood that each graph each graph generates a score)
-; 1. and send, to a second computing device, the subset of the portion. / 12. and send information indicating the time of the occurrence of the event.  / 20. and sending, to a second computing device, the subset of the portion. (Naikal:[0029], The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100)
Naikal does not teach the limitation, recited in dependent Claims 5, 16 and 24, of “determining, for the event, an expected audio information”.
Kim teaches 
-; 1. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: / 12. An apparatus comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: / 20. One or more non-transitory computer readable media storing instructions that, when executed, cause: receiving an indication of an occurrence of an event within content; (Kim: abstract, [0010]-[0030], [0042]-[0068], Figures 1-2; A display apparatus is provided. The display apparatus includes a receiver configured to receive content and metadata including genre information of the content, a controller configured to extract the genre information from the metadata, and calculate reliability of the genre information by analyzing the content and comparing the analyzed content with the genre information, and a video processor configured to process a video of the content according to the calculated reliability)
-; 1. receive an indication of an occurrence of an event within content; determine, based on the indication of the occurrence of the event within the content: / 12. determine, based on a description of a content item, a portion of the content item to be processed to determine a time of an occurrence of an event; / 20. determining, based on the indication of the occurrence of the event within the content:  (Kim: [0042]-[0066], Figures 1-2; The receiver 100 may receive content for displaying on the display apparatus 100 and metadata that may include, for example, genre information of the content. The controller 120 controls an overall operation of the display apparatus 100. In particular, the controller 120 may extract the genre information of the content from the received metadata, analyze the content, calculate reliability of the genre information of the content, and control the video processor 130 to process the video of the content according to a result of the calculating the reliability. To achieve this, the display apparatus 100 may store genre identification characteristic values corresponding to a plurality of content genres, and video setting values for a plurality of video modes corresponding to the plurality of content genres); 
-; 1. an expected motion of objects associated with the event; and a portion, of the content, in which the event is expected to occur; / 12. conduct, based on an expected motion of objects, image analysis on the portion of the content item to detect a motion of objects associated with the event; / 20. an expected motion of objects associated with the event; and a portion, of the (Kim: [0042]-[0067], Figures 1-2; The content recited herein includes at least one of a video and an audio and is created to be replayed via the display apparatus 100, and may include various kinds of genres such as news, a drama, a commercial, a movie, a sport, a documentary, a music concert, education, a current topic, etc. The video processor 130 may process a video of the received content under the control of the controller 120. Specifically, the content received via the receiver 110 is divided into a video signal and an audio signal by a processing operation. The video processor 130 may perform various signal processing operations with respect to the video signal. The signal divider 240 may divide the received content and metadata including genre information of the content into a video signal, an audio signal, and metadata. For example, when a content is received via a broadcast signal, the broadcast signal may include EPG information including the broadcast content and genre information of the content. In this case, the signal divider 240 may divide the broadcast signal received via the receiver 210 into a video signal, an audio signal, and EPG data including the genre information of the content. 
The controller 120 may determine the genre of the content according to the calculated reliability, and may control the video processor 130 to process the video of the content by using the video setting value on the video mode corresponding to the determined genre of the content from among the stored video setting values on the plurality of video modes corresponding to the plurality of content genres); 
-; 1. determine, after receiving the indication of the occurrence of the event within the content, and based on comparing the expected motion of objects with a motion of objects in the portion, a subset, of the portion, in which the event occurs; / 12. determine, based on the image analysis, information indicating the time, in the content item, of the occurrence of the event; / 20. determining, after receiving the indication of the occurrence of the event within the content, (Kim: [0053]-[0068], Figures 1-2; In this case, the genre identification characteristic included in the content information may include at least one of a shot characteristic, a motion characteristic, a brightness characteristic, a color characteristic, an edge characteristic, a text characteristic, a saturation characteristic related to the video of the content, a Mel-Frequency Cepstral Coefficients (MFCC) characteristic, a periodicity characteristic, an energy characteristic, a Zero Crossing Rate (ZCR) characteristic, a pitch characteristic, and a frequency peak characteristic related to the audio of the content. The controller 120 may analyze at least one of the video and the audio of the received content and may acquire values of such genre identification characteristics. The characteristic value recited herein refers to a value regarding at least one characteristic. Therefore, the stored characteristics values of the video and audio on the plurality of content may be plural in number for each genre, and the genre identification characteristic value acquired by analyzing the content may be plural in number); 
-; 5. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, an expected audio; and determine the subset of the portion by determining, based on audio of the portion and based on the expected audio, the subset of the portion.  / 16. The apparatus of claim 12, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, an expected audio; and determine the information indicating the time by determining, based on audio of the portion of the content item and based on the expected audio, the information indicating the time.  / 24. The one or more non-transitory computer readable media of claim 20, wherein the instructions, when executed, cause: determining, for the event, an expected audio; and the determining the subset by causing determining, based on (Kim: [0062]-[0099], [0117], Figures 3-4, 7; The storage 250 may store various programs and data for driving the display apparatus 200. In particular, the storage 250 may store genre identification characteristic values corresponding to a plurality of content genres, by which the controller 220 calculates reliability of the genre information of the content extracted from the metadata and calculates a probability for each of the plurality of content genres. In addition, the storage 250 may store video and audio setting values for a plurality of video and audio modes corresponding to the plurality of content genres, such that video and audio of the content are processed according to the calculated reliability, and also may store video and audio setting values for a default mode, that is, predetermined video and audio modes. 
Referring to FIG. 7, in response to a content and metadata including genre information of the content being received (S710), the display apparatus 100 extracts the genre information of the content from the metadata, and calculates reliability of the genre information by analyzing the content (S720). Thereafter, the display apparatus 100 processes a video of the content according to the calculated reliability (S730))
-; 1. and send, to a second computing device, the subset of the portion. / 12. and send information indicating the time of the occurrence of the event.  / 20. and sending, to a second computing device, the subset of the portion.    (Kim: [0084] The content analysis module 222 may acquire a genre identification characteristic value from the video and audio of the content transmitted from the signal divider 240. This has been described above in the explanation of the controller of FIG. 1. [0090] In this case, referring to FIG. 4, the genre information extraction module 221 extracts the genre information 'news' from the received metadata and transmits the genre information to the probability calculation module 223. The content analysis module 222 analyzes the actually received content, extracts characteristic values of the video and audio, and transmits the characteristic values to the probability calculation module 223. [0091])
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify Naikal's method and system for event and object identification using video analytics with Kim's method and display for content based analysis of video data, as they are both directed towards the field of video analysis and processing.  The determination of obviousness is predicated upon the following findings:  One skilled in the art would have been motivated to modify Naikal in this manner in order to improve the accuracy of content-based event and object detection by leveraging additional features. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Naikal, while the teaching of Kim continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of enhancing the accuracy of object and event detection using additional features. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  

Consider Claims 2 and 18.
The combination of Naikal and Kim teaches:
-; 2. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: detect, based on optical character recognition, a displayed time, within the content, for the occurrence of the event.  / 18. The apparatus of claim 12, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: use optical character recognition to detect a displayed time, within the content  (Kim: [0087] FIG. 4 is a view illustrating an exemplary embodiment in which the probability calculation module 223 calculates reliability of genre information of a content extracted through the genre information extraction module 221. [0088] In the graph shown in the undermost portion of FIG. 4, the horizontal axis indicates a time and the vertical axis indicates reliability of genre information of a content included in metadata. That is, the graph of FIG. 4 shows the reliability of the genre information of the content calculated by the probability calculation module 223 with time; Naikal: [0029] If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100. Naikal: [0054]-[0058], [0056] In one embodiment, the event processor 104 applies a temporal constraint to the frames, which is to say that the event processor 104 identifies that key-frames from different cameras correspond to different views of the same event when the keyframes occur within a comparatively short time period of one another. 
For example, in one embodiment the event processor 104 applies a temporal-weighted scale to key-frames that are generated by the other cameras to identify the likelihood that the key-frames correspond to the same portion of the same event of interest as a key-frame from the reference camera. For example, if the key-frame 410B occurs within 100 milliseconds of the key-frame 406B, then the weighted scale assigns a high probability (e.g. 90%) that the two key-frames correspond to each other, while a longer delay of 1 second has a correspondingly lower probability (e.g. 25%) that the two key-frames correspond to one another). 
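For illustration only, the temporal-weighted scale quoted above from Naikal [0056] can be sketched as follows. Naikal gives only two example points (a delay of 100 milliseconds at about 90% and a delay of 1 second at about 25%); the linear interpolation between those points, the clamping outside them, and the name keyframe_match_probability are assumptions made for this sketch and are not taken from the reference.

```python
# Hypothetical sketch: Naikal [0056] supplies only the two example points
# below. The interpolation and clamping are assumptions, not Naikal's method.
NEAR_DELAY_S, NEAR_PROB = 0.100, 0.90  # 100 ms -> ~90% (example in Naikal)
FAR_DELAY_S, FAR_PROB = 1.000, 0.25    # 1 s    -> ~25% (example in Naikal)


def keyframe_match_probability(delay_s: float) -> float:
    """Assumed probability that two key-frames from different cameras show
    the same portion of the same event, given the time gap between them."""
    delay_s = abs(delay_s)  # only the size of the gap matters here
    if delay_s <= NEAR_DELAY_S:
        return NEAR_PROB    # clamp: short gaps keep the high probability
    if delay_s >= FAR_DELAY_S:
        return FAR_PROB     # clamp: long gaps keep the low probability
    # Linear interpolation between the two example points (assumption).
    fraction = (delay_s - NEAR_DELAY_S) / (FAR_DELAY_S - NEAR_DELAY_S)
    return NEAR_PROB + fraction * (FAR_PROB - NEAR_PROB)
```

Under these assumptions, a larger gap between key-frames never yields a higher probability than a smaller one, which matches the direction of the weighting described in the quoted passage.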

Consider Claims 5, 16 and 24.
The combination of Naikal and Kim teaches: 
-; 5. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, an expected audio; and determine the subset of the portion by determining, based on audio of the portion and based on the expected audio, the subset of the portion.  / 16. The apparatus of claim 12, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, an expected audio; and determine the information indicating the time by determining, based on audio of the portion of the content item and based on the expected audio, the information indicating the time.  / 24. The one or more non-transitory computer readable media of claim 20, wherein the instructions, when executed, cause: determining, for the event, an expected audio; and the determining the subset by causing determining, based on audio of the portion and based on the expected audio, the subset of the portion.  (Kim: [0062]-[0099], [0110]-[0120], Figures 3-4, 6-7, 8-9; In particular, the storage 250 may store genre identification characteristic values corresponding to a plurality of content genres, by which the controller 220 calculates reliability of the genre information of the content extracted from the metadata and calculates a probability for each of the plurality of content genres. In addition, the storage 250 may store video and audio setting values for a plurality of video and audio modes corresponding to the plurality of content genres, such that video and audio of the content are processed according to the calculated reliability, and also may store video and audio setting values for a default mode, that is, predetermined video and audio modes. FIG. 6 is a view illustrating different video and audio modes set for each of the plurality of content genres. 
Referring to FIG. 6, genre 1 corresponds to video mode 1 and audio mode 1, genre N corresponds to video mode N and audio mode N, and a commercial corresponds to video mode C and audio mode C. Referring to FIG. 7, in response to a content and metadata including genre information of the content being received (S710), the display apparatus 100 extracts the genre information of the content from the metadata, and calculates reliability of the genre information by analyzing the content (S720). Thereafter, the display apparatus 100 processes a video of the content according to the calculated reliability (S730). FIG. 8 illustrates a case in which video and audio modes are set in a related-art method, wherein EPG information 810 indicates that news will be broadcasted from 19:00 until 20:00 and National Football League (NFL) will be broadcasted beginning at 20:00. However, the actually broadcasted content is news 10 which finishes after 20:00 followed by a commercial 20 and then the NFL 30 which begins broadcasting past 20:00. That is, there is a discrepancy in the EPG information 810 which is the metadata including the genre information of the content. In this case, according to the related-art method, video and audio of the content are processed according to the EPG information 810 as indicated by reference numerals 820)

Consider Claims 6, 17 and 25.
The combination of Naikal and Kim teaches:
-; 6. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, one or more expected camera angles, and determine the subset of the portion by determining, based on one or more camera angles of the portion and based on the one or more expected camera angles, the subset of the portion.  / 17. The apparatus of claim 12, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the event, one or more expected camera angles; and conduct the image analysis by conducting, based on one or more camera angles of the portion of the content item and based on the one or more expected (Kim: [0062]-[0099], [0110]-[0120], Figures 3-4, 6-7, 8-9; In particular, the storage 250 may store genre identification characteristic values corresponding to a plurality of content genres, by which the controller 220 calculates reliability of the genre information of the content extracted from the metadata and calculates a probability for each of the plurality of content genres. In addition, the storage 250 may store video and audio setting values for a plurality of video and audio modes corresponding to the plurality of content genres, such that video and audio of the content are processed according to the calculated reliability, and also may store video and audio setting values for a default mode, that is, predetermined video and audio modes. FIG. 6 is a view illustrating different video and audio modes set for each of the plurality of content genres. Referring to FIG. 6, genre 1 corresponds to video mode 1 and audio mode 1, genre N corresponds to video mode N and audio mode N, and a commercial corresponds to video mode C and audio mode C. Referring to FIG. 
7, in response to a content and metadata including genre information of the content being received (S710), the display apparatus 100 extracts the genre information of the content from the metadata, and calculates reliability of the genre information by analyzing the content (S720). Thereafter, the display apparatus 100 processes a video of the content according to the calculated reliability (S730); Naikal: [0029]-[0032], [0031] The training process includes a series of trials where humans or other objects perform motions that correspond to events of interest, and the motions are recorded as video from multiple viewing angles. A manual annotation process includes one or more annotators who select a limited number of key-frames from each of the video sequences to assist in generating a trained model for the human or object movements that occur in each event of interest. In one embodiment, the process of manual selection for key-frames during training includes an easy to use interface.)

Consider Claims 7-8 and 26-27.
The combination of Naikal and Kim teaches: 
-; 7. The apparatus of claim 1, wherein the subset of the portion comprises a time portion of the content based on a metadata identifier that indicates a time, within the content, of a play in a sporting event. / 8. The apparatus of claim 1, wherein the event is a sporting event, and the expected motion of objects indicates an expected movement of players in the sporting event.  / 26. The one or more non-transitory computer readable media of claim 20, wherein the subset comprises a time portion of the content based on a metadata identifier that indicates a time, within the content, of a play in a sporting event.  / 27. The one or more non-transitory computer readable media of claim 20, wherein the event is a sporting event, and the expected motion of objects indicates an expected movement of players in the sporting event.   (Kim: [0042]-[0067], Figures 1-2; The content recited herein includes at least one of a video and an audio and is created to be replayed via the display apparatus 100, and may include various kinds of genres such as news, a drama, a commercial, a movie, a sport, a documentary, a music concert, education, a current topic, etc. The video processor 130 may process a video of the received content under the control of the controller 120. Specifically, the content received via the receiver 110 is divided into a video signal and an audio signal by a processing operation. The video processor 130 may perform various signal processing operations with respect to the video signal. The signal divider 240 may divide the received content and metadata including genre information of the content into a video signal, an audio signal, and metadata. For example, when a content is received via a broadcast signal, the broadcast signal may include EPG information including the broadcast content and genre information of the content. 
In this case, the signal divider 240 may divide the broadcast signal received via the receiver 210 into a video signal, an audio signal, and EPG data including the genre information of the content. The controller 120 may determine the genre of the content according to the calculated reliability, and may control the video processor 130 to process the video of the content by using the video setting value on the video mode corresponding to the determined genre of the content from among the stored video setting values on the plurality of video modes corresponding to the plurality of content genres; Naikal: [0029]-[0032], [0029] For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100.)

Consider Claims 8, 15 and 21.
The combination of Naikal and Kim teaches: 
-; 8. (New) The method of claim 2, wherein the expected motion of objects comprises one or more video frames. / -; 15. (New) The method of claim 9, wherein the expected motion of objects comprises one or more video frames. / -; 21. (New) The method of claim 16, wherein the Kim: [0042]-[0067], Figures 1-2; The content recited herein includes at least one of a video and an audio and is created to be replayed via the display apparatus 100, and may include various kinds of genres such as news, a drama, a commercial, a movie, a sport, a documentary, a music concert, education, a current topic, etc. The video processor 130 may process a video of the received content under the control of the controller 120. Specifically, the content received via the receiver 110 is divided into a video signal and an audio signal by a processing operation. The video processor 130 may perform various signal processing operations with respect to the video signal. The signal divider 240 may divide the received content and metadata including genre information of the content into a video signal, an audio signal, and metadata. For example, when a content is received via a broadcast signal, the broadcast signal may include EPG information including the broadcast content and genre information of the content. In this case, the signal divider 240 may divide the broadcast signal received via the receiver 210 into a video signal, an audio signal, and EPG data including the genre information of the content. 
The controller 120 may determine the genre of the content according to the calculated reliability, and may control the video processor 130 to process the video of the content by using the video setting value on the video mode corresponding to the determined genre of the content from among the stored video setting values on the plurality of video modes corresponding to the plurality of content genres; Naikal: [0029]-[0032], [0029] For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100.)

Consider Claims 9 and 28.
The combination of Naikal and Kim teaches: 
-; 9. The apparatus of claim 1, wherein the expected motion of objects comprises one or more motion energy vectors.  / 28. The one or more non-transitory computer readable media of claim 20, wherein the expected motion of objects comprises one or more motion energy vectors. (Kim: [0053]-[0068], Figures 1-2; In this case, the genre identification characteristic included in the content information may include at least one of a shot characteristic, a motion characteristic,…. The controller 120 may analyze at least one of the video and the audio of the received content and may acquire values of such genre identification characteristics. The characteristic value recited herein refers to a value regarding at least one characteristic. Therefore, the stored characteristics values of the video and audio on the plurality of content may be plural in number for each genre, and the genre identification characteristic value acquired by analyzing the content may be plural in number; Naikal: [0054]-[0058], [0056] In one embodiment, the event processor 104 applies a temporal constraint to the frames, which is to say that the event processor 104 identifies that key-frames from different cameras correspond to different views of the same event when the keyframes occur within a comparatively short time period of one another. For example, in one embodiment the event processor 104 applies a temporal-weighted scale to key-frames that are generated by the other cameras to identify the likelihood that the key-frames correspond to the same portion of the same event of interest as a key-frame from the reference camera); 

Consider Claims 10, 19 and 29. 
The combination of Naikal and Kim teaches: 
10. The apparatus of claim 1, wherein the portion of the content comprises an entirety of the content.  / 19. The apparatus of claim 12, wherein the portion of the content item comprises the entire content item.  / 29. The one or more non-transitory computer readable media of claim 20, wherein the portion of the content comprises an entirety of the content. (Kim: [0042]-[0067], Figures 1-2; The content recited herein includes at least one of a video and an audio and is created to be replayed via the display apparatus 100, and may include various kinds of genres such as news, a drama, a commercial, a movie, a sport, a documentary, a music concert, education, a current topic, etc. The video processor 130 may process a video of the received content under the control of the controller 120. Specifically, the content received via the receiver 110 is divided into a video signal and an audio signal by a processing operation. The video processor 130 may perform various signal processing operations with respect to the video signal. The signal divider 240 may divide the received content and metadata including genre information of the content into a video signal, an audio signal, and metadata. For example, when a content is received via a broadcast signal, the broadcast signal may include EPG information including the broadcast content and genre information of the content. In this case, the signal divider 240 may divide the broadcast signal received via the receiver 210 into a video signal, an audio signal, and EPG data including the genre information of the content. 
The controller 120 may determine the genre of the content according to the calculated reliability, and may control the video processor 130 to process the video of the content by using the video setting value on the video mode corresponding to the determined genre of the content from among the stored video setting values on the plurality of video modes corresponding to the plurality of content genres; Naikal: [0029]-[0032], [0029] The event processor 104 in the processing station 160 optionally requests full video data from one or more of the cameras 108A-108N during operation. For example, in response to identification of an event, the processor 104 requests video data from one or more of the cameras 108A-108N and the video output device 168 displays the video for an operator to review. For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device.)

Consider Claims 11 and 30. 
The combination of Naikal and Kim teaches: 
-; 11. The apparatus of claim 1, wherein the expected motion of objects comprises one or more video frames.  / 30. The one or more non-transitory computer readable media of claim 20, wherein the expected motion of objects comprises one or more video frames. (Kim: [0042]-[0067], Figures 1-2; The content recited herein includes at least one of a video and an audio and is created to be replayed via the display apparatus 100, and may include various kinds of genres such as news, a drama, a commercial, a movie, a sport, a documentary, a music concert, education, a current topic, etc. The video processor 130 may process a video of the received content under the control of the controller 120. Specifically, the content received via the receiver 110 is divided into a video signal and an audio signal by a processing operation. The video processor 130 may perform various signal processing operations with respect to the video signal. The signal divider 240 may divide the received content and metadata including genre information of the content into a video signal, an audio signal, and metadata. For example, when a content is received via a broadcast signal, the broadcast signal may include EPG information including the broadcast content and genre information of the content. In this case, the signal divider 240 may divide the broadcast signal received via the receiver 210 into a video signal, an audio signal, and EPG data including the genre information of the content. 
The controller 120 may determine the genre of the content according to the calculated reliability, and may control the video processor 130 to process the video of the content by using the video setting value on the video mode corresponding to the determined genre of the content from among the stored video setting values on the plurality of video modes corresponding to the plurality of content genres; Naikal: [0029]-[0032], [0029] For example, the memory 120 in the camera 108B includes a digital data storage device that holds a buffer of the previous 10 minutes of recorded video for the scene 112. The camera 108B generates and transmits feature vector data for objects that are present in the scene 112, including moving objects, and transmits the feature vector data to the processor 104. If an event of interest occurs in the scene 112, the operator of the processor 104 requests the full video data corresponding to an identified time during which the event occurs and the camera 108B retrieves the requested video from the data storage device. Thus, even though the camera 108B does not transmit full video data to the processor 104, the processor 104 optionally retrieves video data for selected events of interest in the system 100.)

Claims 3-4, 14-15, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Naikal et al (US PGPub US 2014/0333775 A1, filed on May 9, 2014 with a foreign priority date of May 10, 2013), hereby referred to as “Naikal”, in view of Kim et al (US PGPub US 2015/0246891, filed on September 25, .

Consider Claims 3, 14 and 22.
The combination of Naikal and Kim teaches:
3. The apparatus of claim 1, / 14. The apparatus of claim 12, / 22. The one or more non-transitory computer readable media of claim 20, wherein the portion comprises video frames and wherein the method, further comprises instructions for: determining, based on the video frames having a motion energy vector within the portion and wherein the boundary information indicates fewer than all of the video frames (Kim: [0053]-[0068], Figures 1-2; In this case, the genre identification characteristic included in the content information may include at least one of a shot characteristic, a motion characteristic,…. The controller 120 may analyze at least one of the video and the audio of the received content and may acquire values of such genre identification characteristics. The characteristic value recited herein refers to a value regarding at least one characteristic. Therefore, the stored characteristics values of the video and audio on the plurality of content may be plural in number for each genre, and the genre identification characteristic value acquired by analyzing the content may be plural in number; Naikal: [0054]-[0058], [0056] In one embodiment, the event processor 104 applies a temporal constraint to the frames, which is to say that the event processor 104 identifies that key-frames from different cameras correspond to different views of the same event when the keyframes occur within a comparatively short time period of one another. For example, in one embodiment the event processor 104 applies a temporal-weighted scale to key-frames that are generated by the other cameras to identify the likelihood that the key-frames correspond to the same portion of the same event of interest as a key-frame from the reference camera) 

Chen teaches:
-; An apparatus comprising: (Chen: Abstract, [0015]-[0024], [0030]-[0033], Figure 1)
-; determining by a first computing device that an event occurs within a portion of content (Chen: [0030]-[0037], [0107]-[0112], Figure 1; In one aspect, the summarization system performs a number of functions that help a user to quickly assess the breadth of scenes in a video); 
-; determining, for the event, an indication of an expected motion of objects (Chen: [0038]-[0042], [0052]-[0056], Figures 1-2; Motion-based features are extracted (stage 108) for each frame of the videos using point-based tracking algorithms. Candidate point locations to track are chosen in one frame based on measures of local image contrast according to widely known methods. Candidate points are chosen subject to an overall limit on the number of points returned, a threshold on the suitability of the point for tracking, and a minimum required distance between points. Inter-frame tracking is performed and for each point that is successfully tracked, the vector corresponding to its displacement between frames is computed. The result of the motion tracking is a collection of motion vectors.); 
-; analyzing audio energy in the video program to identify video frames in which the event occurs (Chen: [0043]-[0044], [0057]-[0062], Figures 1 and 3; To extract the audio features (stage 110), in one aspect of the invention, short-term audio energy is measured for each video. The original audio is extracted and converted to a monophonic, 16 kHz pcm signal, high-pass filtered at 4 kHz with a 3rd order Butterworth filter, and the log energy of the filtered audio is computed on 5 ms windows. This representation is used later to help identify audible occurrences of clapboards, although one skilled in the art will appreciate that a variety of sounds can be classified); 
-; 3. The apparatus of claim 1, wherein the portion comprises video frames, and wherein the instructions, when executed by the one or more processors, further cause the apparatus to: / 14. The apparatus of claim 12, wherein the portion comprises video frames, and wherein the instructions, when executed by the one or more processors, further cause the apparatus to: / 22. The one or more non-transitory computer readable media of claim 20, wherein the portion comprises video frames, and wherein the instructions, when executed, cause: (Chen: [0045]-[0056], [0063]-[0066], [0111], Figures 1-2;  Once the features have been extracted, the video is separated into segments based upon the plurality of features. Segmentation based on color (stage 112) and segmentation based on motion (stage 114) are carried out separately. The color-based segmentation identifies boundaries where there are strong changes in the color distribution of the video frames. The motion-based segmentation identifies segments where motion characteristic of pans or zooms occur, and also color bars, which usually have very little motion.)
-; 3. detect, based on the video frames having motion energy satisfying a threshold energy level, / 14. conduct the image analysis by conducting, based on video frames having motion energy satisfying a threshold energy level, / 22. detecting, based on the video frames having motion energy satisfying a threshold energy level (Chen: [0052]-[0056], [0084]-[0090], Figures 1-2; A threshold representing the minimum amount of motion required for a pan or zoom to occur is used to identify candidate pans and zooms. For each region where the motion value is greater than a threshold, the endpoints of the region are identified as the first locations forward and backward from the high motion region that are less than a selected threshold. The threshold is computed as the running average within a window of 2000 frames. The use of a running average helps in cases where the camera is more shaky than usual. The motion segments are scored as the trimmed average of the absolute value of the amount of motion in the segment. The magnitude of global motion in the x and y direction is computed, and a threshold on the number of motion points is used to identify candidate color bar segments. Thresholds on the peak motion values and the average motion value in each segment are used to remove segments with too much global motion. In the clustering operation, a number of clusters can be defined using a threshold. In this embodiment, a semi-adaptive threshold was used. When there are more than just a few segments (defined as more than 20 in our preferred embodiment), it was noted that some of the segments are usually from the same shot or are a repetition of a shot. The distance between these redundant segments is relatively small, and in a cluster tree, these shots have the lowest height. The height at which each segment is added to the agglomerative tree is put into a list that is sorted. The smallest distances generally correspond to segments from the same shot. 
Ideally, it would be nice to identify the knee of the sorted list. A cheap approximation to this was used by multiplying the height at the 25th percentile by 1.5. This height is used as the threshold for identifying clusters of similar shots. When there are just a few segments (defined as 20 or less), a fixed threshold is used to define the clusters, since there are too few segments to reliably estimate the distance at which the similar segments are found.)  
-; 3. and wherein the subset of the portion comprises fewer than all of the video frames.  / 22. and wherein the subset of the portion comprises fewer than all of the video frames.  (Chen: [0030]-[0037], [0107]-[0112], Figure 1; In one aspect, the summarization system performs a number of functions that help a user to quickly assess the breadth of scenes in a video. The system removes redundant shots, allows weighting of scenes with and without camera motion, provides selection of summary shot segments based on amount of motion within a scene, and presentation of metadata with the shot to provide the context of the shots. The dynamic and static segments are clustered separately to identify redundancies (stage 120, stage 122, respectively). In one aspect, audio features are used to identify the "clap" sound of a clapboard (stage 116), and segments in which an audio clap is identified are processed to remove the clapboard during summary segment selection (stage 124). Also during summary segment selection, if the summary is too short, a default summary mode is used instead. Finally, the summary segments are ordered (stage 126) and used to create a summary video (stage 128) with metadata to help the user better understand the summary)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to improve the content-based image processing of the combination of Naikal and Kim with the algorithm taught by Chen for content-based video analysis and summarization. The determination of obviousness is predicated upon the rationale found in Chen at [0014]. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claims in question.
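For illustration, the motion thresholding quoted from Chen above (a minimum-motion threshold to find candidate regions, with region endpoints located against a running average) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Chen: the function names, the centered window, and the choice to end a region on the last frame that is still at or above the running average are all hypothetical.

```python
from typing import List, Tuple


def running_average(values: List[float], window: int) -> List[float]:
    """Mean of the values in a centered window around each frame.

    Assumption: the window is centered and clipped at the sequence ends.
    """
    half = window // 2
    averages = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averages.append(sum(values[lo:hi]) / (hi - lo))
    return averages


def candidate_regions(
    motion: List[float], min_motion: float, window: int = 2000
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges of candidate high-motion regions.

    A frame whose motion exceeds min_motion seeds a region. The region is then
    grown backward and forward until the neighboring frame falls below the
    running average at that position.
    """
    avg = running_average(motion, window)
    n = len(motion)
    regions: List[Tuple[int, int]] = []
    i = 0
    while i < n:
        if motion[i] > min_motion:
            start = i
            while start > 0 and motion[start - 1] >= avg[start - 1]:
                start -= 1
            end = i
            while end < n - 1 and motion[end + 1] >= avg[end + 1]:
                end += 1
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region, so merge the two.
                regions[-1] = (regions[-1][0], max(end, regions[-1][1]))
            else:
                regions.append((start, end))
            i = end + 1
        else:
            i += 1
    return regions
```

With a per-frame motion magnitude list as input, the returned ranges are subsets of the frames, which is the sense in which a thresholded region comprises fewer than all of the video frames.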

Consider Claims 4, 15 and 23. 
Claim 4 is rejected for the same reason as Claim 3 as presented above. 
Claim 15 is rejected for the same reason as Claim 14 as presented above. 
Claim 23 is rejected for the same reason as Claim 22 as presented above. 
The combination of Naikal, Kim and Chen teaches:
-; 4. The apparatus of claim 1, wherein the instructions, when executed by the one or more processors, further cause the apparatus to: determine, for the portion, a first motion energy vector; determine, for another portion of the content, a second motion energy vector; and determine the subset of the portion by comparing the first motion energy vector with the second motion energy vector.  / 15. The apparatus of claim 12, (Kim: [0053]-[0068], Figures 1-2; In this case, the genre identification characteristic included in the content information may include at least one of a shot characteristic, a motion characteristic, a brightness characteristic, a color characteristic, an edge characteristic, a text characteristic, a saturation characteristic related to the video of the content, a Mel-Frequency Cepstral Coefficients (MFCC) characteristic, a periodicity characteristic, an energy characteristic, a Zero Crossing Rate (ZCR) characteristic, a pitch characteristic, and a frequency peak characteristic related to the audio of the content. The controller 120 may analyze at least one of the video and the audio of the received content and may acquire values of such genre identification characteristics. The characteristic value recited herein refers to a value regarding at least one characteristic. Therefore, the stored characteristics values of the video and audio on the plurality of content may be plural in number for each genre, and the genre identification characteristic value acquired by analyzing the content may be plural in number; Chen: [0038]-[0042], [0052]-[0056], [0084]-[0090], Figures 1-2; Motion-based features are extracted (stage 108) for each frame of the videos using point-based tracking algorithms, and thresholding operations are used. One skilled in the art will appreciate that other image processing systems, including analysis of motion energy, can also be implemented. 
A threshold representing the minimum amount of motion required for a pan or zoom to occur is used to identify candidate pans and zooms. For each region where the motion value is greater than a threshold, the endpoints of the region are identified as the first locations forward and backward from the high motion region that are less than a selected threshold. The threshold is computed as the running average within a window of 2000 frames. The use of a running average helps in cases where the camera is more shaky than usual. The motion segments are scored as the trimmed average of the absolute value of the amount of motion in the segment. The magnitude of global motion in the x and y direction is computed, and a threshold on the number of motion points is used to identify candidate color bar segments. Thresholds on the peak motion values and the average motion value in each segment are used to remove segments with too much global motion. In the clustering operation, a number of clusters can be defined using a threshold. In this embodiment, a semi-adaptive threshold was used. When there are more than just a few segments (defined as more than 20 in our preferred embodiment), it was noted that some of the segments are usually from the same shot or are a repetition of a shot. The distance between these redundant segments is relatively small, and in a cluster tree, these shots have the lowest height. The height at which each segment is added to the agglomerative tree is put into a list that is sorted. The smallest distances generally correspond to segments from the same shot. Ideally, it would be nice to identify the knee of the sorted list. A cheap approximation to this was used by multiplying the height at the 25th percentile by 1.5. This height is used as the threshold for identifying clusters of similar shots. 
When there are just a few segments (defined as 20 or less), a fixed threshold is used to define the clusters, since there are too few segments to reliably estimate the distance at which the similar segments are found. Each segment corresponds to a set of image frames). 
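For illustration, the semi-adaptive clustering threshold quoted from Chen above (1.5 times the height at the 25th percentile of the sorted list when there are more than 20 segments, and a fixed threshold otherwise) might be sketched as follows. This is a minimal sketch, not Chen's implementation: the function name, the parameter names, and the index-based percentile rule are assumptions.

```python
from typing import Sequence


def semi_adaptive_threshold(
    heights: Sequence[float],
    fixed_threshold: float,
    few_segments: int = 20,
    percentile: float = 0.25,
    scale: float = 1.5,
) -> float:
    """Pick the cluster-cut threshold from per-segment tree heights.

    heights holds one value per segment: the height at which that segment
    was added to the agglomerative tree.
    """
    if len(heights) <= few_segments:
        # Too few segments to estimate the distance between similar shots.
        return fixed_threshold
    ordered = sorted(heights)
    # Assumption: the percentile is taken by truncating the fractional index.
    index = int(percentile * (len(ordered) - 1))
    return scale * ordered[index]
```

Segments whose merge height falls below the returned value would then be grouped as similar shots.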


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.




/Tahmina Ansari/


/TAHMINA N ANSARI/
Primary Examiner, Art Unit 2662