Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. IN2020410116993, filed on 04/20/2020.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1, 2, 8-11, and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Carlson (US 20170083770) in view of Guo (US 10943126).
Regarding claims 1 and 13, Carlson discloses the aspects of the system for automatically detecting and marking logical scenes in media content, the system comprising ([0013] A minimum cut algorithm is utilized to segment the graph representation to detect the scenes of the video): 
at least one processor (Fig 7, ref 802); 
a non-transitory, computer-readable storage medium operably and communicatively coupled to the at least one processor and configured to store the media content and computer program instructions executable by the at least one processor ([0070]  As would be apparent to one of ordinary skill in the art, the memory component can include many types of memory, data storage, or non-transitory computer-readable storage media, such as a first data storage for program instructions for execution by the processor 802, a separate storage for images or data, a removable memory for sharing information with other devices, etc.); 
and a key detection engine configured to define the computer program instructions, wherein the computer program instructions executed by the at least one processor to cause the at least one processor to ([0058] to optimize scene segmentation, dynamic programming is used to compute the cost of all possible cuts through the graph. The particular implementation used is Dijkstra's algorithm): 
extract multiple color features from the middle frame to generate an image similarity matrix ([0031] one or more key frames are extracted for each detected shot by a key frame selection module 214. Features used for key frame extraction include colors, edges, shapes, optical flow, MPEG-7 motion descriptors (e.g., temporal motion intensity, spatial distribution of motion), MPEG discrete cosine coefficient and motion vectors, among others. AND [0032] An example of a rule-based method is to apply a similarity metric to group similar shots within a predefined time interval to segment scenes.);
extract multiple audio features from audio content of each of the plurality of shots to generate an audio similarity matrix ([0063] the cost function can be further based on audio similarity and/or textual similarity); and
generate a resultant similarity matrix based on the image similarity matrix and the audio similarity matrix ([0022] visual-based features and audio features can be combined for video segmentation AND [0061] a pattern corresponding to the similarity matrix 500 of FIG. 5 is generated and similarity is computed at every valid time point for a discrete set of possible dissolve lengths. The generated pattern is used to slide along the diagonal of the similarity matrix of the shot to determine whether there is a match of the generated pattern and the similarity matrix of the shot.).
Carlson does not disclose extract a middle frame of each of a plurality of shots, wherein the plurality of shots are accessed from the media content.
In a similar field of endeavor of video stream processing, Guo teaches extract a middle frame of each of a plurality of shots, wherein the plurality of shots are accessed from the media content (Col 13, lines 15-20: The predetermined priority-based extraction rule can be, for example, performing extraction starting from a frame in the middle to frames on both sides.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson’s disclosure with Guo’s teaching to utilize convolutional neural networks to provide more effective video stream processing.
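For illustration only, the limitation of generating a resultant similarity matrix from an image similarity matrix and an audio similarity matrix can be sketched as follows. Neither the claim as quoted nor the cited passages of Carlson specify how the two matrices are combined; the element-wise weighted average, the function name `combine_similarity`, and the `image_weight` parameter are assumptions introduced here, not the applicant's or Carlson's method.

```python
def combine_similarity(image_sim, audio_sim, image_weight=0.5):
    """Blend two square shot-by-shot similarity matrices element-wise.

    The weighted average is an assumed combination rule, chosen only
    to make the idea of a "resultant" matrix concrete.
    """
    if len(image_sim) != len(audio_sim):
        raise ValueError("matrices must cover the same number of shots")
    audio_weight = 1.0 - image_weight
    return [
        [image_weight * i + audio_weight * a for i, a in zip(img_row, aud_row)]
        for img_row, aud_row in zip(image_sim, audio_sim)
    ]


# Two shots: visually similar (0.8) but acoustically less so (0.4).
image_sim = [[1.0, 0.8], [0.8, 1.0]]
audio_sim = [[1.0, 0.4], [0.4, 1.0]]
resultant = combine_similarity(image_sim, audio_sim)
```

With equal weights, the off-diagonal entry of `resultant` falls midway between the image and audio values, and the diagonal stays at 1.0.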
Regarding claim 2, Carlson discloses wherein the resultant similarity matrix is a combination of the image similarity matrix and the audio similarity matrix ([0022] visual-based features and audio features can be combined for video segmentation AND [0061] a pattern corresponding to the similarity matrix 500 of FIG. 5 is generated and similarity is computed at every valid time point for a discrete set of possible dissolve lengths. The generated pattern is used to slide along the diagonal of the similarity matrix of the shot to determine whether there is a match of the generated pattern and the similarity matrix of the shot).
Regarding claim 8, Carlson discloses the plurality of shots includes a plurality of scenes (Fig 2, shot detection with multiple shots and scene detection with multiple scenes), and the computer program instructions further cause the at least one processor to execute a linear traversal algorithm ([0031] Various techniques can be utilized for key frame selection, including sequential comparison-based, global comparison-based, reference frame-based, clustering-based, curve simplification-based, and object-based algorithms. Sequential comparison-based algorithms sequentially compare frames successive to a previously selected key frame to determine whether the successive frame is different to the previously selected key frame by some threshold (since a linear traversal algorithm visits each frame until it finds the match, a sequential comparison-based algorithm does the same)) on each of the plurality of shots to define boundaries of the plurality of scenes ([0024] Once features of the video have been extracted, those features are analyzed to determine boundaries of shots 208 by a shot detection module 206. Various similarity metrics can be utilized by a shot detection module, such as the L-norm cosine similarity, the Euclidean distance, the histogram intersection, the chi-squared similarity, the earth mover's distance, among others. In an embodiment, respective feature vectors of a pair of adjacent frames are compared using cosine similarity. When the respective feature vectors are L2 normalized, the cosine similarity is simply a dot product of the two vectors).
Regarding claim 9, Carlson discloses wherein the computer program instructions further cause the at least one processor to correct errors of the plurality of scenes ([0027] An example of a boosting algorithm is Adaptive boosting or AdaBoost, which is a machine learning boosting algorithm which finds a highly accurate classifier (i.e., low error rate) from a combination of many “weak” classifiers (i.e., substantial error rate). Given a data set comprising examples that are within a class and not within the class and weights based on the difficulty of classifying an example and a weak set of classifiers, AdaBoost generates and calls a new weak classifier in each of a series of rounds. For each call, the distribution of weights is updated that indicates the importance of examples in the data set for the classification. On each round, the weights of each incorrectly classified example are increased, and the weights of each correctly classified example is decreased so the new classifier focuses on the difficult examples (i.e., those examples have not been correctly classified)) based on the linear traversal algorithm ([0031] Various techniques can be utilized for key frame selection, including sequential comparison-based, global comparison-based, reference frame-based, clustering-based, curve simplification-based, and object-based algorithms. Sequential comparison-based algorithms sequentially compare frames successive to a previously selected key frame to determine whether the successive frame is different to the previously selected key frame by some threshold (since a linear traversal algorithm visits each frame until it finds the match, a sequential comparison-based algorithm does the same)).
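The statement quoted from Carlson [0024], that cosine similarity reduces to a dot product once the feature vectors are L2 normalized, can be checked with a short sketch. The helper names and the two sample vectors below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, computed from raw vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical feature vectors for two adjacent frames.
frame_a = [3.0, 4.0]
frame_b = [4.0, 3.0]

full_cosine = cosine_similarity(frame_a, frame_b)
dot_of_normalized = dot(l2_normalize(frame_a), l2_normalize(frame_b))
```

Both `full_cosine` and `dot_of_normalized` evaluate to 24/25 up to floating-point rounding, since each vector has Euclidean length 5.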
Regarding claim 10, Carlson discloses wherein a threshold of a number of shots of the plurality of shots is similar between boundaries of two consecutive scenes of the plurality of scenes ([0060] a threshold-based shot detection algorithm is utilized to detect boundaries of the first set of shots. Measured pair-wise similarities between adjacent frames are compared to a predefined threshold. When the similarity metric is less than the threshold, a boundary or hard cut is identified).
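The threshold-based shot detection quoted from Carlson [0060] can be sketched as follows: each pair-wise similarity between adjacent frames is compared to a predefined threshold, and a hard cut is identified where the similarity is less than the threshold. The function name, the sample similarity values, and the threshold of 0.5 are assumptions made for illustration and do not appear in the reference.

```python
def detect_hard_cuts(adjacent_similarities, threshold):
    """Return each index i where a hard cut lies between frame i and frame i + 1.

    adjacent_similarities[i] is the similarity between frame i and frame i + 1.
    """
    return [
        i
        for i, similarity in enumerate(adjacent_similarities)
        if similarity < threshold
    ]


# Hypothetical similarities for six consecutive frames (five adjacent pairs).
similarities = [0.95, 0.91, 0.30, 0.88, 0.12]
cuts = detect_hard_cuts(similarities, threshold=0.5)
```

Here `cuts` is `[2, 4]`: the only pairs whose similarity falls below 0.5 are frames 2 and 3 and frames 4 and 5.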
Regarding claim 11, Carlson discloses wherein the boundaries of the two consecutive scenes of the plurality of scenes are considered as a part of a same scene based on the threshold of the number of shots of the plurality of shots ([0032] The action matching rule dictates that motion in the same direction in two consecutive shots belong to a same scene. The film tempo rule is based on the premise that the number of shots, sound, and the motion within shots depict the rhythm of a scene, and the rule requires that the rhythm not change within a scene. The shot/reverse shot rule determines that alternating shots belong to a same scene).
Claim(s) 3-5 and 14-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Carlson (US 20170083770), in view of Guo (US 10943126) and further in view of Kumar (US 20130235275).
Regarding claims 3 and 14, Carlson does not disclose wherein the computer program instructions further cause the at least one processor to reduce noise in the resultant similarity matrix to generate an output with reduced noise.
In a similar field of endeavor of scene boundary detection, Kumar teaches wherein the computer program instructions further cause the at least one processor to reduce noise in the resultant similarity matrix to generate an output with reduced noise ([0068] Due to the noise robustness property of sparse solvers, the reconstructed video segment set 603 is robust to noise. In other words, denoising is automatically achieved during the video reconstruction process).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Kumar’s teaching to create a video representation framework that is data adaptive, robust to noise and different content, and can be applied to wide varieties of videos including reconstruction, denoising, and semantic understanding.
Regarding claims 4 and 17, Carlson discloses wherein the computer program instructions further cause the at least one processor to generate a sequence of the plurality of shots based on the output ([0016] A scene comprises a series of consecutive shots grouped together because, for example, they are captured in the same location or they share thematic content. Scenes are analogous to chapters of a book. A shot can be a sequence of frames recorded contiguously and representing a continuous action in time or space. A shot can also be an unbroken sequence of frames captured by a single camera.), wherein the sequence of the plurality of shots define a boundary of each of the plurality of shots of the media content ([0024] Once features of the video have been extracted, those features are analyzed to determine boundaries of shots 208 by a shot detection module 206).
Regarding claims 5 and 18, Carlson discloses wherein the computer program instructions further cause the at least one processor to reshuffle the plurality of shots based on the sequence of the plurality of shots ([0016] A shot can be a sequence of frames recorded contiguously and representing a continuous action in time or space. A shot can also be an unbroken sequence of frames captured by a single camera.).
Regarding claim 15, Carlson and Guo do not disclose wherein the noise reduction process controls overlapping of scene boundaries of two consecutive scenes of the plurality of shots.
In a similar field of endeavor of scene boundary detection, Kumar teaches wherein the noise reduction process controls overlapping of scene boundaries of two consecutive scenes of the plurality of shots ([0068] Due to the noise robustness property of sparse solvers, the reconstructed video segment set 603 is robust to noise. In other words, denoising is automatically achieved during the video reconstruction process. AND [0075] the set of digital video sections 1003 are defined such that consecutive digital video sections 1003 overlap slightly in order to avoid missing scene boundaries that happen to occur at the end of a digital video section 1003).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Kumar’s teaching to create a video representation framework that is data adaptive, robust to noise and different content, and can be applied to wide varieties of videos including reconstruction, denoising, and semantic understanding.
Regarding claim 16, Carlson and Guo do not disclose wherein the noise reduction process further controls a video discontinuity in a scene of a plurality of scenes of the plurality of shots.
In a similar field of endeavor of scene boundary detection, Kumar teaches wherein the noise reduction process further controls a video discontinuity in a scene of a plurality of scenes of the plurality of shots ([0068] Due to the noise robustness property of sparse solvers, the reconstructed video segment set 603 is robust to noise. In other words, denoising is automatically achieved during the video reconstruction process. AND [0075] the set of digital video sections 1003 are defined such that consecutive digital video sections 1003 overlap slightly in order to avoid missing scene boundaries that happen to occur at the end of a digital video section 1003).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Kumar’s teaching to create a video representation framework that is data adaptive, robust to noise and different content, and can be applied to wide varieties of videos including reconstruction, denoising, and semantic understanding.
Claim(s) 6, 7, 19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Carlson (US 20170083770), in view of Guo (US 10943126) and further in view of Waldo (US 10140515).
Regarding claims 6 and 19, Carlson and Guo do not disclose wherein the computer program instructions further cause the at least one processor to execute an affinity propagation clustering on the resultant similarity matrix.
In a similar field of endeavor of image recognition and classification, Waldo teaches wherein the computer program instructions further cause the at least one processor to execute an affinity propagation clustering on the resultant similarity matrix (Col 13, lines 60-67: Thereafter, the remaining patches can be used in the bottom-up merge approach. In another example, a clustering approach determines the distance between patches and applies affinity propagation clustering to generate patches that are then centroids of the pair).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Waldo’s teaching to classify images with music that correlates with the image/video for improving automation and efficiency of music and audio playback. 
Regarding claims 7 and 20, Carlson discloses wherein the computer program instructions further cause the at least one processor to perform an automatic clustering of the plurality of shots ([0033] Graph-based algorithms generally cluster shots based on similarity (and temporal proximity) to generate a graph representation for a video. Nodes of the graph represent shots or clusters of shots and edges indicate similarity and/or temporal proximity between the connected nodes). Carlson and Guo do not disclose based on the affinity propagation clustering of the multiple color features and the multiple audio features. 
In a similar field of endeavor of image recognition and classification, Waldo teaches based on the affinity propagation clustering of the multiple color features and the multiple audio features (Fig 7, ref 704 (image descriptors are descriptions of images which includes colors) and ref 708 (playlist of songs is the audio captured)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Waldo’s teaching to classify images with music that correlates with the image/video. 
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Carlson (US 20170083770), in view of Guo (US 10943126) and further in view of Azermai (EP 3340069 A1).
Regarding claim 12, Carlson discloses based on the boundaries of the two consecutive scenes of the plurality of scenes considered as the part of the same scene ([0032] The action matching rule dictates that motion in the same direction in two consecutive shots belong to a same scene. The film tempo rule is based on the premise that the number of shots, sound, and the motion within shots depict the rhythm of a scene, and the rule requires that the rhythm not change within a scene. The shot/reverse shot rule determines that alternating shots belong to a same scene). Carlson does not disclose wherein the computer program instructions further cause the at least one processor to merge the two consecutive scenes into a single scene.
In a similar field of endeavor of script narrative classification, Azermai teaches wherein the computer program instructions further cause the at least one processor to merge the two consecutive scenes into a single scene ([0048] In the next step 602, small scenes are merged into bigger scenes, for example by merging a scene with less words than a certain threshold with the next or previous scene.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Carlson and Guo’s disclosure with Azermai’s teaching to have a fully automated flow for predicting the success of a scripted narrative.
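For illustration only, the merging step relied upon for claim 12 can be sketched as follows. Azermai [0048] describes merging a scene having fewer words than a threshold with the next or previous scene; the sketch below substitutes a count of shots for the count of words, which is an assumption made to match the claim context and is not taken from any cited reference. The function name, the `min_shots` parameter, and the sample data are likewise hypothetical.

```python
def merge_short_scenes(scenes, min_shots):
    """Merge any scene with fewer than min_shots shots into a neighbour.

    scenes is a list of scenes, each a list of shot identifiers in order.
    A short scene joins the previous scene; a short first scene, which
    has no previous neighbour, joins the next scene instead.
    """
    merged = []
    for scene in scenes:
        if merged and len(scene) < min_shots:
            merged[-1] = merged[-1] + list(scene)
        else:
            merged.append(list(scene))
    if len(merged) > 1 and len(merged[0]) < min_shots:
        merged[1] = merged[0] + merged[1]
        merged.pop(0)
    return merged


# Hypothetical segmentation: scenes two and four each hold a single shot.
scenes = [[0, 1, 2], [3], [4, 5, 6, 7], [8]]
result = merge_short_scenes(scenes, min_shots=2)
```

In this example `result` is `[[0, 1, 2, 3], [4, 5, 6, 7, 8]]`: each single-shot scene is absorbed into the scene that precedes it.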
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHMED A NASHER whose telephone number is (571)272-1885. The examiner can normally be reached Mon - Fri 0800 - 1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell can be reached on (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AHMED A NASHER/ Examiner, Art Unit 2666
/EMILY C TERRELL/ Supervisory Patent Examiner, Art Unit 2666