Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 2-3, 10, and 12 are objected to because of the following informalities:  
Claim 2 recites in line 4 “comprises the basic units” and should be replaced with “comprises that the basic units”.
Claim 3 recites in lines 3 and 5 “one or more of”, which appears to be redundant with “subset of” and should be removed.
Claim 10 recites in line 6 “each have”, which appears to be a typo and should be replaced with “each has”. 
Claim 12 recites in line 1 “method of claim 11”, which appears to be a typo and should be replaced with “system of claim 11”. 
Appropriate correction is required.
CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
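The three prongs and the two rebuttable presumptions above can be read as a single decision rule. The following Python sketch is illustrative only: the `Limitation` record and its field names are hypothetical and do not come from the MPEP or from the application. Whether a term is a generic placeholder, and whether the recited structure is sufficient, remain case-by-case determinations that the flags merely record.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record describing one claim limitation."""
    text: str
    uses_means_or_step: bool            # literal "means" or "step"
    uses_generic_placeholder: bool      # nonce term substituting for "means"
    has_functional_language: bool       # e.g., "for ...", "configured to ..."
    recites_sufficient_structure: bool  # structure, material, or acts


def invokes_112f(lim: Limitation) -> bool:
    """Return True when prongs (A), (B), and (C) are all met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Flags set according to the findings stated in this Office action for
# "computation module" in claim 11 (no "means", generic placeholder,
# functional language, no sufficient structure).
computation_module = Limitation(
    text="computation module",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)

# Same limitation, assuming an amendment that recites sufficient structure.
amended = Limitation(
    text="computation module",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=True,
)
```

Under these assumed flags, `invokes_112f(computation_module)` is true, while `invokes_112f(amended)` is false because prong (C) is no longer met.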
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “computation module” in claim 11 and “hierarchical classification manager module” in claim 19.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. 

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 1, 11 and 20 recite “the basic units”. There is insufficient antecedent basis for this limitation in the claims since it is unclear which basic units in “a set of successive basic units” the limitation “the basic units” refers to. One suggestion is to replace “a set of successive basic units” with “basic units, wherein the basic units are successive”. For the rest of this Office action, the examiner will interpret “the basic units” as referring to the “set of successive basic units”.
All other claims depend on claim 1 or claim 11 and are therefore rejected on the same grounds as claims 1 and 11.
Claim 2 recites “the basic units of the first subset”, “the first subset of the basic units”, “the basic units of the second subset”, and “the second subset of the basic units”. It is unclear what these limitations mean, since it is unclear what the difference is between “the basic units of the first subset” and “the first subset of the basic units” or between “the basic units of the second subset” and “the second subset of the basic units”. For the rest of this Office action, the examiner will interpret “the basic units of the first subset” as “the first subset of the basic units”, and interpret “the basic units of the second subset” as “the second subset of the basic units”.
Claim 12 has issues similar to those discussed above regarding claim 2.
Claim 7 recites “(i) the candidate semantic label generated by all of the multiple classifier models having the highest overall associated confidence value; (ii) the candidate semantic label generated by all of the multiple classifier models having the highest average associated confidence value; (iii) the candidate semantic label generated by all of the multiple classifier models having the highest overall associated confidence value at a majority of the multiple classifier models.” However, it is unclear what these limitations mean. For example, it is unclear what is meant by “the candidate semantic label generated by all of the multiple classifier models”. Does each of “the multiple classifier models” generate the same label, or are some candidate semantic labels generated by only some of “the multiple classifier models” and not by others? It is also unclear what the difference is between “the highest overall associated confidence value” and “the highest average associated confidence value”. It is further unclear what “highest” means, since it is unclear which other candidate labels are used for comparison to obtain the label with the “highest” score. In addition, the term “majority” is a relative term which renders the claim indefinite. The term “majority” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, is 55% a majority, or 70% a majority, or 95% a majority? What if there is a different label for each of these majorities (i.e., 55%, 70%, 95%)? And there could be many more such percentages (65%, 80%, etc.).
Claim 16 has issues similar to those discussed above regarding claim 7.
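To illustrate why the three options recited in claim 7 need clearer definitions, the following Python sketch applies one possible reading of each option to the same made-up classifier outputs. Every detail is an assumption for illustration: the labels and confidence values are invented, “highest overall” is read as the single largest confidence from any model, “highest average” as the largest mean confidence, and “majority” as more than half of the models. None of these readings is taken from the claims or the specification.

```python
from collections import Counter, defaultdict


def highest_overall(outputs):
    """Label carrying the single largest confidence value from any model."""
    best_label, best_conf = None, float("-inf")
    for out in outputs:
        for label, conf in out.items():
            if conf > best_conf:
                best_label, best_conf = label, conf
    return best_label


def highest_average(outputs):
    """Label with the largest mean confidence over the models that produced it."""
    values = defaultdict(list)
    for out in outputs:
        for label, conf in out.items():
            values[label].append(conf)
    return max(values, key=lambda label: sum(values[label]) / len(values[label]))


def top_at_majority(outputs):
    """Label ranked first by more than half of the models, else None."""
    winners = Counter(max(out, key=out.get) for out in outputs)
    label, count = winners.most_common(1)[0]
    return label if count > len(outputs) / 2 else None


# Invented outputs of three hypothetical classifier models (label -> confidence).
outputs = [
    {"sports": 0.95, "news": 0.30},
    {"sports": 0.20, "news": 0.60},
    {"sports": 0.25, "news": 0.65},
]

print(highest_overall(outputs))   # sports
print(highest_average(outputs))   # news
print(top_at_majority(outputs))   # news
```

Under these assumed readings, the same outputs yield “sports” for option (i) and “news” for options (ii) and (iii), so the selected label depends on how each option is construed.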
References Cited in Prior Art Rejections 
The following references are cited in the prior art rejections set forth below and are referred to as noted:
Pappu et al., US 20200204879 A1, published on 2020-06-25, hereinafter Pappu.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 11 and 20 are rejected under 35 U.S.C. 102 as being anticipated by Pappu.
Regarding claim 1, Pappu discloses a computer implemented method of analyzing a video, (Pappu: Fig. 5) comprising: 
dividing the video into a set of successive basic units; (Pappu: 520 of Fig. 5. [0052]. Basic unit = shot.)
generating semantic tags for the basic units using a set of classifier nodes; (Pappu: 530-550 of Fig. 5. [0052-0053]. The claimed “tags” are interpreted as the annotations of the “one or more keyframes for each shot.” The claimed “set of classifier nodes” is interpreted as the nodes of the “classifier with a vocabulary of classes or concepts that include objects, scenes, and/or other visual descriptors (e.g., colors, shapes, etc.)” ([0053]).) and 
generating a semantic topic for the video based on the semantic tags generated for the basic units. (Pappu: 560-570 of Fig. 5. [0054-0055]. “[0054] …The retained set of annotations may represent the video (e.g., images from video modality 120) as words in a piece of text.” “[0055] …The linked annotations may represent the extracted features of video modality 120 from which one or more tags may be generated when labeling video content 110.” The claimed “semantic topic for the video” is interpreted as the “retained set of annotations” or the “one or more tags” “generated when labeling video content 110.”)
Claims 11 and 20 are the apparatus and computer readable medium (Pappu: Fig. 6 and [0069]) claims, respectively, corresponding to the method claim 1. Therefore, since claims 11 and 20 are similar in scope to claim 1, claims 11 and 20 are rejected on the same grounds as claim 1.
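The three steps of claim 1 as mapped above (divide, tag, derive a topic) can be sketched as follows. This Python sketch is a toy illustration only and is not Pappu's implementation or the applicant's: frames are stand-in strings, fixed-size chunks stand in for shot detection, `toy_classifier` is a hypothetical placeholder for a trained classifier, and taking the most frequent tag as the topic is an assumed aggregation rule.

```python
from collections import Counter


def divide_into_units(frames, unit_size):
    """Split frames into successive, non-overlapping basic units."""
    return [frames[i:i + unit_size] for i in range(0, len(frames), unit_size)]


def toy_classifier(unit):
    """Hypothetical classifier: returns the distinct frame contents as tags."""
    return sorted(set(unit))


def tag_units(units, classifier):
    """Generate a list of semantic tags for each basic unit."""
    return [classifier(unit) for unit in units]


def topic_from_tags(tags_per_unit):
    """Assumed rule: the topic is the tag that occurs in the most units."""
    counts = Counter(tag for tags in tags_per_unit for tag in tags)
    return counts.most_common(1)[0][0]


# Invented stand-in "frames"; a real system would operate on image data.
frames = ["ball", "ball", "crowd", "ball", "anchor", "ball"]

units = divide_into_units(frames, unit_size=2)
tags = tag_units(units, toy_classifier)
topic = topic_from_tags(tags)

print(units)  # [['ball', 'ball'], ['crowd', 'ball'], ['anchor', 'ball']]
print(tags)   # [['ball'], ['ball', 'crowd'], ['anchor', 'ball']]
print(topic)  # ball
```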
Allowable Subject Matter
While claims 2-10 and 12-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, they are neither anticipated by nor obvious in view of the prior art of record.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Piacenza et al., "Generating story variants with constrained video recombination." In Proceedings of the 19th ACM international conference on Multimedia, pp. 223-232. 2011.

Hoogs et al., "Video content annotation using visual analysis and a large semantic knowledgebase," 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., 2003, pp. II-II, doi: 10.1109/CVPR.2003.1211487.

Xu et al., "Discovery of Shared Semantic Spaces for Multiscene Video Query and Summarization," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 6, pp. 1353-1367, June 2017, doi: 10.1109/TCSVT.2016.2532719.

Soltanian et al., "Spatio-temporal VLAD encoding of visual events using temporal ordering of the mid-level deep semantics." IEEE Transactions on Multimedia 22, no. 7 (2019): 1769-1784.

Imran et al., "Semantic Tags for Lecture Videos," 2012 IEEE Sixth International Conference on Semantic Computing, 2012, pp. 117-120, doi: 10.1109/ICSC.2012.36.

Chen et al., "A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities," in IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 295-312, Feb. 2009, doi: 10.1109/TMM.2008.2009703.

Yin et al., "Encoded Semantic Tree for Automatic User Profiling Applied to Personalized Video Summarization," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 181-192, Jan. 2018, doi: 10.1109/TCSVT.2016.2602832.

Shetty et al. (US 20160070962 A1): A computer-implemented method for selecting representative frames for videos is provided. The method includes receiving a video and identifying a set of features for each of the frames of the video. The features including frame-based features and semantic features. The semantic features identifying likelihoods of semantic concepts being present as content in the frames of the video. A set of video segments for the video is subsequently generated. Each video segment includes a chronological subset of frames from the video and each frame is associated with at least one of the semantic features. The method generates a score for each frame of the subset of frames for each video segment based at least on the semantic features, and selecting a representative frame for each video segment based on the scores of the frames in the video segment. The representative frame represents and summarizes the video segment. (abstract)

Sureshkumar et al. (US 20210117685 A1): Embodiments described herein provide a system for localized contextual video annotation. During operation, the system can segment a video into a plurality of segments based on a segmentation unit and parse a respective segment for generating multiple input modalities for the segment. A respective input modality can indicate a form of content in the segment. The system can then classify the segment into a set of semantic classes based on the input modalities and determine an annotation for the segment based on the set of semantic classes. (abstract)

Adami et al. (US 20140136186 A1): A computer implemented method and system for generating an alternative audible, visual and/or textual data based upon an original audible, visual and/or textual data comprising the step of inputting to a processor original audible, visual and/or textual data having an original plot, extracting a plurality of basic segments from the original audible, visual and/or textual data, defining a vocabulary of intermediate-level semantic concepts based on the plurality of basic segments and/or the original plot, inputting to the processor at least an alternative plot based upon the original plot, modifying the alternative plot in terms of the vocabulary of intermediate-level semantic concepts for generating a modified alternative plot, and modifying the plurality of basic segments of the original audible, visual and/or textual data in terms of said vocabulary of intermediate-level semantic concepts for generating a modified plurality of basic segments. (abstract)

Xu et al. (US 20070201558 A1): A shot-based video content analysis method and system is described for providing automatic recognition of logical story units (LSUs). The method employs vector quantization (VQ) to represent the visual content of a shot, following which a shot clustering algorithm is employed together with automatic determination of merging and splitting events. The method provides an automated way of performing the time-consuming and laborious process of organizing and indexing increasingly large video databases such that they can be easily browsed and searched using natural query structures. (abstract)

Shin et al. (US 20220076023 A1): Embodiments are directed to segmentation and hierarchical clustering of video. In an example implementation, a video is ingested to generate a multi-level hierarchical segmentation of the video. In some embodiments, the finest level identifies a smallest interaction unit of the video—semantically defined video segments of unequal duration called clip atoms. Clip atom boundaries are detected in various ways. For example, speech boundaries are detected from audio of the video, and scene boundaries are detected from video frames of the video. The detected boundaries are used to define the clip atoms, which are hierarchically clustered to form a multi-level hierarchical representation of the video. In some cases, the hierarchical segmentation identifies a static, pre-computed, hierarchical set of video segments, where each level of the hierarchical segmentation identifies a complete set (i.e., covering the entire range of the video) of disjoint (i.e., non-overlapping) video segments with a corresponding level of granularity. (abstract)

Stanton et al. (US 20160364419 A1): Indexing data is disclosed. An image and a text data associated with a dataset are received. A tag is generated using one or more hierarchical classifiers. The image and the text data are input into at least one of the one or more hierarchical classifiers. A search index is generated based at least on the generated tag. (abstract)

Swaminathan et al. (US 11314970 B1): A video summarization system generates a concatenated feature set by combining a feature set of a candidate video shot and a summarization feature set. Based on the concatenated feature set, the video summarization system calculates multiple action options of a reward function included in a trained reinforcement learning module. The video summarization system determines a reward outcome included in the multiple action options. The video summarization system modifies the summarization feature set to include the feature set of the candidate video shot by applying a particular modification indicated by the reward outcome. The video summarization system identifies video frames associated with the modified summarization feature set, and generates a summary video based on the identified video frames. (abstract)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG NIU, whose telephone number is (571) 272-9592. The examiner can normally be reached Monday - Friday, 8am-5pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG NIU/Primary Examiner, Art Unit 2669