DETAILED ACTION
In response to the communication filed 07 May 2020, this is the first Office Action on the merits. Claims 1-20 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
Claim 8 recites “computer-readable storage media,” which has been interpreted as --non-transitory computer-readable storage media-- based on paragraph [0095] of the current specification: “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire”.

Claim Objections
Claims 1 and 5-7 are objected to because of the following informalities:  
Claim 1 recites “providing, by one or more computer processors”, which should read as -- providing, by the one or more computer processors--; “receiving, by one or more computer processors”, which should read as -- receiving, by the one or more computer processors--; “classifying, by one or more computer processors”, which should read as -- classifying, by the one or more computer processors--; and “generating, by one or more computer processors”, which should read as -- generating, by the 
Claim 5 recites “providing, by one or more computer processors”, which should read as -- providing, by the one or more computer processors--, and “receiving, by one or more computer processors”, which should read as -- receiving, by the one or more computer processors--, as this appears to be a typographical error and may cause an antecedent basis issue.
Claim 6 recites “training, by one or more computer processors”, which should read as -- training, by the one or more computer processors--, as this appears to be a typographical error and may cause an antecedent basis issue.
Claim 7 recites “updating, by one or more computer processors”, which should read as -- updating, by the one or more computer processors--, as this appears to be a typographical error and may cause an antecedent basis issue.
Appropriate corrections are required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 8-11 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Younessian (US 2020/0159759 A1, hereinafter “Younessian”) in view of Zhang et al. (US 2007/0255755 A1, hereinafter “Zhang”).

Regarding claim 1, Younessian teaches
A computer-implemented method comprising: (see Younessian, [0015] “Disclosed are components that may be used to perform the described methods”). 
identifying, by one or more computer processors, a plurality of independently separable aspects of a multimedia file; (see Younessian, [0021] “Content items (which may also be referred to as "content," "content data," "content information," "content asset," "multimedia asset data file,… electronic representations of video, audio, text and/or graphics"”; [0045] “a plurality of keyframes may be determined from a segment (e.g., portion) of a content asset… determine the plurality of keyframes, a plurality of scenes (e.g., shots, etc.) of the segment of the content asset may be determined”; [0100] “The components of the computer 601 may be, but are not limited to, one or more processors 603” - keyframes are interpreted as independently separable aspects).
providing, by one or more computer processors, at least one independently separable aspect of the plurality of independently separable aspects as input into an object detection model; (see Younessian, [0046] “An image classifier may be applied to a given keyframe to identify the objects in the given keyframe. The image classifier may use a machine learning model. The image classifier may use a supervised machine learning model (e.g., a convolutional neural network (CNN), a deep neural network (DNN)) or an unsupervised machine learning model”; [0100] “The components of the computer 601 may be, but are not limited to, one or more processors 603” - keyframes are interpreted as independently separable aspects).
receiving, by one or more computer processors, from the object detection model, an identification of at least one object and (see Younessian, [0046] “Where the image classifier is a supervised machine learning model, the image classifier may be trained. One or more objects may be identified for a segment of the content asset (e.g., video, program, show, etc.) as an aggregate of the one or more objects determined for each of the keyframes of the segment of the content asset”) a corresponding level of confidence that the object is present in the multimedia file; (see Younessian, [0048] “The identified number of faces and/or the identified objects for a given segment of a content asset (e.g., video, program, show, etc.) may be encoded in a multidimensional data structure… Each dimension of the data structure may encode a confidence score indicating a confidence of the image classifier that a corresponding object is within the given content asset”).
classifying, by one or more computer processors, the object as either confident, (see Younessian, [0075] “by selecting, as the first plurality of objects, N objects a highest confidence score from the image classifier”; [0100] “The components of the computer 601 may be, but are not limited to, one or more processors 603”) based on whether the level of confidence meets a threshold level of confidence; and (see Younessian, [0075] “by selecting, as the first plurality of objects, N objects a highest confidence score from the image classifier”; [page18 col2] “determining that each confidence score of the plurality of confidence scores satisfies a threshold”).
the object and (see Younessian, [0046] “One or more objects may be identified for a segment of the content asset (e.g., video, program, show, etc.) as an aggregate of the one or more objects determined for each of the keyframes of the segment of the content asset”; [0100] “The components of the computer 601 may be, but are not limited to, one or more processors 603”) the classification (see Younessian, [0075] “by selecting, as the first plurality of objects, N objects a highest confidence score from the image classifier”).
Younessian does not explicitly teach generating, by one or more computer processors, a multimedia search engine based, at least in part, on the object and the classification. 
However, Zhang discloses a search engine and also teaches
generating, by one or more computer processors, a multimedia search engine based, at least in part, on video categorization and features including objects (see Zhang, “a specialized video categorization framework combines multiple classifiers based on both metadata and content features… video categorization learning function”; [0031] “A feature extraction component 145 extracts features (e.g., spatial color distributions, texture, facial recognition, object recognition, shape features, and/or the like) from the video keyframes”; [0057] “Computer system 500 includes a processor 505”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of a video search engine as disclosed and taught by Zhang in the system taught by Younessian to yield the predictable result of improving final categorization recall and precision (see Zhang, [0046] “The video is assigned to category… This scheme is a validation accuracy weighted combination scheme and the strength of the classifiers based on both modalities are integrated, thereby improving the performance of the final categorization recall and precision”).
Claims 8 and 15 incorporate substantively all the limitations of claim 1 in a computer readable medium (see Younessian, [0017] “a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium”) and system form (see Younessian, [0015] “Disclosed are components that may be used to perform the described… systems”; [0100] “The components of the computer 601 may be, but are not limited to, one or more processors 603”; [0017] “a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium”) and are rejected under the same rationale.

Regarding claim 2, the proposed combination of Younessian and Zhang teaches
wherein the plurality of independently separable aspects include an (see Younessian, [0045] “a plurality of keyframes may be determined from a segment (e.g., portion) of a content asset… determine the plurality of keyframes, a plurality of scenes (e.g., shots, etc.) of the segment of the content asset may be determined”) audio feed, (see Younessian, [0069] “A segment of the content asset having audio content… indicative of an interview may be identified as a "guest interview" segment”) a video feed, (see Younessian, [0044] “A keyframe (intra-frame) may include a frame of a content asset (e.g., a frame of video)”) and a text feed, wherein the text feed is received from a closed captioning service (see Younessian, [0069] “The segment label may be determined based on audio content and/or closed caption information of the segment of the content asset”; [0069] “Metadata associated with keyframes may indicate closed captioning information”).
Claim 9 incorporates substantively all the limitations of claim 2 in a computer readable medium and is rejected under the same rationale.

Regarding claim 3, the proposed combination of Younessian and Zhang teaches
wherein the corresponding level of confidence that the object is present in the multimedia file is based, at least in part, on a (see Younessian, [0048] “The identified number of faces and/or the identified objects for a given segment of a content asset (e.g., video, program, show, etc.) may be encoded in a multidimensional data structure… Each dimension of the data structure may encode a confidence score indicating a confidence of the image classifier that a corresponding object is within the given content asset”) determined correlation between at least two independently separable aspects of the plurality of independently separable aspects (see Younessian, [0076] “Each dimension of the data structure may encode a confidence score indicating a confidence of the image classifier that a corresponding object is within the segment of the content asset”; [0081] A segment profile may be determined based on 
Claims 10 and 16 incorporate substantively all the limitations of claim 3 in a computer readable medium and system form and are rejected under the same rationale.

Regarding claim 4, the proposed combination of Younessian and Zhang teaches
wherein the object detection model includes (see Younessian, [0046] “The image classifier may use a machine learning model”) a trained artificial neural network (see Younessian, [0046] “The image classifier may use a supervised machine learning model (e.g., a convolutional neural network (CNN), a deep neural network (DNN))”). 
Claims 11 and 17 incorporate substantively all the limitations of claim 4 in a computer readable medium and system form and are rejected under the same rationale.

Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Younessian and Zhang in view of Chegini (US 2018/0034879 A1, hereinafter “Chegini”).

Regarding claim 5, the proposed combination of Younessian and Zhang teaches
in response to the object being classified as not confident, the objects are filtered (see Younessian, [0045] “if a confidence score for the object generated by the image classifier falls below a threshold”). 
The proposed combination of Younessian and Zhang does not explicitly teach providing, by one or more processors, the at least one independently separable aspect to an annotator for manual annotation; and receiving, by one or more processors, an annotation indicating that the object is present in the multimedia file.
However, Chegini discloses annotating multimedia and also teaches
providing, by one or more processors, the at least one independently separable aspect to an annotator for manual annotation; and (see Chegini, [0142] “provide a control enabling the user to indicate when the user wants to annotate a frame being displayed (e.g., when the user has seen an object in a frame that the user wants to annotate)… the user may add an annotation to the frame”; [0202] “configured to implement the algorithms on a general purpose computer, special purpose processors”). 
receiving, by one or more processors, an annotation indicating that the object is present in the multimedia file (see Chegini, [0144] “enables a given object to be located in a plurality of frames… if the user has identified and annotated an object of interest in a given frame, the system can then identify, using the video tracking engine, that object in nearby frames… to track the object over multiple video frames”; [0202] “configured to implement the algorithms on a general purpose computer, special purpose processors”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of annotating multimedia as being disclosed and taught by Chegini in the system taught by the proposed combination of Younessian and Zhang to yield the predictable results of applying an annotation tool in order to improve human-computer interactions that may provide reduced mental workloads, improved decision-making, reduced work stress, and/or the like, for a user (see Chegini, [0196] “interactive user interfaces that improve the functioning of the basic display function of the computer itself... the user interfaces described herein which may provide significant cognitive and ergonomic efficiencies and advantages over previous systems. The interactive and dynamic user interfaces include improved human-computer interactions that may provide reduced mental workloads, improved decision-making, reduced work stress, and/or the like, for a user”). 
Claims 12 and 18 incorporate substantively all the limitations of claim 5 in a computer readable medium and system form and are rejected under the same rationale.

Claims 6-7, 13-14 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Younessian, Zhang and Chegini in view of Safronov et al. (US 2020/0192961 A1, hereinafter “Safronov”).

Regarding claim 6, the proposed combination of Younessian, Zhang and Chegini teaches
in response to receiving the annotation indicating that the object is present in the multimedia file, it also tracks the same object in other video frames (see Chegini, [0144] “enables a given object to be located in a plurality of frames… if the user has identified and annotated an object of interest in a given frame, the system can then identify, using the video tracking engine, that object in nearby frames… to track the object over multiple video frames”).
The proposed combination of Younessian, Zhang and Chegini does not explicitly teach training, by one or more processors, the object detection model utilizing the object and the annotation. 
However, Safronov discloses training machine learning algorithms and also teaches
training, by one or more processors, the object detection model utilizing the object and the annotation (see Safronov, [0101] “The training server 140 may maintain a training database 142 for storing annotation vectors and/or training objects and/or other information”; [0099] “the training server 140 is configured to train the plurality of MLAs 200”; [0003] “Machine learning algorithms (MLAs) are used to address multiple needs in computer-implemented technologies”; [0057] “executed by a computer or processor”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of training machine learning algorithms as being disclosed and taught by Safronov in the system taught by the proposed combination of Younessian, Zhang and Chegini to yield the predictable results of effectively training machine 
Claims 13 and 19 incorporate substantively all the limitations of claim 6 in a computer readable medium and system form and are rejected under the same rationale.

Regarding claim 7, the proposed combination of Younessian, Zhang and Chegini teaches
in response to receiving the annotation indicating that the object is present in the multimedia file, it also tracks the same object in other video frames (see Chegini, [0144] “enables a given object to be located in a plurality of frames… if the user has identified and annotated an object of interest in a given frame, the system can then identify, using the video tracking engine, that object in nearby frames… to track the object over multiple video frames”).
The proposed combination of Younessian, Zhang and Chegini does not explicitly teach updating, by one or more computer processors, the multimedia search engine based, at least in part, on the object and the annotation. 
However, Safronov discloses training machine learning algorithms and also teaches
updating, by one or more computer processors, the multimedia search engine based, at least in part, on the object and the annotation (see Safronov, [0101] “The training server 140 may maintain a training database 142 for storing annotation vectors and/or training objects and/or other information”; [0099] “the training server 140 is configured to train the plurality of MLAs 200 used by the search engine server 120, the tracking server 130 and/or other servers (not depicted) associated with the search engine operator”; [0003] “Machine 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of training search engines as disclosed and taught by Safronov in the system taught by the proposed combination of Younessian, Zhang and Chegini to yield the predictable result of effectively training machine learning algorithms to be utilized by a search engine server (see Safronov, [0012] “quality of a ranking may, inter alia, be evaluated by tracking user interactions with the documents provided… a MLA used by the search engine server may be "adjusted" such that lower ranked documents receiving more user interactions than higher ranked documents are "promoted" in future rankings, and such a procedure may be repeated at predetermined intervals of time to take into account changes in user interactions”).
Claims 14 and 20 incorporate substantively all the limitations of claim 7 in a computer readable medium and system form and are rejected under the same rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAISHALI SHAH whose telephone number is (571)272-8532. The examiner can normally be reached Monday - Friday (7:30 AM to 4:00 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE, can be reached on (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/VAISHALI SHAH/Primary Examiner, Art Unit 2156