DETAILED ACTION

Introduction
This office action is in response to Applicant’s submission filed on 08/09/2022. Claims
1-4, 6-14, 16-22 are pending in the application. As such, claims 1-4, 6-14, 16-22 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment 
The response filed on 08/09/2022 has been accepted and considered in this Office Action. Claims 1-4, 6-14, 16-22 have been examined.


Response to Arguments
Applicant’s amendments and remarks with respect to Claims 1-4, 6-14, 16-22 have been fully considered. The previous claim objections and the rejections under 35 U.S.C. §102 are withdrawn in view of the remarks and claim amendments filed 08/09/2022, which are found persuasive.


Allowable Subject Matter
Claims 1-4, 6-14, 16-22 are found allowable over the prior art of record for at least the following rationale.  
The teachings of Siskind et al. (US Patent Application Publication No.: US2014/0369596 A1), hereinafter “Siskind”, already of record and applied in the previous Non-Final Office Action mailed 06/23/2022, have been fully reconsidered.
Examiner respectfully notes that Siskind discloses a method of testing whether a video corresponds to a text query, comprising detecting whether the video corresponds to a cumulative query by determining a combined score of a path through an aggregate lattice. “Presented is a method that learns representations for word meanings from short video clips paired with sentences.” See [Siskind, 0045]. “various aspects relate to scoring a video/query pair. Recognition of words can be linked with tracking, e.g., by forming a cross-product of tracker lattices and event-recognizer lattices. Such cross-products and other unified cost functions can be used to co-optimize the selection of per-frame object detections so that the selected detections depict a track and the track depicts a word or event.” See [Siskind, 0054].
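For illustration only, and not as a characterization of Siskind's actual implementation, selecting a best-scoring path through a lattice of per-frame candidates can be sketched with a generic Viterbi search. The function name best_path, the frame_scores layout, and the transition function are hypothetical; in a cross-product lattice each candidate would stand for a (detection, recognizer state) pair rather than a single detection.

```python
def best_path(frame_scores, transition):
    """Generic Viterbi search over a lattice (illustrative assumption only).

    frame_scores[t][j] -- score of candidate j in frame t
    transition(t, i, j) -- score for moving from candidate i in frame t-1
                           to candidate j in frame t
    Returns (combined score, list of chosen candidate indices per frame).
    """
    # best[j] is the best combined score of any path ending at candidate j
    best = list(frame_scores[0])
    back = []  # back[t-1][j] is the predecessor chosen for candidate j in frame t
    for t in range(1, len(frame_scores)):
        new_best, pointers = [], []
        for j, s in enumerate(frame_scores[t]):
            i = max(range(len(best)), key=lambda k: best[k] + transition(t, k, j))
            new_best.append(best[i] + transition(t, i, j) + s)
            pointers.append(i)
        best = new_best
        back.append(pointers)
    # Trace the best final candidate back to the first frame
    j = max(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


# Toy lattice: three frames, two candidates each, a penalty of 1 for switching
scores = [[3, 1], [0, 5], [3, 0]]
switch_penalty = lambda t, i, j: 0 if i == j else -1
print(best_path(scores, switch_penalty))
```

The combined score returned here is the quantity that a video/query test of the kind described above would compare against a threshold or against the scores of other candidates.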
Siskind also discloses a multi-modal transform to the video input to generate a transformed video input. “A system is presented that shows how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. How the roles played by participants (nouns) is shown, … , and changing spatial relations between participants (prepositions) in the form of whole sentential descriptions mediated by a grammar, guides the activity-recognition process. Further, the utility and expressiveness of the framework is shown by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners.” See [Siskind, 0097].
Siskind additionally discloses comparison of a constituency span and the transformed video input using a Probabilistic Context-Free Grammar (PCFG) model. “The sentence tracker can be applied to the same video D, that depicts multiple simultaneous events taking place in the field of view with different participants, with two different sentences F1 and F2.” “Generation: A video D can be taken as input and the space of that corresponds to the F* for which (t*,Z*) = S(D,F*) yields the maximal t*. This can be used to generate a sentence that describes an input video D, ...” “Retrieval: A collection D={D1,...,Dn} of videos (or a single long video temporally segmented into short clips) can be taken along with a sentential query F, compute (ti,Zi) = S(Di,F) for each Di, and find the clip Di with maximal score ti. This can be used to perform sentence-based video search.” “we take a video clip B as input and systematically search the space of all possible sentences, that can be generated by a context-free grammar, and find the sentence with maximal video-sentence score:” See [Siskind, 0206-0208 and 0255].
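As a minimal sketch of the generation and retrieval uses quoted above, and not Siskind's actual scoring function, both reduce to taking a maximum over a score S(D,F). Everything below is a hypothetical stand-in: the toy grammar, the representation of a video as a list of (agent, action, patient) events, and a score function that returns only the score t and omits the track collection Z.

```python
from itertools import product

# Hypothetical toy grammar: S -> "the" NOUN VERB "the" NOUN
NOUNS = ["person", "chair", "ball"]
VERBS = ["approached", "carried"]


def sentences():
    """Enumerate every sentence the toy grammar can generate."""
    for subj, verb, obj in product(NOUNS, VERBS, NOUNS):
        if subj != obj:
            yield f"the {subj} {verb} the {obj}"


def parse(sentence):
    """Split a toy-grammar sentence into (subject, verb, object)."""
    _, subj, verb, _, obj = sentence.split()
    return subj, verb, obj


def score(video, sentence):
    """Stand-in for S(D,F): best fraction of matching roles over all events.

    A video is assumed to be a list of (agent, action, patient) triples.
    """
    triple = parse(sentence)
    matches = (sum(a == b for a, b in zip(triple, event)) for event in video)
    return max(matches, default=0) / 3


def generate(video):
    """Generation: search all grammar sentences for the maximal score."""
    return max(sentences(), key=lambda f: score(video, f))


def retrieve(videos, query):
    """Retrieval: index and score of the clip that best matches the query."""
    best = max(range(len(videos)), key=lambda i: score(videos[i], query))
    return best, score(videos[best], query)


videos = [
    [("person", "approached", "chair")],
    [("person", "carried", "ball")],
]
print(generate(videos[1]))
print(retrieve(videos, "the person carried the ball"))
```

Exhaustive enumeration is workable only because the toy grammar is finite and small; it is used here solely to make the two maximizations explicit.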
Siskind further discloses using the combination of keywords and features extracted from a video clip to learn a constituency parser. “In some prior schemes, it has been established the correspondence between linguistic concepts and semantic features extracted from video to produce case frames which were then translated into textual descriptions. A stochastic context free grammar (SCFG) has been used to infer events from video images parsed into scene elements. Text sentences were then generated by a simplified head-driven phrase structure grammar (HPSG) based on the output of the event inference engine. High level features (e.g., semantic keywords) have been extracted from video and then a template filling approach implemented for sentence generation.” See [Siskind, 0126].
However, Siskind does not teach the subject matter of cancelled claim 5, which was previously objected to but otherwise allowable and is incorporated into the claims as amended: “wherein the at least one constituency span and the transformed video input are compared according to the following formula:

    [Formula reproduced as an image in the original: media_image1.png]
”

Accordingly, the teachings of Siskind, as reconsidered, are found to fail to teach or fairly suggest, either individually or in reasonable combination, the limitations of independent Claim 1 as amended.
Similarly, amended independent claims 11 and 20, although different in scope from claim 1 and from each other, recite features similar to those discussed above and are therefore found allowable for the same reasons as claim 1.
Furthermore, dependent Claims 2-4, 6-10, and 21 further limit allowable independent Claim 1, and thus are also found allowable over the prior art of record by virtue of their dependency. Similarly, dependent claims 12-14, 16-19, and 22 further limit allowable independent claim 11, and thus are also found allowable over the prior art of record by virtue of their dependency.
Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.” 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Javali (US Patent Application Publication No.: 2019/0303797 A1), hereinafter “Javali”. Javali discloses a method and system for processing multiple videos simultaneously in parallel to identify spoken words or phrases. “The present invention provides a method and system for analyzing human speech during natural language processing interactions between humans and computers to aid in computer learning. The method processes human language tutorial videos each having a visual track, an audio track and captions. Multiple videos are simultaneously processed in parallel using stream processing to identify spoken words or phrases in the videos by comparing them with benchmark words/phrases stored on a computer. Confidence scores are determined for each of the spoken words/phrases which are assigned to a list of the benchmark words/phrases on the computer when a threshold value is met. A system administrator can identify spoken words/phrases to which the threshold value is not met.” (Javali, Abstract)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip H Lam, whose telephone number is (571) 272-1721. The examiner can normally be reached 10:00 a.m.-6:00 p.m. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/PHILIP H LAM/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656