EXAMINER’S AMENDMENT
An Examiner’s Amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this Examiner’s Amendment was given in a telephone interview with Stuart Shapiro (Registration No. 40,169) on August 30, 2022. 

The claims have been amended as follows:

1.	(Currently amended) A machine-implemented method for analyzing an instructional video comprising:
analyzing, by a processing device, video data of the instructional video to form a plurality of units of work, the analyzing of the video data to form the plurality of units of work further comprising: 
extracting video frames from the video data; 
producing frame level semantics for the extracted video frames based at least on movement of at least one object in at least some of the extracted video frames and frame differencing among the extracted video frames; 
forming a plurality of video shots based on the extracted video frames and the frame level semantics, each respective video shot being based on a respective plurality of the extracted video frames; 
grouping respective pluralities of the video shots to form respective groups of video shots based on the frame level semantics and a respective summary of the each respective video shot; 
associating a respective activity to each of the respective groups of video shots; and 
generating the each respective unit of work by associating domain-based semantics included in a knowledge base to respective activities within the each of the respective groups of video shots;
wherein the instructional video shows different steps for a user to perform, wherein each respective unit of work being a respective grouping of video frames of the instructional video based on a respective logical combination of activities associated therewith, and wherein each respective unit of work includes work information including the frame level semantics, video image sequences, and text and audio of the grouping of video frames;
analyzing, by the processing device, the each respective unit of work to produce a respective action graph of a plurality of activities included in the respective unit of work, the respective action graph indicating interdependencies among the plurality of activities and including activity information for each activity including an activity title, a sequence number indicating an order of performance, and a summary description of the activity obtained from the text and audio of the grouping of video frames for the respective unit of work;
determining, by the processing device, interdependencies among activities across the plurality of units of work to produce a critical path graph, wherein the critical path graph indicates an order of performance of the activities across the plurality of units of work and includes path information for the activities indicating a corresponding unit of work and a corresponding activity;
storing, by the processing device, the respective action graphs for the each respective unit of work and the critical path graph;
presenting, by the processing device, the instructional video to the user; and
processing a natural language query from the user viewing the instructional video pertaining to a step of the instructional video to provide information satisfying the query with respect to the step of the instructional video based on the plurality of units of work, the respective action graphs, the critical path graph, and [[a]] the knowledge base including information related to a subject matter of the instructional video, wherein processing the natural language query comprises:
applying terms of the natural language query to the work information of the plurality of units of work, the activity information of the action graphs, and the path information of the critical path graph to identify the activity associated with the step of the natural language query;
determining the information for the step pertaining to the natural language query and a corresponding section of the instructional video containing the information for the step from the work information and the activity information for the identified activity; and
providing the information for the step and the corresponding section of the instructional video to the user.
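
For illustration only, the query-processing limitation of claim 1 (applying terms of the natural language query to the activity information to identify an activity and its corresponding video section) might be sketched as follows. This is a minimal sketch, not the applicant's implementation: the term-overlap scoring, the function names `tokenize` and `identify_activity`, the dictionary fields, and the sample activities are all hypothetical assumptions that do not appear in the record.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def identify_activity(query, activities):
    """Return the activity sharing the most terms with the query, or None.

    Assumption: each activity is a dict holding a title, a summary, and the
    (start, end) seconds of its section in the video. Term overlap is a
    stand-in for whatever matching the claimed method actually uses.
    """
    terms = tokenize(query)
    best, best_score = None, 0
    for activity in activities:
        text = activity["title"] + " " + activity["summary"]
        score = len(terms & tokenize(text))
        if score > best_score:
            best, best_score = activity, score
    return best


# Hypothetical activity information for one unit of work.
activities = [
    {"unit": 1, "seq": 1, "title": "Remove wheel",
     "summary": "Loosen the lug nuts and lift off the wheel",
     "section": (0.0, 42.5)},
    {"unit": 1, "seq": 2, "title": "Mount spare",
     "summary": "Place the spare tire on the hub and tighten",
     "section": (42.5, 90.0)},
]

match = identify_activity("How do I loosen the lug nuts?", activities)
```

In this sketch, `match["section"]` would stand for the corresponding section of the instructional video provided to the user.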

2.	(Canceled).

3.	(Currently amended) The machine-implemented method of claim [[2]] 1, wherein the forming of the plurality of video shots comprises:
performing video shot summarization to produce the respective summary of the each respective video shot.

4.	(Original) The machine-implemented method of claim 3, wherein the forming of the plurality of video shots further comprises:
performing video shot boundary detection to detect an end of the each respective video shot.
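
For illustration only, one way the shot boundary detection of claim 4 could rely on the frame differencing recited in claim 1 is sketched below. This is a sketch under stated assumptions, not the claimed implementation: frames are modeled as flat lists of grayscale pixel values, and the function names, the mean-absolute-difference measure, and the threshold of 50.0 are hypothetical choices that do not appear in the record.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_shot_ends(frames, threshold=50.0):
    """Return the indices of frames that end a video shot.

    A frame ends a shot when the next frame differs from it by more than
    the (assumed) threshold. The last frame always ends the final shot.
    """
    ends = [
        i for i in range(len(frames) - 1)
        if mean_abs_diff(frames[i], frames[i + 1]) > threshold
    ]
    if frames:
        ends.append(len(frames) - 1)
    return ends


# Four tiny hypothetical frames: two dark, then two bright.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [198, 209, 192, 204],
]
shot_ends = detect_shot_ends(frames)  # [1, 3]
```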

5.	(Previously Presented) The machine-implemented method of claim 4, wherein the analyzing of the each respective unit of work to produce the respective action graph of the plurality of activities included in the each respective unit of work further comprises:
analyzing the frame level semantics, video image sequences and the text and audio included in the each respective unit of work to determine a respective plurality of activities included in the each respective unit of work,
using the knowledge base to resolve any entity ambiguities in the each respective unit of work, and
producing the respective action graph for the each respective unit of work based on the analyzing of the frame level semantics, video image sequences and the text and audio included in the each respective unit of work and the resolved any entity ambiguities.

6.	(Original) The machine-implemented method of claim 5, wherein the generating of the each respective unit of work further comprises:
correlating temporally distant video shots of the plurality of video shots.

7.	(Previously Presented) The machine-implemented method of claim 1, wherein the determining the interdependencies among the activities across the plurality of units of work to produce the critical path graph further comprises:
for each respective activity of the plurality of units of work, performing:
determining first activities across the plurality of units of work that must be completed before starting the respective activity, and
determining second activities across the plurality of units of work that can be started only after completion of the respective activity; and
forming the critical path graph based on the determined first activities and the determined second activities with respect to each of the respective activities.
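
For illustration only, the steps of claim 7 (determining, for each activity, the first activities that must be completed beforehand and the second activities that can start only afterward, then forming the graph) can be sketched as follows. This is a sketch under stated assumptions: activities are keyed by hypothetical (unit of work, activity) pairs, the function name `build_critical_path_graph` is invented, and a topological sort from the Python standard library stands in for whatever ordering the claimed method actually uses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_critical_path_graph(first_activities):
    """Derive successors and an order of performance from prerequisites.

    first_activities maps each (unit, activity) key to the set of keys
    that must be completed before it may start.
    """
    # "Second activities": those that may start only after a given one.
    second_activities = {node: set() for node in first_activities}
    for node, prerequisites in first_activities.items():
        for prerequisite in prerequisites:
            second_activities.setdefault(prerequisite, set()).add(node)
    # An order of performance that respects every prerequisite.
    order = list(TopologicalSorter(first_activities).static_order())
    return second_activities, order


# Hypothetical activities spanning two units of work.
first = {
    ("U1", "remove wheel"): set(),
    ("U1", "mount spare"): {("U1", "remove wheel")},
    ("U2", "lower jack"): {("U1", "mount spare")},
    ("U2", "torque nuts"): {("U2", "lower jack")},
}
second, order = build_critical_path_graph(first)
```

Each entry of `order` carries both a unit of work and an activity, which mirrors the path information recited in claim 1.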

8.	(Currently amended) A system for analyzing an instructional video comprising:
at least one processor;
at least one memory connected to the at least one processor, wherein the at least one processor is configured to perform:
analyzing video data of the instructional video to form a plurality of units of work, the analyzing of the video data to form the plurality of units of work further comprising: 
extracting video frames from the video data; 
producing frame level semantics for the extracted video frames based at least on movement of at least one object in at least some of the extracted video frames and frame differencing among the extracted video frames; 
forming a plurality of video shots based on the extracted video frames and the frame level semantics, each respective video shot being based on a respective plurality of the extracted video frames; 
grouping respective pluralities of the video shots to form respective groups of video shots based on the frame level semantics and a respective summary of the each respective video shot; 
associating a respective activity to each of the respective groups of video shots; and 
generating the each respective unit of work by associating domain-based semantics included in a knowledge base to respective activities within the each of the respective groups of video shots;
wherein the instructional video shows different steps for a user to perform, wherein each respective unit of work being a respective grouping of video frames of the instructional video based on a respective logical combination of activities associated therewith, and wherein each respective unit of work includes work information including the frame level semantics, video image sequences, and text and audio of the grouping of video frames;
analyzing the each respective unit of work to produce a respective action graph of a plurality of activities included in the respective unit of work, the respective action graph indicating interdependencies among the plurality of activities and including activity information for each activity including an activity title, a sequence number indicating an order of performance, and a summary description of the activity obtained from the text and audio of the grouping of video frames for the respective unit of work;
determining interdependencies among activities across the plurality of units of work to produce a critical path graph, wherein the critical path graph indicates an order of performance of the activities across the plurality of units of work and includes path information for the activities indicating a corresponding unit of work and a corresponding activity;
storing the respective action graphs for the each respective unit of work and the critical path graph;
presenting the instructional video to the user; and
processing a natural language query from the user viewing the instructional video pertaining to a step of the instructional video to provide information satisfying the query with respect to the step of the instructional video based on the plurality of units of work, the respective action graphs, the critical path graph, and [[a]] the knowledge base including information related to a subject matter of the instructional video, wherein processing the natural language query comprises:
applying terms of the natural language query to the work information of the plurality of units of work, the activity information of the action graphs, and the path information of the critical path graph to identify the activity associated with the step of the natural language query;
determining the information for the step pertaining to the natural language query and a corresponding section of the instructional video containing the information for the step from the work information and the activity information for the identified activity; and
providing the information for the step and the corresponding section of the instructional video to the user.

9.	(Canceled).

10.	(Currently amended) The system of claim [[9]] 8, wherein the forming of the plurality of video shots comprises:
performing video shot summarization to produce the respective summary of the each respective video shot.

11.	(Original) The system of claim 10, wherein the forming of the plurality of video shots further comprises:
performing video shot boundary detection to detect an end of the each respective video shot.

12.	(Previously Presented) The system of claim 11, wherein the analyzing of the each respective unit of work to produce the respective action graph of the plurality of activities included in the respective unit of work further comprises:
analyzing the frame level semantics, video image sequences and the text and audio included in the each respective unit of work to determine a respective plurality of activities included in the each respective unit of work,
using the knowledge base to resolve any entity ambiguities in the each respective unit of work, and
producing the respective action graph for the each respective unit of work based on the analyzing of the frame level semantics, video image sequences and the text and audio included in the each respective unit of work and the resolved any entity ambiguities.

13.	(Original) The system of claim 12, wherein the generating of the each respective unit of work further comprises:
correlating temporally distant video shots of the plurality of video shots.

14.	(Previously Presented) The system of claim 8, wherein the determining the interdependencies among the activities across the plurality of units of work to produce the critical path graph further comprises:
for each respective activity of the plurality of units of work, performing:
determining first activities across the plurality of units of work that must be completed before starting the respective activity, and
determining second activities across the plurality of units of work that can be started only after completion of the respective activity; and
forming the critical path graph based on the determined first activities and the determined second activities with respect to each of the respective activities.

15.	(Currently amended) A 
analyzing video data of the instructional video to form a plurality of units of work, the analyzing of the video data to form the plurality of units of work further comprising: 
extracting video frames from the video data; 
producing frame level semantics for the extracted video frames based at least on movement of at least one object in at least some of the extracted video frames and frame differencing among the extracted video frames; 
forming a plurality of video shots based on the extracted video frames and the frame level semantics, each respective video shot being based on a respective plurality of the extracted video frames; 
grouping respective pluralities of the video shots to form respective groups of video shots based on the frame level semantics and a respective summary of the each respective video shot; 
associating a respective activity to each of the respective groups of video shots; and 
generating the each respective unit of work by associating domain-based semantics included in a knowledge base to respective activities within the each of the respective groups of video shots;
wherein the instructional video shows different steps for a user to perform, wherein each respective unit of work being a respective grouping of video frames of the instructional video based on a respective logical combination of activities associated therewith, and wherein each respective unit of work includes work information including the frame level semantics, video image sequences, and text and audio of the grouping of video frames;
analyzing the each respective unit of work to produce a respective action graph of a plurality of activities included in the respective unit of work indicating interdependencies among the plurality of activities and including activity information for each activity including an activity title, a sequence number indicating an order of performance, and a summary description of the activity obtained from the text and audio of the grouping of video frames for the respective unit of work;
determining interdependencies among activities across the plurality of units of work to produce a critical path graph, wherein the critical path graph indicates an order of performance of the activities across the plurality of units of work and includes path information for the activities indicating a corresponding unit of work and a corresponding activity;
storing the respective action graphs for the each respective unit of work and the critical path graph;
presenting the instructional video to the user; and
processing a natural language query from the user viewing the instructional video pertaining to a step of the instructional video to provide information satisfying the query with respect to the step of the instructional video based on the plurality of units of work, the respective action graphs, the critical path graph, and [[a]] the knowledge base including information related to a subject matter of the instructional video, wherein processing the natural language query comprises:
applying terms of the natural language query to the work information of the plurality of units of work, the activity information of the action graphs, and the path information of the critical path graph to identify the activity associated with the step of the natural language query;
determining the information for the step pertaining to the natural language query and a corresponding section of the instructional video containing the information for the step from the work information and the activity information for the identified activity; and
providing the information for the step and the corresponding section of the instructional video to the user.

16.	(Canceled).

17.	(Currently amended) The computer readable storage medium of claim [[16]] 15, wherein the forming of the plurality of shots comprises:
performing video shot summarization to produce the respective summary of the each respective video shot, and
performing video shot boundary detection to detect an end of the each respective video shot.

18.	(Previously Presented) The computer readable storage medium of claim 17, wherein the analyzing of the each respective unit of work to produce the respective action graph of the plurality of activities included in the respective unit of work further comprises:
analyzing the frame level semantics, video image sequences and the text and audio included in the each respective unit of work to determine a respective plurality of activities included in the each respective unit of work,
using the knowledge base to resolve any entity ambiguities in the each respective unit of work, and
producing the respective action graph for the each respective unit of work based on the analyzing of the frame level semantics, video image sequences and the text and audio included in the each respective unit of work and the resolved any entity ambiguities.

19.	(Original) The computer readable storage medium of claim 18, wherein the generating of the each respective unit of work further comprises:
correlating temporally distant video shots of the plurality of video shots.

20.	(Previously Presented) The computer readable storage medium of claim 15, wherein the determining the interdependencies among the activities across the plurality of units of work to produce the critical path graph further comprises:
for each respective activity of the plurality of units of work, performing:
determining first activities across the plurality of units of work that must be completed before starting the respective activity, and
determining second activities across the plurality of units of work that can be started only after completion of the respective activity; and
	forming the critical path graph based on the determined first activities and the determined second activities with respect to each of the respective activities.


The following is an examiner’s statement of reasons for allowance:

The prior art of record, when taken individually or in combination, does not expressly teach or render obvious the limitations recited in claims 1, 8 and 15 when taken in the context of the claims as a whole, especially the concept of analyzing, by a processing device, video data of the instructional video to form a plurality of units of work, the analyzing of the video data to form the plurality of units of work further comprising: extracting video frames from the video data; producing frame level semantics for the extracted video frames based at least on movement of at least one object in at least some of the extracted video frames and frame differencing among the extracted video frames; forming a plurality of video shots based on the extracted video frames and the frame level semantics, each respective video shot being based on a respective plurality of the extracted video frames; grouping respective pluralities of the video shots to form respective groups of video shots based on the frame level semantics and a respective summary of the each respective video shot; associating a respective activity to each of the respective groups of video shots; and generating the each respective unit of work by associating domain-based semantics included in a knowledge base to respective activities within the each of the respective groups of video shots; wherein the instructional video shows different steps for a user to perform, wherein each respective unit of work being a respective grouping of video frames of the instructional video based on a respective logical combination of activities associated therewith, and wherein each respective unit of work includes work information including the frame level semantics, video image sequences, and text and audio of the grouping of video frames; analyzing, by the processing device, the each respective unit of work to produce a respective action graph of a plurality of activities included in the respective unit of work, the respective action graph indicating interdependencies among the plurality of activities and including activity information for each activity including an activity title, a sequence number indicating an order of performance, and a summary description of the activity obtained from the text and audio of the grouping of video frames for the respective unit of work; determining, by the processing device, interdependencies among activities across the plurality of units of work to produce a critical path graph, wherein the critical path graph indicates an order of performance of the activities across the plurality of units of work and includes path information for the activities indicating a corresponding unit of work and a corresponding activity.
At best, the prior art of record, specifically Hwangbo et al. (US 20170185846 A1), teaches a method for analyzing a video comprising: analyzing, by a processing device, video data of the instructional video to form a plurality of units of work, each respective unit of work being a respective grouping of video frames of the video based on a respective logical combination of activities associated therewith; analyzing, by the processing device, each respective unit of work to produce an analysis result of a plurality of activities included in the respective unit of work; and determining, by the processing device, interdependencies among activities across the plurality of units of work (see abstract, paragraph [0030], paragraph [0037]). Choe et al. (US 20140324864 A1) teaches analyzing each respective segment to produce a respective action graph of a plurality of activities included in the respective segment, the respective action graph indicating interdependencies among the plurality of activities; and determining, by the processing device, interdependencies among activities across the plurality of segments to produce a critical path graph (FIG. 4A, paragraph [0096], paragraph [0097], paragraph [0110]). Kozloski et al. (US 20170185846 A1) teaches analyzing an instructional video by steps, wherein, based on a user query, related steps are located and the corresponding video is presented to the user.

In addition, no reference was uncovered that would have provided a basis of evidence for asserting a motivation, nor would one of ordinary skill in the art at the time the invention was made, knowing the teachings of the prior art of record, have combined them to arrive at the present invention as recited in the context of independent claims 1, 8 and 15 as a whole.

Thus, claims 1, 8 and 15 are allowed over the prior art of record. Dependent claims 3-7, 10-14, and 17-20 are also allowable due to their dependency on independent claims 1, 8 and 15.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QI WAN, whose telephone number is (571)272-6445. The examiner can normally be reached 7am-4:30pm Monday to Thursday, and 7am-4pm on the 1st Friday.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at (571)272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Q.W./


/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143