DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment

This Office Action is responsive to Applicant’s remarks received on April 11, 2022. Claims 1-18, 23 and 24 are currently under consideration.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 11, 15, 16, 23 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Jarc et al. (WO2017083768) and Dergachyova et al. (“Automatic data-driven real-time segmentation and recognition of surgical workflow”).
Regarding claim 1, Jarc et al. discloses a method comprising: 
accessing a video of a surgical procedure (“The vision system typically includes a camera instrument 528 for capturing video images and one or more video displays for displaying the captured video images” at page 11, line 5; “These actions may be detected by one or more sensors of the TSS 950, or by environment-monitoring sensors such as a video capture device” at page 23, line 11), the surgical procedure comprising a plurality of phases (“Task assessor 1108 is programmed, or otherwise configured, to assess a current surgical task being performed. In the present context, a task is a series of events, gestures, or a combination thereof, that together produce a defined surgical effect. The task can be a clinically generic operation (e.g., incision task, suturing task, etc.) or a clinically-relevant step(s)/segment(s) of a procedure (e.g., UV anastomosis of a prostatectomy). Task assessment is based on a series of recent events, and based further on data from task criteria database 1110, which associates certain defined series of events with tasks. In a related embodiment, task criteria database 1110 may also include a library of surgical procedures where these tasks are commonly performed. In this type of embodiment, the identification of surgical procedure may itself be an input into the task-assessment algorithm. A surgical procedure in this context is a high-level descriptor such as, for instance, prostatectomy, umbilical hernia repair, mastectomy, or the like” at page 24, line 17); 
dividing the video into one or more blocks, each of the one or more blocks comprising one or more video frames (“Segmenter 1004 is programmed, or otherwise configured, to discern the stages of surgical procedures as they are being performed” at page 22, line 26; each stage comprises at least one task); 
for each block: 
applying a prediction model on the one or more video frames of the respective block to obtain a phase prediction for each of the one or more video frames, the prediction model configured to predict, for an input video frame, one of the plurality of phases of the surgical procedure (“Real-time segmentation assessor 1200 may be applied intra-operatively to make a segmentation assessment with negligible latency insofar as the surgeon or medical personnel may perceive. In an example embodiment real-time segmentation assessor 1200 is configured as a recurrent neural network (RNN), using a long short-term memory (LSTM) RNN architecture for deep learning. As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” at page 26, line 33); 
generating an aggregated phase prediction for the respective block by aggregating the phase predictions for the one or more video frames (the tasks in the sequence are grouped into their respective stages according to their identified phases); and
modifying the video of the surgical procedure to include an indication of a predicted phase of the respective block based on the aggregated phase prediction (“In an embodiment, for captured video and annotation data 1504, indexer 1502 may break the videos into segments corresponding to assessed segments of the surgical procedure. Assessed input log 1506 stores time-series events and other input information, as well as assessed gestures. Assessed task log 1508 contains assessments of tasks and surgical procedure segments that were performed in real time” at page 33, line 7).
Jarc et al. does not explicitly disclose accepting an input video frame.
Dergachyova et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising:
accessing a video of a surgical procedure, the surgical procedure comprising a plurality of phases (“It contains 7 endoscopic videos of laparoscopic cholecystectomies. The videos are in full HD quality (1920×1080) at 25 fps” at page 1083, Data, line 3; “The laparoscopic cholecystectomy operation passes through 7 phases” at page 1083, Data, paragraph 2, line 1);
dividing the video into one or more blocks, each of the one or more blocks comprising one or more video frames (“To construct it iSPMs are parsed in order to extract all unique surgical phases which play the role of vertices of the graph. Then, we derive all edges, meaning transitions between phases” at page 1083, Surgical Process Modeling, line 13; Figure 3f shows the different phases as separated by color/shading, where each phase is separated by transitions);
for each block: 
applying a prediction model (“The last stage of our algorithm is HsMM training on signatures made of AdaBoost responses used as observations. First of all, the gSPM constructed at the first stage automatically defines set of phases S and initializes A, P and π. Then a finite observation vocabulary O is built from all unique signatures found in the training data. B is initially computed by counting the number of occurrences of all signatures from O in each phase. The model is then refined thanks to modified forward–backward algorithm.” at page 1084, Hidden semi-Markov Model, second to last paragraph) on the one or more video frames of the respective block to obtain a phase prediction for each of the one or more video frames, the prediction model configured to accept an input video frame and predict, for the input video frame, one of the plurality of phases of the surgical procedure (“At testing time each sample passes through all classifiers and obtains a signature of 2 × N length consisting of N positive/negative responses (1 or −1) and N confidence scores from 0 to 100 indicating classifiers certainty. Each ith response shows if the ith classifier recognizes the sample as belonging to its phase or not. The values of the signature containing confidence scores are divided into k intervals” at page 1084, AdaBoost classification, paragraph 3, line 1; “At testing time, the sequence of signatures representing the surgical procedure is decoded one by one with modified Viterbi algorithm inspired from [22] in order to get a sequence of phase labels attributed to each sample as final result” at page 1084, Hidden semi-Markov Model, last paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the data description as taught by Dergachyova et al. in the system of Jarc et al. as the “method can be used for on-line detection as it can deliver a classification decision each second in real time” (Dergachyova et al. at page 1089, paragraph 1, line 7).
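For illustration only, the steps recited in claim 1 (dividing the video into blocks, predicting a phase per frame, aggregating per block, and indicating the predicted phase) can be sketched as follows. This is a minimal sketch under stated assumptions: the names `label_blocks` and `predict_phase`, the fixed block size, and the majority-vote aggregation are hypothetical and are not taken from the claims, Jarc et al., or Dergachyova et al.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def label_blocks(
    frames: Sequence[object],
    predict_phase: Callable[[object], Hashable],
    block_size: int,
) -> List[Tuple[int, int, Hashable]]:
    """Return (first_frame_index, last_frame_index, phase) for each block.

    Assumptions (not from the cited references): blocks have a fixed
    size, and the block-level phase is the majority vote of the
    per-frame predictions.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    labelled = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        # Per-frame phase prediction for every frame in the block.
        votes = Counter(predict_phase(frame) for frame in block)
        # Aggregate the per-frame predictions into one block-level phase.
        phase = votes.most_common(1)[0][0]
        # The tuple stands in for the indication added to the video.
        labelled.append((start, start + len(block) - 1, phase))
    return labelled
```

Here the returned tuples stand in for the "indication of a predicted phase"; how such an indication would actually be embedded in a video file is left open.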
Regarding claim 2, the Jarc et al. and Dergachyova et al. combination discloses a method wherein the phase prediction for a video frame comprises a probability vector comprising probabilities of the video frame belonging to respective phases of the surgical procedure (“As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204. An example of a feature vector may include a pose (e.g., position and rotation matrix information), along with a joint angle of an actuator, and velocity. RNN 1204 may be a bi-directional RNN, in which there are actually two RNNs, one forward-looking, and the other backward-looking. Accordingly, predicted future values and past values are taken into account in the RNN algorithm. In an example embodiment, RNN 1204 has two layers, a bidirectional LSTM, and a gated recurrent unit (GRU) layer, with hidden nodes in each layer” Jarc et al. at page 27, line 4; the output of the RNN is a vector representing the most probable class to which the input belongs; “At testing time each sample passes through all classifiers and obtains a signature of 2 × N length consisting of N positive/negative responses (1 or −1) and N confidence scores from 0 to 100 indicating classifiers certainty. Each ith response shows if the ith classifier recognizes the sample as belonging to its phase or not. The values of the signature containing confidence scores are divided into k intervals” Dergachyova et al. at page 1084, AdaBoost classification, paragraph 3, line 1; “At testing time, the sequence of signatures representing the surgical procedure is decoded one by one with modified Viterbi algorithm inspired from [22] in order to get a sequence of phase labels attributed to each sample as final result” Dergachyova et al. at page 1084, Hidden semi-Markov Model, last paragraph).
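As an aid to reading the Dergachyova et al. passage quoted above, the 2 × N signature (N responses of 1 or −1 followed by N confidence scores from 0 to 100 divided into k intervals) can be sketched as below. The equal-width binning and the name `build_signature` are assumptions made for illustration; the quoted passage does not state how the k intervals are chosen.

```python
from typing import Sequence, Tuple


def build_signature(
    responses: Sequence[int],
    confidences: Sequence[float],
    k: int,
) -> Tuple[int, ...]:
    """Build a 2*N signature from N classifier responses and N scores.

    Assumption: confidence scores in [0, 100] are mapped to k
    equal-width intervals, numbered 0 .. k-1.
    """
    if len(responses) != len(confidences):
        raise ValueError("one confidence score is needed per response")
    if k < 1:
        raise ValueError("k must be at least 1")
    if any(r not in (1, -1) for r in responses):
        raise ValueError("responses must be 1 or -1")

    width = 100 / k
    # A score of exactly 100 is clamped into the last interval.
    bins = [min(int(score // width), k - 1) for score in confidences]
    return tuple(responses) + tuple(bins)
```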
Regarding claim 4, Jarc et al. discloses a method further comprising: in response to determining that a confidence level of the aggregated phase prediction is higher than a threshold, determining the predicted phase of the block as the aggregated phase prediction (“Confidence measurement engine 1230 is programmed, or otherwise configured, to compute a confidence score representing a probability of correct segmentation. The confidence score may be computed with cross-entropy and log-probability computations” at page 28, line 12).
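The confidence comparison recited in claim 4 can be sketched as a simple gate. The function name `gate_block_prediction` and the use of `None` for the below-threshold case are hypothetical; neither the claim as quoted nor Jarc et al. specifies what happens when the threshold is not exceeded.

```python
from typing import Optional


def gate_block_prediction(
    aggregated_phase: int,
    confidence: float,
    threshold: float,
) -> Optional[int]:
    """Adopt the aggregated phase only when its confidence exceeds the threshold.

    Returning None for the other case is an assumption made for this sketch.
    """
    if confidence > threshold:
        return aggregated_phase
    return None
```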
Regarding claim 11, Jarc et al. discloses a method comprising: 
accessing a video of a surgical procedure (“The vision system typically includes a camera instrument 528 for capturing video images and one or more video displays for displaying the captured video images” at page 11, line 5; “These actions may be detected by one or more sensors of the TSS 950, or by environment-monitoring sensors such as a video capture device” at page 23, line 11), the surgical procedure comprising a plurality of phases (“Task assessor 1108 is programmed, or otherwise configured, to assess a current surgical task being performed. In the present context, a task is a series of events, gestures, or a combination thereof, that together produce a defined surgical effect. The task can be a clinically generic operation (e.g., incision task, suturing task, etc.) or a clinically-relevant step(s)/segment(s) of a procedure (e.g., UV anastomosis of a prostatectomy). Task assessment is based on a series of recent events, and based further on data from task criteria database 1110, which associates certain defined series of events with tasks. In a related embodiment, task criteria database 1110 may also include a library of surgical procedures where these tasks are commonly performed. In this type of embodiment, the identification of surgical procedure may itself be an input into the task-assessment algorithm. A surgical procedure in this context is a high-level descriptor such as, for instance, prostatectomy, umbilical hernia repair, mastectomy, or the like” at page 24, line 17); 
dividing the video into one or more blocks, each of the one or more blocks comprising one or more video frames (“Segmenter 1004 is programmed, or otherwise configured, to discern the stages of surgical procedures as they are being performed” at page 22, line 26; each stage comprises at least one task); 
for each block: 
generating a feature vector for the one or more video frames in the respective block (“As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” at page 27, line 4); 
applying a prediction model on the feature vector to generate a phase prediction for the respective block, the phase prediction indicating a phase of the surgical procedure (“Real-time segmentation assessor 1200 may be applied intra-operatively to make a segmentation assessment with negligible latency insofar as the surgeon or medical personnel may perceive. In an example embodiment real-time segmentation assessor 1200 is configured as a recurrent neural network (RNN), using a long short-term memory (LSTM) RNN architecture for deep learning. As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” at page 26, line 33); and 
modifying the video of the surgical procedure to include an indication of the phase prediction for the respective block (“In an embodiment, for captured video and annotation data 1504, indexer 1502 may break the videos into segments corresponding to assessed segments of the surgical procedure. Assessed input log 1506 stores time-series events and other input information, as well as assessed gestures. Assessed task log 1508 contains assessments of tasks and surgical procedure segments that were performed in real time” at page 33, line 7).
Jarc et al. does not explicitly disclose generating a feature vector using the one or more video frames in the respective block.
Dergachyova et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising:
accessing a video of a surgical procedure, the surgical procedure comprising a plurality of phases (“It contains 7 endoscopic videos of laparoscopic cholecystectomies. The videos are in full HD quality (1920×1080) at 25 fps” at page 1083, Data, line 3; “The laparoscopic cholecystectomy operation passes through 7 phases” at page 1083, Data, paragraph 2, line 1);
dividing the video into one or more blocks, each of the one or more blocks comprising one or more video frames (“To construct it iSPMs are parsed in order to extract all unique surgical phases which play the role of vertices of the graph. Then, we derive all edges, meaning transitions between phases” at page 1083, Surgical Process Modeling, line 13; Figure 3f shows the different phases as separated by color/shading, where each phase is separated by transitions);
for each block: 
generating a feature vector using the one or more video frames in the respective block (“Along with the visual information we incorporate the signals of instrument usage. Each analysed data sample can be described as a set of instrument binary signals: 1 if instrument is in use, 0 if not. In this way we have an instrument vector of length M, where M is a total number of surgical tools (M = 10 in our case). The visual description vector and instrument vector are concatenated together to be used as input for the following AdaBoost classification” at page 1084, line 6); 
applying a prediction model (“The last stage of our algorithm is HsMM training on signatures made of AdaBoost responses used as observations. First of all, the gSPM constructed at the first stage automatically defines set of phases S and initializes A, P and π. Then a finite observation vocabulary O is built from all unique signatures found in the training data. B is initially computed by counting the number of occurrences of all signatures from O in each phase. The model is then refined thanks to modified forward–backward algorithm.” at page 1084, Hidden semi-Markov Model, second to last paragraph) on the feature vector to generate a phase prediction for the respective block, the phase prediction indicating a phase of the surgical procedure (“At testing time each sample passes through all classifiers and obtains a signature of 2 × N length consisting of N positive/negative responses (1 or −1) and N confidence scores from 0 to 100 indicating classifiers certainty. Each ith response shows if the ith classifier recognizes the sample as belonging to its phase or not. The values of the signature containing confidence scores are divided into k intervals” at page 1084, AdaBoost classification, paragraph 3, line 1; “At testing time, the sequence of signatures representing the surgical procedure is decoded one by one with modified Viterbi algorithm inspired from [22] in order to get a sequence of phase labels attributed to each sample as final result” at page 1084, Hidden semi-Markov Model, last paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the data description as taught by Dergachyova et al. in the system of Jarc et al. as the “method can be used for on-line detection as it can deliver a classification decision each second in real time” (Dergachyova et al. at page 1089, paragraph 1, line 7).
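The feature vector described in the Dergachyova et al. passage cited for claim 11 (a visual description vector concatenated with a binary instrument vector of length M) can be sketched as follows. The function name `build_feature_vector` and the instrument names in the usage line are hypothetical placeholders; the reference states only that M = 10 instruments were used, with 1 meaning in use and 0 meaning not in use.

```python
from typing import Iterable, List, Sequence


def build_feature_vector(
    visual_descriptor: Sequence[float],
    instruments_in_use: Iterable[str],
    instrument_names: Sequence[str],
) -> List[float]:
    """Concatenate a visual descriptor with a binary instrument vector.

    instrument_names fixes the order and the length M of the instrument
    part; each entry is 1.0 if that instrument is in use, else 0.0.
    """
    in_use = set(instruments_in_use)
    instrument_vector = [
        1.0 if name in in_use else 0.0 for name in instrument_names
    ]
    return list(visual_descriptor) + instrument_vector


# Hypothetical usage: two visual features, three instruments, one in use.
example = build_feature_vector([0.2, 0.5], {"hook"}, ["grasper", "hook", "scissors"])
```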
Regarding claim 15, Jarc et al. discloses a method wherein the prediction model comprises one or more of a gated recurrent units neural network or a long short-term memory neural network (“In an example embodiment, RNN 1204 has two layers, a bidirectional LSTM, and a gated recurrent unit (GRU) layer, with hidden nodes in each layer” at page 27, line 11).
Regarding claim 16, the Jarc et al. and Dergachyova et al. combination discloses a method further comprising: 
refining the predicted phases of the video of the surgical procedure (“Post-processing segmentation assessor may be applied at any point following an operation using all data recorded from the operation to make a segmentation assessment” Jarc et al. at page 27, line 22; “In a related embodiment, post-processing-based segmentation assessor 1220 is employed post-operatively to update the training data 1206 and to re-assess the segmentation determinations of real-time segmentation assessor 1200” Jarc et al. at page 28, line 9).
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose modifying the video of the surgical procedure using the refined predicted phases.
However, where a segmentation determination has low confidence, it would have been sensible to revise that determination and to reflect the revision in the final task log.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the event log along with the segment so that the sequence of events of the procedure can be accurately recalled for later reference.
Regarding claim 23, Jarc et al. discloses a computing device comprising: 
a processor (“the captured images can undergo image processing by a computer processor” at page 9, line 11); and 
a non-transitory computer-readable medium having processor-executable instructions stored thereupon (“The computer processor typically includes one or more data processing boards purposed for executing computer readable code stored in a non-volatile memory device of the computer processor” at page 9, line 4), which, when executed by the processor, cause the processor to:
divide a video of a surgical procedure into one or more blocks, the surgical procedure comprising a plurality of phases (“Task assessor 1108 is programmed, or otherwise configured, to assess a current surgical task being performed. In the present context, a task is a series of events, gestures, or a combination thereof, that together produce a defined surgical effect. The task can be a clinically generic operation (e.g., incision task, suturing task, etc.) or a clinically-relevant step(s)/segment(s) of a procedure (e.g., UV anastomosis of a prostatectomy). Task assessment is based on a series of recent events, and based further on data from task criteria database 1110, which associates certain defined series of events with tasks. In a related embodiment, task criteria database 1110 may also include a library of surgical procedures where these tasks are commonly performed. In this type of embodiment, the identification of surgical procedure may itself be an input into the task-assessment algorithm. A surgical procedure in this context is a high-level descriptor such as, for instance, prostatectomy, umbilical hernia repair, mastectomy, or the like” at page 24, line 17) and each of the one or more blocks comprising one or more video frames (“Segmenter 1004 is programmed, or otherwise configured, to discern the stages of surgical procedures as they are being performed” at page 22, line 26; each stage comprises at least one task); 
for each block: 
apply a prediction model on the one or more video frames of the respective block to obtain a phase prediction for each of the one or more video frames, the prediction model configured to predict, for an input video frame, one of the plurality of phases of the surgical procedure (“Real-time segmentation assessor 1200 may be applied intra-operatively to make a segmentation assessment with negligible latency insofar as the surgeon or medical personnel may perceive. In an example embodiment real-time segmentation assessor 1200 is configured as a recurrent neural network (RNN), using a long short-term memory (LSTM) RNN architecture for deep learning. As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” at page 26, line 33); 
generate an aggregated phase prediction for the respective block by aggregating the phase predictions for the one or more video frames (the tasks in the sequence are grouped into their respective stages according to their identified phases); and
modify the video of the surgical procedure to include an indication of a predicted phase of the respective block based on the aggregated phase prediction (“In an embodiment, for captured video and annotation data 1504, indexer 1502 may break the videos into segments corresponding to assessed segments of the surgical procedure. Assessed input log 1506 stores time-series events and other input information, as well as assessed gestures. Assessed task log 1508 contains assessments of tasks and surgical procedure segments that were performed in real time” at page 33, line 7).
Jarc et al. does not explicitly disclose accepting an input surgical image.
Dergachyova et al. teaches a device in the same field of endeavor of surgical workflow prediction, comprising:
a processor (processor of computer); and
a non-transitory computer-readable medium having processor-executable instructions stored thereupon (implied software and memory), which, when executed by the processor, cause the processor to:
divide a video of a surgical procedure (“It contains 7 endoscopic videos of laparoscopic cholecystectomies. The videos are in full HD quality (1920×1080) at 25 fps” at page 1083, Data, line 3) into one or more blocks, the surgical procedure comprising a plurality of phases (“The laparoscopic cholecystectomy operation passes through 7 phases” at page 1083, Data, paragraph 2, line 1) and each of the one or more blocks comprising one or more surgical images (“To construct it iSPMs are parsed in order to extract all unique surgical phases which play the role of vertices of the graph. Then, we derive all edges, meaning transitions between phases” at page 1083, Surgical Process Modeling, line 13; Figure 3f shows the different phases as separated by color/shading, where each phase is separated by transitions);
for each block: 
apply a prediction model (“The last stage of our algorithm is HsMM training on signatures made of AdaBoost responses used as observations. First of all, the gSPM constructed at the first stage automatically defines set of phases S and initializes A, P and π. Then a finite observation vocabulary O is built from all unique signatures found in the training data. B is initially computed by counting the number of occurrences of all signatures from O in each phase. The model is then refined thanks to modified forward–backward algorithm.” at page 1084, Hidden semi-Markov Model, second to last paragraph) on the one or more surgical images of the respective block to obtain a phase prediction for each of the one or more surgical images, the prediction model configured to accept an input surgical image and predict, for the input surgical image, one of the plurality of phases of the surgical procedure (“At testing time each sample passes through all classifiers and obtains a signature of 2 × N length consisting of N positive/negative responses (1 or −1) and N confidence scores from 0 to 100 indicating classifiers certainty. Each ith response shows if the ith classifier recognizes the sample as belonging to its phase or not. The values of the signature containing confidence scores are divided into k intervals” at page 1084, AdaBoost classification, paragraph 3, line 1; “At testing time, the sequence of signatures representing the surgical procedure is decoded one by one with modified Viterbi algorithm inspired from [22] in order to get a sequence of phase labels attributed to each sample as final result” at page 1084, Hidden semi-Markov Model, last paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the data description as taught by Dergachyova et al. in the system of Jarc et al. as the “method can be used for on-line detection as it can deliver a classification decision each second in real time” (Dergachyova et al. at page 1089, paragraph 1, line 7).
Regarding claim 24, Jarc et al. discloses a non-transitory computer-readable medium comprising processor-executable instructions (“The computer processor typically includes one or more data processing boards purposed for executing computer readable code stored in a non-volatile memory device of the computer processor” at page 9, line 4) to cause a processor to:
divide a video of a surgical procedure into one or more blocks, the surgical procedure comprising a plurality of phases (“Task assessor 1108 is programmed, or otherwise configured, to assess a current surgical task being performed. In the present context, a task is a series of events, gestures, or a combination thereof, that together produce a defined surgical effect. The task can be a clinically generic operation (e.g., incision task, suturing task, etc.) or a clinically-relevant step(s)/segment(s) of a procedure (e.g., UV anastomosis of a prostatectomy). Task assessment is based on a series of recent events, and based further on data from task criteria database 1110, which associates certain defined series of events with tasks. In a related embodiment, task criteria database 1110 may also include a library of surgical procedures where these tasks are commonly performed. In this type of embodiment, the identification of surgical procedure may itself be an input into the task-assessment algorithm. A surgical procedure in this context is a high-level descriptor such as, for instance, prostatectomy, umbilical hernia repair, mastectomy, or the like” at page 24, line 17) and each of the one or more blocks comprising one or more video frames (“Segmenter 1004 is programmed, or otherwise configured, to discern the stages of surgical procedures as they are being performed” at page 22, line 26; each stage comprises at least one task); 
for each block: 
apply a prediction model on the one or more video frames of the respective block to obtain a phase prediction for each of the one or more video frames, the prediction model configured to predict, for an input video frame, one of the plurality of phases of the surgical procedure (“Real-time segmentation assessor 1200 may be applied intra-operatively to make a segmentation assessment with negligible latency insofar as the surgeon or medical personnel may perceive. In an example embodiment real-time segmentation assessor 1200 is configured as a recurrent neural network (RNN), using a long short-term memory (LSTM) RNN architecture for deep learning. As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” at page 26, line 33); 
generate an aggregated phase prediction for the respective block by aggregating the phase predictions for the one or more video frames (the tasks in the sequence are grouped into their respective stages according to their identified phases); and
modify the video of the surgical procedure to include an indication of a predicted phase of the respective block based on the aggregated phase prediction (“In an embodiment, for captured video and annotation data 1504, indexer 1502 may break the videos into segments corresponding to assessed segments of the surgical procedure. Assessed input log 1506 stores time-series events and other input information, as well as assessed gestures. Assessed task log 1508 contains assessments of tasks and surgical procedure segments that were performed in real time” at page 33, line 7).
Jarc et al. does not explicitly disclose accepting an input surgical image.
Dergachyova et al. teaches a non-transitory computer-readable medium comprising processor-executable instructions (implied software and memory), which, when executed by the processor, cause the processor to:
divide a video of a surgical procedure (“It contains 7 endoscopic videos of laparoscopic cholecystectomies. The videos are in full HD quality (1920×1080) at 25 fps” at page 1083, Data, line 3) into one or more blocks, the surgical procedure comprising a plurality of phases (“The laparoscopic cholecystectomy operation passes through 7 phases” at page 1083, Data, paragraph 2, line 1) and each of the one or more blocks comprising one or more surgical images (“To construct it iSPMs are parsed in order to extract all unique surgical phases which play the role of vertices of the graph. Then, we derive all edges, meaning transitions between phases” at page 1083, Surgical Process Modeling, line 13; Figure 3f shows the different phases as separated by color/shading, where each phase is separated by transitions);
for each block: 
apply a prediction model (“The last stage of our algorithm is HsMM training on signatures made of AdaBoost responses used as observations. First of all, the gSPM constructed at the first stage automatically defines set of phases S and initializes A, P and π. Then a finite observation vocabulary O is built from all unique signatures found in the training data. B is initially computed by counting the number of occurrences of all signatures from O in each phase. The model is then refined thanks to modified forward–backward algorithm.” at page 1084, Hidden semi-Markov Model, second to last paragraph) on the one or more surgical images of the respective block to obtain a phase prediction for each of the one or more surgical images, the prediction model configured to accept an input surgical image and predict, for the input surgical image, one of the plurality of phases of the surgical procedure (“At testing time each sample passes through all classifiers and obtains a signature of 2 × N length consisting of N positive/negative responses (1 or −1) and N confidence scores from 0 to 100 indicating classifiers certainty. Each ith response shows if the ith classifier recognizes the sample as belonging to its phase or not. The values of the signature containing confidence scores are divided into k intervals” at page 1084, AdaBoost classification, paragraph 3, line 1; “At testing time, the sequence of signatures representing the surgical procedure is decoded one by one with modified Viterbi algorithm inspired from [22] in order to get a sequence of phase labels attributed to each sample as final result” at page 1084, Hidden semi-Markov Model, last paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the data description as taught by Dergachyova et al. in the system of Jarc et al. as the “method can be used for on-line detection as it can deliver a classification decision each second in real time” (Dergachyova et al. at page 1089, paragraph 1, line 7).
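For illustration only, the block-division and per-image prediction limitations discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Applicant, Jarc et al. or Dergachyova et al.: the function names, the fixed `block_size` parameter and the `predict_phase` callable are hypothetical placeholders introduced for this example.

```python
from typing import Callable, List, Sequence


def split_into_blocks(frames: Sequence, block_size: int) -> List[list]:
    """Divide a sequence of frames into consecutive blocks.

    Assumption: blocks have a fixed size; the last block may be shorter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(frames[i:i + block_size])
            for i in range(0, len(frames), block_size)]


def predict_phases_per_block(frames: Sequence,
                             block_size: int,
                             predict_phase: Callable[[object], int]) -> List[List[int]]:
    """Apply a per-image prediction model to every image of every block.

    `predict_phase` is a hypothetical stand-in for any model that accepts
    one surgical image and returns one phase index.
    """
    return [[predict_phase(frame) for frame in block]
            for block in split_into_blocks(frames, block_size)]
```

Each inner list holds one phase prediction per surgical image of the corresponding block.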

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Jarc et al. and Dergachyova et al. as applied to claim 2 above, and further in view of Jin et al. (“SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network”).
The Jarc et al. and Dergachyova et al. combination discloses a method wherein generating the aggregated phase prediction for the respective block comprises: 
generating the aggregated phase prediction for the block as a phase corresponding to the highest probability (the sequence of tasks is grouped into respective stages according to the identified phases, each of which represents the most probable phase); and 
determining a confidence level of the aggregated phase prediction as the highest probability (“Confidence measurement engine 1230 is programmed, or otherwise configured, to compute a confidence score representing a probability of correct segmentation. The confidence score may be computed with cross-entropy and log-probability computations” Jarc et al. at page 28, line 12).
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose averaging the probability vectors for the one or more video frames to generate an averaged probability vector for the block.
Jin et al. teaches a method in the same field of endeavor of surgical workflow prediction, wherein generating the aggregated phase prediction for the respective block comprises: 
averaging the probability vectors for the one or more video frames to generate an averaged probability vector for the block (“The ResNet ends with a 7 × 7 average pooling layer to extract the global features from each frame and finally outputs a 2048-dimensional feature vector” at page 1117, left column, line 8).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a ResNet architecture as taught by Jin et al. for the RNN of the Jarc et al. and Dergachyova et al. combination to “encode both visual features and temporal dependencies in an end-to-end architecture for improving the recognition accuracy” (Jin et al. at page 1115, right column, last sentence).
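For illustration only, the aggregation limitations of claim 3 as recited above (averaging per-frame probability vectors, taking the phase with the highest averaged probability, and using that probability as the confidence level) can be sketched as follows. This is a minimal sketch of the claim language, not the average pooling layer quoted from Jin et al.; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_probability_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-frame probability vectors of one block."""
    if not vectors:
        raise ValueError("a block must contain at least one frame")
    num_phases = len(vectors[0])
    if any(len(v) != num_phases for v in vectors):
        raise ValueError("all probability vectors must have the same length")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(num_phases)]


def aggregate_block(vectors: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (aggregated phase, confidence) for one block.

    The aggregated phase is the index of the highest averaged probability,
    and the confidence level is that highest averaged probability.
    """
    averaged = average_probability_vectors(vectors)
    phase = max(range(len(averaged)), key=averaged.__getitem__)
    return phase, averaged[phase]
```

For a block with frame vectors `[0.5, 0.5, 0.0]` and `[0.25, 0.75, 0.0]`, the averaged vector is `[0.375, 0.625, 0.0]`, so the sketch returns phase 1 with confidence 0.625.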

Claims 5, 6, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the Jarc et al. and Dergachyova et al. combination as applied to claims 4, 11 and 16 above, and further in view of Volkov et al. (“Machine Learning and Coresets for Automated Real-Time Video Segmentation of Laparoscopic and Robot-Assisted Surgery”).
Regarding claim 5, the Jarc et al. and Dergachyova et al. combination discloses a method as described in claim 4 above.
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose in response to determining that the confidence level of the aggregated phase prediction is lower than the threshold, determining the predicted phase of the block to be a predicted phase of a previous block.
	Volkov et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising: 
in response to determining that the confidence level of the aggregated phase prediction is lower than the threshold, determining the predicted phase of the block to be a predicted phase of a previous block (“We have a matrix consisting of several independent SVM output vectors (with a memory trail of the last β-1 such vectors). The observation function then updates the current phase, if and only if the vector sum for another phase exceeds the current one by a certainty threshold α, in which case we update our phase hypothesis to the next phase – otherwise the current phase persists” at page 757, right column, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the phase check as taught by Volkov et al. in the system of the Jarc et al. and Dergachyova et al. combination to update the phase indication only if it seems highly probable that the phase has indeed changed.
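For illustration only, the confidence-threshold limitation of claim 5 can be sketched as follows. This is a minimal sketch of the claim language, not the SVM-based observation function of Volkov et al.; the function name, the `(phase, confidence)` tuple layout and the handling of the first block are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def apply_confidence_fallback(block_predictions: Sequence[Tuple[int, float]],
                              threshold: float) -> List[int]:
    """Keep the previous block's phase when the confidence is below threshold.

    Assumption: the first block has no previous block, so it keeps its own
    prediction regardless of its confidence.
    """
    result: List[int] = []
    for phase, confidence in block_predictions:
        if confidence < threshold and result:
            # Low confidence: reuse the predicted phase of the previous block.
            result.append(result[-1])
        else:
            result.append(phase)
    return result
```

With a threshold of 0.5, the predictions `[(0, 0.9), (2, 0.3), (1, 0.8), (3, 0.1)]` become `[0, 0, 1, 1]`: the second and fourth blocks fall back to the phase of the block before them.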
Regarding claim 6, the Jarc et al. and Dergachyova et al. combination discloses a method as described in claim 4 above.
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose in response to determining that the predicted phase of the block comprises a phase earlier than the predicted phase of a previous block or a phase later than a predicted phase of a subsequent block, modifying the predicted phase of the block to be the phase prediction of the previous block.
	Volkov et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising: 
in response to determining that the predicted phase of the block comprises a phase earlier than the predicted phase of a previous block or a phase later than a predicted phase of a subsequent block, modifying the predicted phase of the block to be the phase prediction of the previous block (“We have a matrix consisting of several independent SVM output vectors (with a memory trail of the last β-1 such vectors). The observation function then updates the current phase, if and only if the vector sum for another phase exceeds the current one by a certainty threshold α, in which case we update our phase hypothesis to the next phase – otherwise the current phase persists” at page 757, right column, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the phase check as taught by Volkov et al. in the system of the Jarc et al. and Dergachyova et al. combination to update the phase indication only if it seems highly probable that the phase has indeed changed.
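For illustration only, the ordering check of claim 6 can be sketched as follows. This is a minimal sketch of the claim language, not the method of Volkov et al. It assumes that phases are integers numbered in their expected chronological order, that a block is compared against the already-corrected phase of the previous block and the uncorrected phase of the subsequent block, and that the first block is left unchanged; the function name is hypothetical.

```python
from typing import List, Sequence


def enforce_phase_order(phases: Sequence[int]) -> List[int]:
    """Replace out-of-order block phases with the previous block's phase."""
    corrected = list(phases)
    for i in range(1, len(corrected)):
        earlier_than_previous = corrected[i] < corrected[i - 1]
        later_than_next = (i + 1 < len(corrected)
                           and corrected[i] > corrected[i + 1])
        if earlier_than_previous or later_than_next:
            # Out of order: fall back to the previous block's phase.
            corrected[i] = corrected[i - 1]
    return corrected
```

For the block phases `[0, 0, 2, 1, 1, 3]`, the third block (phase 2) is later than the subsequent block (phase 1), so the sketch returns `[0, 0, 0, 1, 1, 3]`.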
Regarding claim 12, the Jarc et al. and Dergachyova et al. combination discloses a method as described in claim 11 above.
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose in response to determining that the confidence level of the phase prediction is lower than a threshold, generating the phase prediction for the respective block to be a phase prediction of a previous block.
	Volkov et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising: 
in response to determining that the confidence level of the phase prediction is lower than a threshold, generating the phase prediction for the respective block to be a phase prediction of a previous block (“We have a matrix consisting of several independent SVM output vectors (with a memory trail of the last β-1 such vectors). The observation function then updates the current phase, if and only if the vector sum for another phase exceeds the current one by a certainty threshold α, in which case we update our phase hypothesis to the next phase – otherwise the current phase persists” at page 757, right column, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the phase check as taught by Volkov et al. in the system of the Jarc et al. and Dergachyova et al. combination to update the phase indication only if it seems highly probable that the phase has indeed changed.
Regarding claim 18, the Jarc et al. and Dergachyova et al. combination discloses a method as described in claim 16 above.
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose that refining the predicted phases comprises refining a boundary between two adjacent predicted phases of the video based on a combined feature vector of the two adjacent predicted phases.
	Volkov et al. teaches a method in the same field of endeavor of surgical workflow prediction, comprising: 
refining the predicted phases comprises refining a boundary between two adjacent predicted phases of the video based on a combined feature vector of the two adjacent predicted phases (“We have a matrix consisting of several independent SVM output vectors (with a memory trail of the last β-1 such vectors). The observation function then updates the current phase, if and only if the vector sum for another phase exceeds the current one by a certainty threshold α, in which case we update our phase hypothesis to the next phase – otherwise the current phase persists” at page 757, right column, line 1; the phases will be combined into a single phase based upon the certainty threshold).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the phase check as taught by Volkov et al. in the system of the Jarc et al. and Dergachyova et al. combination to update the phase indication only if it seems highly probable that the phase has indeed changed.

Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over the Jarc et al. and Dergachyova et al. combination as applied to claim 1 above, and further in view of Yao et al. (US 2019/0188567).
Regarding claim 7, the Jarc et al. and Dergachyova et al. combination discloses a method as described in claim 1 above.
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose training the prediction model to find a set of parameters for the prediction model so that a value of a loss function using the set of parameters is smaller than the value of the loss function using another set of parameters, wherein the training is performed based on a plurality of training frames and respective labels of the training frames.
Yao et al. teaches a method in the same field of endeavor of neural network training, comprising training the prediction model to find a set of parameters for the prediction model so that a value of a loss function using the set of parameters is smaller than the value of the loss function using another set of parameters, wherein the training is performed based on a plurality of training frames and respective labels of the training frames (“For example, parameters updating module 103 may update the weights based on a portion of training data set 112 such that the portion is evaluated by the current iteration of the DNN model and appropriate correction or reweighting or the like is provided based on the difference (e.g., network loss) between the result from the current iteration of the DNN model and the ground truth result based on the known training set” at paragraph 0037, last sentence).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a training architecture as taught by Yao et al. for the neural network of the Jarc et al. and Dergachyova et al. combination “to provide for network surgery that is dynamic and flexible” (Yao et al. at paragraph 0038, last sentence).
Regarding claim 8, the Jarc et al., Dergachyova et al. and Yao et al. combination discloses a method wherein training the prediction model comprises:
assigning a higher weight to a term (“For example, weights at the current iteration may be provided as updates to weights from the previous iterations such that updates include a product of a learning rate and a partial differential of the network loss for the weight” Yao et al. at paragraph 0051, line 7).
The Jarc et al., Dergachyova et al. and Yao et al. combination does not explicitly disclose receiving an indication that one of the plurality of training frames is a representative frame; and assigning a higher weight to a term associated with the representative frame in the loss function. 
However, given that a training frame is flagged accordingly, it would therefore follow that appropriate classification of that frame is important in the training of the classifier.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to more heavily weight a representative training frame to ensure that incorrect classification has a higher associated penalty.
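For illustration only, assigning a higher weight to the loss term of a representative frame can be sketched as follows. This is a minimal sketch, not the training procedure of Yao et al.: it assumes a cross-entropy loss normalized by the sum of the weights, and the function name and the default `representative_weight` of 2.0 are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def weighted_cross_entropy(probabilities: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           representative: Sequence[bool],
                           representative_weight: float = 2.0) -> float:
    """Cross-entropy loss in which representative frames weigh more.

    probabilities[i][k] is the predicted probability of phase k for frame i,
    labels[i] is the true phase, and representative[i] flags frame i.
    """
    total = 0.0
    weight_sum = 0.0
    for probs, label, is_representative in zip(probabilities, labels, representative):
        p = probs[label]
        if p <= 0.0:
            raise ValueError("predicted probability of the true phase must be positive")
        weight = representative_weight if is_representative else 1.0
        total += -weight * math.log(p)  # misclassification is penalized more heavily
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("at least one training frame is required")
    return total / weight_sum
```

Under this sketch, a poorly classified representative frame raises the loss more than an equally misclassified ordinary frame, which corresponds to the higher associated penalty discussed above.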
Regarding claim 9, the Jarc et al., Dergachyova et al. and Yao et al. combination discloses a method wherein training the prediction model comprises training the prediction model under a constraint that a logical relationship between the plurality of phases cannot be violated (phases will be classified in chronological order based upon the sequence of frames received by the system).
Regarding claim 10, the Jarc et al., Dergachyova et al. and Yao et al. combination discloses a method wherein training the prediction model is further performed based on a plurality of unlabeled training videos (“Input: X: training datum (with or without label)” Yao et al. at paragraph 0053, Pseudocode A, line 1).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over the Jarc et al. and Dergachyova et al. combination as applied to claim 11 above, and further in view of Flexman et al. (US 2020/0126661).
The Jarc et al. and Dergachyova et al. combination discloses a method wherein generating the feature vector for the one or more video frames in the block comprises: 
applying a prediction model to the one or more video frames to generate a prediction vector for each of the video frames (“Real-time segmentation assessor 1200 may be applied intra-operatively to make a segmentation assessment with negligible latency insofar as the surgeon or medical personnel may perceive. In an example embodiment real-time segmentation assessor 1200 is configured as a recurrent neural network (RNN), using a long short-term memory (LSTM) RNN architecture for deep learning. As depicted, feature extractor 1202 applies filtering or other selection criteria to a sequence of assessed tasks and their corresponding parameters to create a feature vector as the input to RNN 1204” Jarc et al. at page 26, line 33); and 
aggregating the prediction vectors for the one or more video frames to generate the feature vector (collectively across the entirety of the procedure, the prediction vectors form the representative feature vector for the procedure).
The Jarc et al. and Dergachyova et al. combination does not explicitly disclose that generating the feature vector for the one or more video frames in the block comprises: applying a second prediction model to the one or more video frames to generate a prediction vector for each of the video frames; and aggregating the prediction vectors for the one or more video frames to generate the feature vector. 
Flexman et al. teaches a method in the same field of endeavor of surgical workflow prediction, wherein generating the feature vector for the one or more video frames in the block comprises: 
applying a second prediction model to the one or more video frames to generate a prediction vector for each of the video frames (“employ the model or models 142 for predicting (or guiding) future behavior based upon past events or event sequences” at paragraph 0025, last sentence; “The prediction module 115 employs the combination of data collected to generate a score for a plurality of possibilities for a next step. The highest score provides the most likely candidate model for a next step or actions. For example, one or more models 142 that store procedure activities can be determined by the prediction module 115 as to which model 142 best fits current activities” at paragraph 0035, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize multiple prediction models as taught by Flexman et al. in the system of the Jarc et al. and Dergachyova et al. combination as different models are better suited to identifying particular aspects of the workflow (see Flexman et al. at paragraphs 0035 and 0038).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Jarc et al., Dergachyova et al. and Flexman et al. as applied to claim 13 above, and further in view of Jin et al.
The Jarc et al., Dergachyova et al. and Flexman et al. combination discloses a method as described in claim 13 above.
The Jarc et al., Dergachyova et al. and Flexman et al. combination does not explicitly disclose averaging the prediction vectors for the one or more video frames.
Jin et al. teaches a method in the same field of endeavor of surgical workflow prediction, wherein generating the aggregated phase prediction for the respective block comprises: 
averaging the prediction vectors for the one or more video frames (“The ResNet ends with a 7 × 7 average pooling layer to extract the global features from each frame and finally outputs a 2048-dimensional feature vector” at page 1117, left column, line 8).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a ResNet architecture as taught by Jin et al. for the RNN of the Jarc et al., Dergachyova et al. and Flexman et al. combination to “encode both visual features and temporal dependencies in an end-to-end architecture for improving the recognition accuracy” (Jin et al. at page 1115, right column, last sentence).
Allowable Subject Matter

Claim 17 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art of record does not teach or disclose that refining the predicted phases comprises: dividing the video into a plurality of regions; generating region feature vectors for the plurality of regions; building a directed graph based on rules describing logical relationship among the plurality of phases of the video of the surgical procedure; for each region: identifying one or more neighbor regions of the region based on the directed graph; combining region feature vectors of the region and the one or more neighboring regions to generate a combined region feature vector; and generating a refined predicted phase for the region by applying the combined region feature vector to a machine learning model configured to predict the refined predicted phase for the combined region feature vector.


Response to Arguments

	Summary of Remarks (at response page labeled 9): “Jarc does not teach at least these recitations of claims 1 and 11 because the prediction model in Jarc does not accept video frames as input and the feature vector disclosed in Jarc is not generated using video frames. Therefore, Jarc does not anticipate claims 1 and 11.”

	Examiner’s Response: This argument is moot in view of the newly cited Dergachyova et al. reference.


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA whose telephone number is (571)270-1574. The examiner can normally be reached Monday - Friday 9:30-5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571)272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KATRINA R FUJITA/
Primary Examiner, Art Unit 2662