DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is responsive to the request for continued examination (RCE), amendments and remarks received 28 June 2021. Claims 1 - 20 are currently pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 28 June 2021 has been entered.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

The rejections of claims 2, 3, 13, 15 and 16 under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, are hereby withdrawn in view of the amendments and remarks received 28 June 2021.

Response to Arguments
Applicant's arguments filed 28 June 2021 have been fully considered but they are not persuasive.
On pages 7 - 9 of the remarks the Applicant’s Representative argues that the previously cited “combination fails to teach or suggest (e.g., claim 1), ‘determining proposals in the video data stream, the proposals being candidate regions for temporal action in the video data stream; and calculating values for a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between a pair of features of the proposals of the candidate regions, calculates a scalar value representing a pair-wise relation weight for pairs of the proposals’” and “also fails to teach or suggest (e.g., claim 12), ‘wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals’”. The Applicant’s Representative argues that figures 9A and 9B of Escorcia et al., and their corresponding description, fail to teach or suggest the aforementioned disputed claim limitations at least because the weight of Escorcia et al. “is not provided for the nodes themselves 902 and 904 which are selected as proposals, but the location in between the two nodes, e.g., 908A. Therefore, the pair-wise relation function is not calculated where scalar value representing a pair-wise relation weight for pairs of the proposals themselves.” 

Initially, the Examiner asserts that Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Furthermore, the Examiner asserts that, at least, Escorcia et al. disclose the aforementioned disputed claim limitations, see at least the abstract, figures 4, 5, 8 - 9B and 11, page 2 paragraph 0029, page 5 paragraphs 0056 - 0058, page 6 paragraphs 0063 - 0066, page 7 paragraphs 0073 and 0075 - 0077, page 8 paragraphs 0080 - 0084 and page 9 paragraphs 0088 - 0090 of Escorcia et al. wherein it is disclosed that “FIGS. 9A and 9B illustrate examples of associating similar action proposals based on an affinity maximization”, that the “associated action proposals may be used to generate the action proposals for the sequence of frames. As shown in FIG. 9A, each possible action location of a first frame (t) corresponds to a node of first frame nodes 902 in a graph 900. Frame nodes 902, 904, 906 may correspond to a possible action location based on an actor detected by an actor detector or based on a best match region determined from a deformation invariant expansion (e.g., expansion and matching). Each of the first frame nodes 902 is associated with one or more second frame nodes 904 of a second frame (t+1)”, that each “edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action i is a confidence (e.g., level of certainty) of a detection i at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. 
The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, that the “variables xi, xj, xji, and xij are integer variables, between zero and one. The node selections are tracked from xi, xj, xji, and xij. When xi or xj is one, a node has been selected for a path. When xi or xj is zero, a node has not been selected for a ji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and that in “equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.” The Examiner asserts that in Escorcia et al. the weight is provided for the nodes themselves at least because, as shown herein above and in the abovementioned cited portions, Escorcia et al. disclose that their weights are associated with either the nodes themselves or an edge and further disclose that their edges represent a similarity between connected nodes, i.e., a scalar value representing a pair-wise relation weight for pairs of the proposals. Therefore, the Examiner asserts that at least Escorcia et al. disclose the aforementioned disputed claim limitation(s).   
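For illustration only, the edge similarity described in the quoted passages of Escorcia et al. (a single value cij relating node i at frame t to node j at frame t+1) can be sketched with an intersection over union computed on two bounding boxes. The box layout (x1, y1, x2, y2) and the names iou and edge_affinities are assumptions made for this sketch; they do not appear in the reference or in the application, and the sketch is not presented as the implementation of either.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The box layout is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def edge_affinities(nodes_t, nodes_t1):
    """Return one scalar per (i, j) pair: rows index frame t, columns frame t+1."""
    return [[iou(a, b) for b in nodes_t1] for a in nodes_t]


if __name__ == "__main__":
    frame_t = [(0, 0, 2, 2)]
    frame_t1 = [(1, 1, 3, 3), (10, 10, 12, 12)]
    print(edge_affinities(frame_t, frame_t1))
```

Each entry of the returned table is one scalar per pair of nodes, which is the sense in which the Examiner reads the edge values as pair-wise relation weights.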

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 5, 9 - 12, 14, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1.

With regards to claim 1, Liu et al. disclose a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals being candidate regions for temporal action in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and calculating values for an attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing a weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between a pair of features of the proposals of the candidate regions, calculates a value representing a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. 
The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function, including similarities between a pair of features of the proposals of the candidate regions, calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known 

-	With regards to claim 5, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a cosine similarity function. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a cosine similarity function. (Escorcia et al., Pg. 8 ¶ 0083 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) 
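As a minimal sketch of the cosine similarity recited in claim 5 and mentioned in the quoted passage of Escorcia et al., the following computes one scalar from two feature vectors. The function name, the plain-list vector representation, and the choice to return 0.0 for a zero-length vector are assumptions made for illustration; they are not taken from the reference or the application.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption for this sketch: a zero vector is treated as unrelated.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

The result lies between -1 and 1 and depends only on the direction of the two vectors, not on their magnitude.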

-	With regards to claim 9, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as implemented in a cloud service. (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) 

-	With regards to claim 10, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as embodied as a set of machine-readable instructions in a non-transitory memory device. (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) 

-	With regards to claim 11, Liu et al. in view of Escorcia et al. disclose the method of claim 1. ([See analysis of claim 1 provided herein above.]) Liu et al. 

-	With regards to claim 12, Liu et al. disclose an apparatus, (Liu et al., Figs. 7A - 7C, Pg. 1 ¶ 0008 - Pg. 2 ¶ 0009, Pg. 7 ¶ 0070 - 0074, Pg. 7 ¶ 0076 - Pg. 8 ¶ 0082, Pg. 8 ¶ 0086, Pg. 8 ¶ 0088 - Pg. 9 ¶ 0093, Pg. 11 ¶ 0125) comprising: a processor; (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072, 0074 and 0078, Pg. 8 ¶ 0082 and 0086) and a memory accessible by the processor, (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) wherein the memory stores a set of machine-readable instructions permitting the processor to execute (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals being candidate regions for temporal action in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049 [“temporal proposals 150 can correspond to attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing a weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a value representing a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. 
disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals being candidate regions for temporal action in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg. 1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg. 6 ¶ 0061 and 0066) and calculating values for a pair-wise relation function for relating the proposals, (Escorcia et al., Figs. 5, 8 - 9B & 11, Pg. 6 ¶ 0065, Pg. 7 ¶ 0071 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the ji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings and suggestions of Liu et al. to aggregate relevant proposals and to perform temporally weighted average pooling of proposals based on their determined relevance or importance, see at least page 3 paragraph 0036 - page 4 paragraph 0040 and page 4 paragraphs 0042 - 0048 of Liu et al. Moreover, this modification would have been prompted by the teachings and suggestions of 

-	With regards to claim 14, Liu et al. disclose a module, embodied as a set of machine-readable instructions in a non-transitory medium (Liu et al., Figs. 2 & 7A - 7C, Pg. 1 ¶ 0008, Pg. 2 ¶ 0028, Pg. 3 ¶ 0038, Pg. 7 ¶ 0071 - 0072, 0074 and 0078, Pg. 8 ¶ 0082 and 0086) for causing a processor to implement (Liu et al., Figs. 7A - 7C, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072, 0074 and 0078, Pg. 8 ¶ 0082, 0086 and 0089 - 0091, Pg. 9 ¶ 0093 - 0095) a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing a weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 
9 ¶ 0088) the proposals being candidate regions for temporal action in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg. 1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg. 6 ¶ 0061 and 0066) and calculating values for a pair-wise relation ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features i is a confidence (e.g., level of certainty) of a detection I at frame t. 
The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings 

-	With regards to claim 17, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as implemented in a cloud service. (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) 

 

Claims 2, 3, 13, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claims 1, 12 and 14 above, and further in view of Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin, “Temporal Action Detection with Structured Segment Networks”, arXiv, arXiv:1704.06228v2, 18 Sept. 2017, pages 1 - 10, herein referred to as “Zhao et al.”.

-	With regards to claim 2, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ start;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) wherein the generated proposals comprise all of the proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Escorcia et al. disclose wherein the proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) Escorcia et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Zhao et al. disclose the method as incorporated into a two-stage temporal action localization processing comprising a first stage of 

-	With regards to claim 3, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the method of claim 2. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

-	With regards to claim 13, Liu et al. in view of Escorcia et al. disclose the apparatus of claim 12, wherein the method is incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110 [“temporal proposals 150 can correspond to video start;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) and wherein the generated proposals comprise all proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Escorcia et al. disclose wherein the generated proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) Escorcia et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Zhao et al. disclose wherein the method is incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, 

-	With regards to claim 15, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Zhao et al. disclose the module as 

-	With regards to claim 16, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the module of claim 15. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

Claims 4, 6 - 8, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claims 1 and 14 above, and further in view of He et al. U.S. Publication No. 2019/0156210 A1.

-	With regards to claim 4, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 0086, Pg. 14 ¶ 0090 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0095, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. are also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of inputs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to replace the pair-wise relation function of Escorcia et al. with the pairwise 
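For illustration only, a similarity calculation followed by a softmax operation, as recited in claim 4, can be sketched as follows. The use of a dot product as the similarity, the row-wise normalization, and the names softmax and pairwise_relation_weights are assumptions made for this sketch; it is not presented as the method of He et al., of Escorcia et al., or of the application.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_relation_weights(features):
    """For each proposal i, return softmax-normalized weights over all proposals j.

    The similarity here is a dot product of the two feature vectors,
    which is an assumption chosen for this sketch.
    """
    weights = []
    for f_i in features:
        sims = [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        weights.append(softmax(sims))
    return weights


if __name__ == "__main__":
    proposals = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for row in pairwise_relation_weights(proposals):
        print([round(w, 3) for w in row])
```

Each row sums to one, so every entry is a single scalar weight relating one proposal to another.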

-	With regard to claim 6, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a dot product of two embedding feature vectors. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a dot product of two embedding feature vectors. (He et al., Pg. 13 ¶ 0078 - 0080, Pg. 14 ¶ 0082 and 0086, Pg. 14 ¶ 0088 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0094 - 0096, Pg. 16 ¶ 0100, Pg. 23 ¶ 0137 - 0139) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, 

-	With regard to claim 7, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a self-attention mechanism. (Liu et al., the pair-wise relation function comprises a self-attention mechanism. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a self-attention mechanism. (He et al., Pg. 13 ¶ 0078 - 0080, Pg. 14 ¶ 0082 and 0086 - 0090, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pair-wise relation function of Escorcia et al. with the pairwise function of He et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the pairwise function of He et al. would be utilized to determine the similarity, or affinity, between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity, or affinity, between proposals may be determined in a variety of different fashions, see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This 

-	With regard to claim 8, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function is implemented in a fully-connected (fc) layer. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function is implemented in a fully-connected (fc) layer. (He et al., Pg. 14 ¶ 0082 - 0088, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T [θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit.” The Examiner asserts that the pairwise function of He et al. that is based on their disclosed concatenation form corresponds to the claimed limitation at least because the instant specification discloses a substantially similar process for implementing the pair-wise relation function in an fc layer, see at least page 3 paragraphs 0041 - 0043 of the instant application’s corresponding patent application publication.]) Liu et al. in view of Escorcia et al. and He et al. 

-	With regard to claim 19, Liu et al. in view of Escorcia et al. disclose the module of claim 14. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals. (Escorcia et al., Figs. 8 & 11, Pg. 7 ¶ 0072 - 0073 and 0075, Pg. 8 ¶ 0080 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 0086, Pg. 14 ¶ 0090 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0095, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of 

-	With regard to claim 20, Liu et al. in view of Escorcia et al. disclose the module of claim 14, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a fully-connected (fc) layer. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a fully-connected (fc) layer. (He et al., Pg. 14 ¶ 0082 - 0088, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T [θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit.” The Examiner asserts that the pairwise function of He et al. that is based on their disclosed concatenation form corresponds to the claimed limitation at least because the instant specification discloses a substantially similar process for implementing the pair-wise relation function in an fc layer, see at least page 3 paragraphs 0041 - 0043 of the instant application’s corresponding patent application publication.]) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pair-wise relation function of Escorcia et al. with the pairwise function of He et al. The pairwise function of He et al. could be substituted in place of the pair-wise . 
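
For illustration only, the two pairwise forms discussed in the rejections above (a dot-product similarity followed by a softmax operation, and the concatenation form of equation (5) with the normalization factor C(x)=N) might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from Liu et al., Escorcia et al., He et al. or the instant application: the function names, the identity embeddings standing in for θ and Φ, and the example feature and weight vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_relation_weights(features):
    """N x N matrix: entry (i, j) is the softmax-normalized dot-product
    similarity between (hypothetical) proposal features i and j."""
    return [softmax([dot(fi, fj) for fj in features]) for fi in features]


def relu(x):
    """Rectified linear unit applied to a scalar."""
    return max(0.0, x)


def concat_relation(x_i, x_j, w_f, theta=lambda x: x, phi=lambda x: x):
    """Concatenation form: ReLU(w_f^T [theta(x_i), phi(x_j)]).

    theta and phi default to identity maps, which is an assumption made
    purely to keep the sketch short.
    """
    z = list(theta(x_i)) + list(phi(x_j))
    if len(z) != len(w_f):
        raise ValueError("w_f must match the concatenated vector length")
    return relu(dot(w_f, z))


def concat_relation_weights(features, w_f):
    """N x N matrix of concatenation-form values divided by C(x) = N."""
    n = len(features)
    return [[concat_relation(fi, fj, w_f) / n for fj in features]
            for fi in features]


# Hypothetical proposal features and weight vector.
proposals = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
soft_weights = softmax_relation_weights(proposals)
concat_weights = concat_relation_weights(proposals, [1.0, 0.0, 0.0, 1.0])
```

In both forms each pair of proposals (i, j) maps to a single scalar weight. The softmax form normalizes each row so that it sums to one, while the concatenation form divides by the number of proposals.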

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
a.	Ahammad et al. U.S. Publication No. 2008/0310734 A1; which is directed towards action-recognition systems and methods for detecting and locating a particular action in a video that utilizes motion similarity values representing similarity between a group of pixels in a frame of a query video and a group of pixels in a frame of a test video during operation. 
b.	Gupta et al. U.S. Publication No. 2019/0266407 A1; which is directed towards methods and systems for classifying actions in video segments in 
c.	Han et al. U.S. Publication No. 2009/0316983 A1; which is directed towards a method and system for real-time action detection and classification that utilizes a strong classifier created based on motion patterns. 
d.	Lan et al. U.S. Publication No. 2019/0080176 A1; which is directed towards an on-line action detection system that provides frame-wise action recognition utilizing a trained neural network model having the ability of detecting an action with the knowledge of the current frame and probably preceding frame(s). 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC RUSH whose telephone number is (571) 270-3017. The examiner can normally be reached on 9am - 5pm Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on (571) 272 - 7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published 



/ERIC RUSH/Primary Examiner, Art Unit 2667