DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is responsive to the amendments and remarks received 30 November 2021. Claims 1 - 20 are currently pending. 

Claim Objections
Claim 1 is objected to because of the following informalities: Line 5 of claim 1 recites, in part, “regions for temporal localization action in the video” which appears to contain inconsistent claim terminology and/or a minor informality. The Examiner suggests amending the claim to --regions for temporal action localization--. Appropriate correction is required.
Claim 14 is objected to because of the following informalities: Lines 8 - 9 of claim 14 recite, in part, “a scalar value representing at least pair-wise relation weight for pairs” which appears to contain a grammatical error and/or minor informality. The Examiner suggests amending the claim to --a scalar value representing at least a pair-wise relation weight for pairs-- in order to improve the clarity and precision of the claim. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1 - 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the at least a pair-wise relation weight" in lines 8 - 9. There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the at least the pair of the proposals" in line 9. There is insufficient antecedent basis for this limitation in the claim.
Claims 2 - 11 are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, due to being dependent upon a rejected base claim but would overcome the rejection if their base claim overcomes the rejection.

Response to Arguments
Applicant's arguments filed 30 November 2021 have been fully considered but they are not persuasive.
On pages 7 - 8 of the remarks the Applicant’s Representative argues that Liu et al. do not disclose or suggest a “pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a value representing at least pair-wise relation weight for pairs of the proposals”. The Applicant’s 
The Examiner respectfully disagrees, in part. 
Initially, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Furthermore, the Examiner asserts that Liu et al. was/is not relied upon to disclose the aforementioned disputed claim limitation. Liu et al. was/is relied upon to disclose at least an “attention function for the proposals, wherein the attention function calculates a scalar value representing the at least a weight for the proposals”, see at least figures 1 - 2, page 3 paragraph 0037 - page 4 paragraph 0045, page 4 paragraph 0047 - page 5 paragraph 0050 and page 11 paragraphs 0119 - 0124 of Liu et al. Additionally, although Liu et al. describe their proposals as one-dimensional temporal action proposals, the Examiner asserts that the proposals of Liu et al. correspond to a subset of two-dimensional frames of video data, i.e., a one-dimensional temporal interval of the video data, see at least page 3 paragraphs 0031 and 0035 - 0036, page 4 paragraph 0040, page 4 paragraph 0047 - page 5 paragraph 0049, page 5 paragraph 0056 and page 11 paragraphs 0118 - 0119 of Liu et al. wherein they disclose, for example, that “Example models according to start;tend]”. However, the Examiner noted/notes that Liu et al. fail to disclose explicitly “a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a value representing the at least a pair-wise relation weight for the at least the pair of the proposals.” Pertaining to analogous art, Escorcia et al. was/is relied upon to disclose “at least a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing the at least a pair-wise relation weight for the at least the pair of the proposals”, see at least figures 5, 8 - 9B and 11, page 6 paragraphs 0063 - 0066, page 7 paragraphs 0072 - 0073 and 0075 - 0077, page 8 paragraphs 0080 - 0084 and page 9 paragraphs 0089 - 0090 of Escorcia et al. wherein it is disclosed that “a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, that “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, that “FIGS. 9A and 9B illustrate i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and that to “associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.” The Examiner asserts that, as shown herein above, Escorcia et al. disclose associating similar action proposals based on affinity maximization by calculating a confidence value (scalar value) defining the similarity (e.g., affinity) between pairs of proposals, that a cosine similarity between features obtained from the bounding boxes can be utilized as the similarity between pairs of proposals and identifying the most similar proposals over time. The Examiner
On page 9 of the remarks the Applicant’s Representative argues that the combination of Liu et al. in view of Escorcia et al. is improper because the “one-dimensional features explicitly teaches away from the pair-wise relation function as claimed.” 
The Examiner respectfully disagrees. 
Initially, the Examiner asserts that it is unclear as to how one-dimensional features would teach away from the pair-wise relation function. The Examiner asserts that, as shown herein above in section 10a, the proposals of Liu et al. correspond to a subset of two-dimensional frames of video data. Furthermore, the Examiner asserts that Liu et al. disclose processing m dimensional feature representations to identify frames relevant to any action and estimate time intervals for action candidates, see at least page 3 paragraph 0038 - page 4 paragraph 0040, page 4 paragraph 0047 - page 5 paragraph 0050, page 5 
On pages 9 - 11 of the remarks the Applicant’s Representative argues that Escorcia et al. do not teach or suggest wherein the “pair-wise relation function (including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing the at least a pair-wise relation weight for the at least the pair of the proposals).” The Applicant’s Representative argues that the disclosure by Escorcia et al. of comparing possible action locations, nodes, between frames to determine a possible action location in a first frame that has a greatest similarity to a possible action location in a second frame and setting a value of an edge between possible action locations in the first and second frames that have the greatest similarity to one (1) does not disclose the aforementioned disputed claim limitation(s) and also shows that the weight in Escorcia et al. “is not provided in representing a pair-wise relation weight for at least pairs of the proposals.” 
The Examiner respectfully disagrees. 
The Examiner asserts that Escorcia et al. disclose the aforementioned disputed claim limitation(s), see at least figures 5, 8 - 9B and 11, page 6 paragraphs 0063 - 0066, page 7 paragraphs 0072 - 0073 and 0075 - 0077, page i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and that to “associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. The learned similarity may be a learned semantic visual feature similarity between possible action . 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 5, 9 - 12, 14, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1.

-	With regards to claim 1, Liu et al. disclose a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals including candidate regions for temporal localization action in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and calculating values for at least an attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing the at least a weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a value representing the at least a pair-wise relation weight for the at least the pair of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals including candidate regions for temporal localization action in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg. 1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg. 6 ¶ 0061 and 0066) and calculating values for at least a pair-wise relation function for relating the proposals, (Escorcia et al., Figs. 5, 8 - 9B & 11, Pg. 6 ¶ 0065, Pg. 7 ¶ 0071 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings and suggestions of Liu et al. to aggregate relevant proposals and to perform temporally weighted average pooling of proposals based on their determined relevance or importance, see at least page 3 paragraph 0036 - page 4 paragraph 0040 and page 4 paragraphs 0042 - 0048 of Liu et al. Moreover, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that calculating a similarity between pairs of proposals can help in generating accurate action proposals especially in situations wherein a potential action location was lost during tracking and/or wherein the proposals are noisy, see at least page 6 paragraphs 0061 and 0066 - 0068, page 7 paragraph 0074 and

-	With regards to claim 5, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a cosine similarity function. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a cosine similarity function. (Escorcia et al., Pg. 8 ¶ 0083 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) 
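
For illustration only, the two similarity measures named in the passage of Escorcia et al. quoted above (a cosine similarity between features, and an intersection over union of bounding boxes) can be sketched as follows. This is a minimal sketch and not the implementation of Escorcia et al. or of the claimed method: the function names, the plain-list feature vectors, the (x1, y1, x2, y2) box format and the zero-vector convention are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Scalar cosine similarity between two feature vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: a zero vector is treated as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pairwise_weights(features):
    """One scalar weight for every ordered pair (i, j) of proposals, i != j."""
    n = len(features)
    return {
        (i, j): cosine_similarity(features[i], features[j])
        for i in range(n)
        for j in range(n)
        if i != j
    }
```

Each entry returned by `pairwise_weights` is a single scalar for one pair of proposals, which is the sense in which the quoted passage is read as calculating a scalar value for a pair.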

-	With regards to claim 9, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as implemented in a cloud service. (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) 

-	With regards to claim 10, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as embodied as a set of machine-readable instructions in a non-transitory memory device. (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) 

-	With regards to claim 11, Liu et al. in view of Escorcia et al. disclose the method of claim 1. ([See analysis of claim 1 provided herein above.]) Liu et al. disclose a computer product comprising a non-transitory memory device having stored therein a set of machine-readable instructions permitting a processor to execute (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) the method of claim 1. ([The Examiner asserts that Liu et al. in view of Escorcia et al. disclose the method of claim 1, see analysis of claim 1 provided herein above.]) 

-	With regards to claim 12, Liu et al. disclose an apparatus, (Liu et al., Figs. 7A - 7C, Pg. 1 ¶ 0008 - Pg. 2 ¶ 0009, Pg. 7 ¶ 0070 - 0074, Pg. 7 ¶ 0076 - Pg. 8 ¶ 0082, Pg. 8 ¶ 0086, Pg. 8 ¶ 0088 - Pg. 9 ¶ 0093, Pg. 11 ¶ 0125) comprising: a processor; (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072, 0074 and 0078, Pg. 8 ¶ 0082 and 0086) and a memory accessible by the processor, (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) wherein the memory stores a set of machine-readable instructions permitting the processor to execute (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing a weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a value representing a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals being candidate regions for temporal action in the video data ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby

-	With regards to claim 14, Liu et al. disclose a module, embodied as a set of machine-readable instructions in a non-transitory medium (Liu et al., Figs. 2 & 7A - 7C, Pg. 1 ¶ 0008, Pg. 2 ¶ 0028, Pg. 3 ¶ 0038, Pg. 7 ¶ 0071 - 0072, 0074 attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing at least weight for the proposals. (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a scalar value representing at least pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function calculates a scalar value representing at least pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known

-	With regards to claim 17, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as implemented in a cloud service. (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) 

-	With regards to claim 18, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as embodied as a set of machine-readable instructions in the non-transitory medium including a non-transitory memory device, (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 2 ¶ 0028, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) and wherein the proposals comprise all proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) In addition, Escorcia et al. disclose wherein the proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) 

Claims 2, 3, 13, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claims 1, 12 and 14 above, and further in view of Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin, “Temporal Action Detection with Structured Segment Networks”, .

-	With regards to claim 2, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and a second stage of performing a classification on each of the generated proposals individually, (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) wherein the generated proposals comprise all of the proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Escorcia et al. disclose wherein the proposals comprise all proposals in the video data stream. (Escorcia et al., 

-	With regards to claim 3, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the method of claim 2. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

start;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) and wherein the generated proposals comprise all proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Escorcia et al. disclose wherein the generated proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 

-	With regards to claim 15, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all 

-	With regards to claim 16, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the module of claim 15. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

Claims 4, 6 - 8, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. as applied to claims 1 and 14 above, and further in view of He et al. U.S. Publication No. 2019/0156210 A1.

-	With regards to claim 4, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals. (Escorcia et al., Figs. 8 & 11, Pg. 7 ¶ 0072 - 0073 and 0075, Pg. 8 ¶ 0080 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 0086, Pg. 14 ¶ 0090 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0095, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise 

-	With regards to claim 6, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a dot product of two embedding feature vectors. 
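For illustration only, and not as a characterization of the claims or of any cited reference, a pair-wise relation function based on a dot product of two embedding feature vectors could be sketched as follows. The function names, the toy two-dimensional embeddings, and the row-wise softmax normalization (borrowed from the wording of claim 4) are assumptions made for this sketch.

```python
import math


def dot(u, v):
    """Dot product of two equal-length embedding feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_relation_weights(embeddings):
    """Return an N x N matrix of scalar relation weights.

    Entry [i][j] is the dot product of embeddings i and j, normalized with
    a softmax over j (an assumed normalization, used here for illustration).
    """
    weights = []
    for e_i in embeddings:
        scores = [dot(e_i, e_j) for e_j in embeddings]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([x / total for x in exps])
    return weights


# Hypothetical embeddings for three proposals.
proposals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = pairwise_relation_weights(proposals)
```

Each row of `W` holds one scalar weight per pair of proposals and sums to one, so a proposal whose embedding is more closely aligned with another receives a larger weight for that pair.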

-	With regards to claim 7, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a self-attention mechanism. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0036 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - 0048, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a self-attention mechanism. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a self-attention mechanism. (He et al., Pg. 13 ¶ 0078 - 0080, Pg. 14 ¶ 0082 and 0086 - 0090, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art and would likely yield predictable 

-	With regards to claim 8, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function is implemented in a fully-connected (fc) layer. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function is implemented in a fully-connected (fc) layer. (He et al., Pg. 14 ¶ 0082 - 0088, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear 

-	With regards to claim 19, Liu et al. in view of Escorcia et al. disclose the module of claim 14. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals. (Escorcia et al., Figs. 8 & 11, Pg. 7 ¶ 0072 - 0073 and 0075, Pg. 8 ¶ 0080 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 

-	With regards to claim 20, Liu et al. in view of Escorcia et al. disclose the module of claim 14, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a fully-connected (fc) layer. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a fully-connected (fc) layer. (He et al., Pg. 14 ¶ 0082 - 0088, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit.” The Examiner asserts that the pairwise function of He et al. that is based on their disclosed concatenation form corresponds to the claimed limitation at least because the instant specification discloses a substantially similar process for implementing the pair-wise relation function in an fc layer, see at least page 3 paragraphs 0041 - 0043 of the instant application’s corresponding patent application publication.]) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate . 
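For illustration only, the concatenation form quoted from He et al. above, in which a weight vector projects the concatenated embeddings to a scalar that is passed through a ReLU and normalized by C(x)=N, could be sketched as follows. The identity embeddings used for θ and Φ, the function names, and the numeric values are assumptions made for this sketch and are not taken from the reference.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def concat_pairwise(x_i, x_j, w_f, theta=lambda v: v, phi=lambda v: v):
    """f(xi, xj) = ReLU(wf^T [theta(xi), phi(xj)]).

    theta and phi default to identity maps (an assumption); in practice
    they would be learned embeddings.
    """
    concatenated = list(theta(x_i)) + list(phi(x_j))
    return relu(sum(w * c for w, c in zip(w_f, concatenated)))


def normalized_relations(xs, w_f):
    """Pairwise values divided by the normalization factor C(x) = N."""
    n = len(xs)
    return [[concat_pairwise(x_i, x_j, w_f) / n for x_j in xs] for x_i in xs]


# Hypothetical two-dimensional features and weight vector.
xs = [[1.0, 2.0], [0.5, -1.0]]
w_f = [1.0, 0.0, 0.0, -1.0]
F = normalized_relations(xs, w_f)
```

Because the weight vector acts on the concatenated pair as a single linear projection to a scalar, the computation has the same shape as a fully-connected layer with one output unit followed by a ReLU.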

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC RUSH whose telephone number is (571) 270-3017. The examiner can normally be reached 9am - 5pm Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/ERIC RUSH/Primary Examiner, Art Unit 2667