DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is responsive to the request for continued examination (RCE), amendments and remarks received 07 June 2022. Claims 1 - 20 are currently pending. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 07 June 2022 has been entered.

Claim Objections
Claim 8 is objected to because of the following informalities: Line 3 and line 4 of claim 8 recite, in part, “the function” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 3 and line 4 of claim 8 to --the pair-wise relation function-- in order to maintain consistency with line 6 of claim 1 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 8 is objected to because of the following informalities: Line 4 of claim 8 recites, in part, “the input features” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 4 of claim 8 to --the pair of input features-- in order to maintain consistency with line 3 of claim 8 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 8 is objected to because of the following informalities: Line 5 of claim 8 recites, in part, “subspace, output features for a proposal is viewed as a weighted average” which appears to contain a grammatical error and/or a minor informality. The Examiner suggests amending the claim to --subspace, and output features for a proposal [[is]] are viewed as a weighted average-- in order to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 9 is objected to because of the following informalities: Line 3 of claim 9 recites, in part, “generating proposals which are to include actions” which appears to contain a minor informality. The Examiner suggests amending the claim to --generating proposals which are likely to include actions-- in order to maintain consistency with line 11 of claim 1 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 13 is objected to because of the following informalities: Line 7 of claim 13 recites, in part, “generating proposals which are to include actions” which appears to contain a minor informality. The Examiner suggests amending the claim to --generating proposals which are likely to include actions-- in order to maintain consistency with line 14 of claim 12 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 13 is objected to because of the following informalities: Line 10 and line 11 of claim 13 recite, in part, “the function” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 10 and line 11 of claim 13 to --the pair-wise relation function-- in order to maintain consistency with line 10 of claim 12 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 13 is objected to because of the following informalities: Line 11 of claim 13 recites, in part, “the input features” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 11 of claim 13 to --the pair of input features-- in order to maintain consistency with line 10 of claim 13 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 13 is objected to because of the following informalities: Line 12 of claim 13 recites, in part, “subspace, output features for a proposal is viewed as a weighted average” which appears to contain a grammatical error and/or a minor informality. The Examiner suggests amending the claim to --subspace, and output features for a proposal [[is]] are viewed as a weighted average-- in order to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 17 is objected to because of the following informalities: Line 3 of claim 17 recites, in part, “generating proposals which are to include actions” which appears to contain a minor informality. The Examiner suggests amending the claim to --generating proposals which are likely to include actions-- in order to maintain consistency with line 11 of claim 14 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 20 is objected to because of the following informalities: Line 3 and line 4 of claim 20 recite, in part, “the function” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 3 and line 4 of claim 20 to --the pair-wise relation function-- in order to maintain consistency with line 7 of claim 14 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 20 is objected to because of the following informalities: Line 4 of claim 20 recites, in part, “the input features” which appears to contain inconsistent claim terminology. The Examiner suggests amending line 4 of claim 20 to --the pair of input features-- in order to maintain consistency with line 3 of claim 20 and to improve the clarity and precision of the claim. Appropriate correction is required.
Claim 20 is objected to because of the following informalities: Line 5 of claim 20 recites, in part, “subspace, output features for a proposal is viewed as a weighted average” which appears to contain a grammatical error and/or a minor informality. The Examiner suggests amending the claim to --subspace, and output features for a proposal [[is]] are viewed as a weighted average-- in order to improve the clarity and precision of the claim. Appropriate correction is required.
The objections to claims 1 and 14, due to minor informalities, are hereby withdrawn in view of the amendments and remarks received 07 June 2022.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1 - 11, 13 and 15 - 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the pair of proposals" in line 9. There is insufficient antecedent basis for this limitation in the claim.
Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 4, along with the subsequent recitation of “the generated proposals” on line 5, are referencing. Are they referring to the “proposals” recited on line 4 of claim 1, the “proposals” recited on line 11 of claim 1 or the “proposals” recited on line 2 of claim 2? Additionally, it is unclear as to whether the “proposals” recited on line 4 of claim 1, the “proposals” recited on line 11 of claim 1 and/or the “proposals” recited on line 2 of claim 2 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which proposals “the proposals” recited on line 2 are referencing. Are they referring to the “proposals” recited on line 4 of claim 1 or the “proposals” recited on line 11 of claim 1? Additionally, it is unclear as to whether the “proposals” recited on line 4 of claim 1 and the “proposals” recited on line 11 of claim 1 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 9 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 4 are referencing. Are they referring to the “proposals” recited on line 4 of claim 1, the “proposals” recited on line 11 of claim 1 or the “proposals” recited on line 3 of claim 9? Additionally, it is unclear as to whether the “proposals” recited on line 4 of claim 1, the “proposals” recited on line 11 of claim 1 and/or the “proposals” recited on line 3 of claim 9 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 4, along with the subsequent recitation of “the generated proposals” on line 5, are referencing. Are they referring to the “proposals” recited on line 8 of claim 12, the “proposals” recited on line 14 of claim 12 or the “proposals” recited on lines 2 - 3 of claim 13? Additionally, it is unclear as to whether the “proposals” recited on line 8 of claim 12, the “proposals” recited on line 14 of claim 12 and/or the “proposals” recited on lines 2 - 3 of claim 13 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which two-stage temporal action localization processing “the two-stage temporal action localization processing” recited on line 6 is referencing. Is it referring to the “two-stage temporal action localization processing” recited on line 13 of claim 12 or the “two-stage temporal action localization processing” recited on line 2 of claim 12? Additionally, it is unclear as to whether the “two-stage temporal action localization processing” recited on line 13 of claim 12 and the “two-stage temporal action localization processing” recited on line 2 of claim 12 are the same or different. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same two-stage temporal action localization processing.
Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which first stage “the first stage” recited on line 7 is referencing. Is it referring to the “first stage” recited on lines 13 - 14 of claim 12 or the “first stage” recited on line 2 of claim 12? Additionally, it is unclear as to whether the “first stage” recited on lines 13 - 14 of claim 12 and the “first stage” recited on line 2 of claim 12 are the same or different. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same first stage.
Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 8 are referencing. Are they referring to the “proposals” recited on line 8 of claim 12, the “proposals” recited on line 14 of claim 12, the “proposals” recited on lines 2 - 3 of claim 13 or the “proposals” recited on line 7 of claim 13? Additionally, it is unclear as to whether the “proposals” recited on line 8 of claim 12, the “proposals” recited on line 14 of claim 12, the “proposals” recited on lines 2 - 3 of claim 13 and/or the “proposals” recited on line 7 of claim 13 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 15 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 4 are referencing. Are they referring to the “proposals” recited on line 5 of claim 14, the “proposals” recited on line 11 of claim 14 or the “proposals” recited on line 2 of claim 15? Additionally, it is unclear as to whether the “proposals” recited on line 5 of claim 14, the “proposals” recited on line 11 of claim 14 and/or the “proposals” recited on line 2 of claim 15 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 17 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which generated proposals “the generated proposals” recited on line 4 are referencing. Are they referring to the “proposals” recited on line 5 of claim 14, the “proposals” recited on line 11 of claim 14 or the “proposals” recited on line 3 of claim 17? Additionally, it is unclear as to whether the “proposals” recited on line 5 of claim 14, the “proposals” recited on line 11 of claim 14 and/or the “proposals” recited on line 3 of claim 17 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which proposals “the proposals” recited on line 4 are referencing. Are they referring to the “proposals” recited on line 5 of claim 14 or the “proposals” recited on line 11 of claim 14? Additionally, it is unclear as to whether the “proposals” recited on line 5 of claim 14 and the “proposals” recited on line 11 of claim 14 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claim 19 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention because it is unclear as to which proposals “the proposals” recited on line 2 are referencing. Are they referring to the “proposals” recited on line 5 of claim 14 or the “proposals” recited on line 11 of claim 14? Additionally, it is unclear as to whether the “proposals” recited on line 5 of claim 14 and the “proposals” recited on line 11 of claim 14 are the same proposals or different proposals. Clarification and appropriate correction are required. For purposes of examination, the Examiner will treat the claims as requiring and referencing a single same set of proposals.
Claims 3, 5 - 8, 10, 11 and 16 are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being dependent upon one or more rejected base claims; however, the rejection of these claims would be withdrawn if their base claim(s) overcome the rejection.

Response to Arguments
Applicant's arguments filed 07 June 2022 have been fully considered but they are not persuasive. 
On pages 8 - 11 of the remarks the Applicant’s Representative argues that Liu et al. do not disclose or suggest a “pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a value representing at least pair-wise relation weight for pairs of the proposals”. The Applicant’s Representative argues that Liu et al. do not disclose or suggest the aforementioned disputed claim limitation at least because Liu et al. disclose generating one-dimensional temporal action proposals from which target actions can be localized and thus Liu et al. deal with “one-dimensional connected components” instead of a pair-wise function. 
The Examiner respectfully disagrees, in part. 
Initially, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Furthermore, the Examiner asserts that Liu et al. are not relied upon to disclose the aforementioned disputed claim limitation. Liu et al. are relied upon to disclose at least an “attention function for the proposals, wherein the attention function calculates a scalar value representing the at least a weight for the proposals”, see at least figures 1 - 2, page 3 paragraph 0037 - page 4 paragraph 0045, page 4 paragraph 0047 - page 5 paragraph 0050 and page 11 paragraphs 0119 - 0124 of Liu et al. Additionally, although Liu et al. describe their proposals as one-dimensional temporal action proposals, the Examiner asserts that the proposals of Liu et al. correspond to a subset of two-dimensional frames of video data, i.e., a one-dimensional temporal interval of the video data, see at least page 3 paragraphs 0031 and 0035 - 0036, page 4 paragraph 0040, page 4 paragraph 0047 - page 5 paragraph 0049, page 5 paragraph 0056 and page 11 paragraphs 0118 - 0119 of Liu et al. wherein they disclose, for example, that “Example models according to example aspects of the present disclosure can provide for deep neural networks to predict class labels per video using a subset of representative and unique frames to target action, which can be selected automatically from an input video”, that “each proposal 150, defined by [tstart;tend]” and that for an “RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used.”
The Examiner notes however that Liu et al. fail to disclose explicitly “a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a value representing at least a pair-wise relation weight for pairs of the proposals.” 
Pertaining to analogous art, Escorcia et al. disclose “at least a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing at least a pair-wise relation weight for pairs of the proposals”, see at least figures 5, 8 - 9B and 11, page 6 paragraphs 0063 - 0066, page 7 paragraphs 0072 - 0073 and 0075 - 0077, page 8 paragraphs 0080 - 0084 and page 9 paragraphs 0089 - 0090 of Escorcia et al. wherein it is disclosed that “a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, that “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, that “FIGS. 9A and 9B illustrate examples of associating similar action proposals based on an affinity maximization”, that the “associated action proposals may be used to generate the action proposals for the sequence of frames. As shown in FIG. 9A, each possible action location of a first frame (t) corresponds to a node of first frame nodes 902 in a graph 900. Frame nodes 902, 904, 906 may correspond to a possible action location based on an actor detected by an actor detector or based on a best match region determined from a deformation invariant expansion (e.g., expansion and matching). Each of the first frame nodes 902 is associated with one or more second frame nodes 904 of a second frame (t+1)”, that each “edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, that, for example, “third frame node B may have the greatest similarity to second frame node A. Therefore, a second edge 908B between the third frame node B and the second frame node A is set to one. The other edges 908 to third frame node A and third frame node C may be set to zero”, that the “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes. That is, an affinity between a pair of boxes from consecutive frames may be determined based on an appearance comparison, a location comparison, and/or motion models. The action proposals of the sequence of frames is determined by maximizing a global affinity of the network”, that the “affinity maximization may be determined based on equation 1…where c_i is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. c_ij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and that to “associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.” The Examiner asserts that, as shown herein above, Escorcia et al. disclose (i) associating similar action proposals based on affinity maximization by calculating a confidence value (scalar value) defining the similarity (e.g., affinity) between pairs of proposals, (ii) utilizing a cosine similarity between features obtained from the bounding boxes as the similarity between pairs of proposals, and (iii) identifying the most similar proposals over time. The Examiner asserts that at least the confidence value(s) defining the similarity (e.g., affinity) between pairs of proposals of Escorcia et al. represents the at least a pair-wise relation weight for the at least the pair of the proposals and that the confidence value(s) of Escorcia et al. is related to a similarity between at least a pair of features of the proposals since Escorcia et al. disclose that the confidence value(s) defines the similarity (e.g., affinity) between pairs of proposals, see at least page 8 paragraphs 0080 - 0084 of Escorcia et al. Furthermore, the Examiner asserts that the equation utilized by Escorcia et al. to determine their confidence value(s) defining the similarity (e.g., affinity) between pairs of proposals corresponds to the claimed pair-wise relation function.
Therefore, the Examiner asserts that Escorcia et al. disclose the aforementioned disputed claim limitation(s) and that the combination of Liu et al. in view of Escorcia et al. discloses and suggests amended claims 1, 12 and 14.
On pages 8 - 12 of the remarks the Applicant’s Representative argues that the combination of Liu et al. and Escorcia et al. does not disclose or suggest temporal action localization in video data “incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions.” The Applicant’s Representative argues that Liu et al. merely recite “determining a temporal location of a target action in the video based at least in part on the one or more weighted temporal class activation maps, but not as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions.”
The Examiner respectfully disagrees. 
The Examiner asserts that Liu et al. disclose the aforementioned disputed claim limitation, see at least figures 1 and 2, page 2 paragraphs 0028 - 0030, page 3 paragraphs 0034 - 0036, page 4 paragraph 0043 - page 5 paragraph 0049, page 9 paragraph 0100 and page 10 paragraphs 0109 - 0110 of Liu et al. wherein they disclose, for example, “methods for localizing actions in video using a deep neural network”, that “a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, generating “one dimension temporal action proposals 150 from which target actions can be localized (at 160) in the temporal domain”, that the “temporal proposals 150 can correspond to video segments that potentially enclose target actions” and that then “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)”. The Examiner asserts that, as shown herein above, Liu et al. disclose temporal action localization in video data “incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions” at least because Liu et al. disclose first generating temporal action proposals, corresponding to the claimed first stage of the claimed two-stage temporal action localization processing, and then processing the generated temporal action proposals to perform action classification and/or localization. Therefore, the Examiner asserts that, at least, Liu et al. disclose the aforementioned disputed claim limitation(s).
On page 12 of the remarks the Applicant’s Representative argues that the combination of Liu et al. in view of Escorcia et al. is improper because the “one-dimensional features explicitly teaches away from the pair-wise relation function as claimed.” 
The Examiner respectfully disagrees. 
Initially, the Examiner asserts that it is unclear as to how one-dimensional features would teach away from the pair-wise relation function as claimed. Additionally, the Examiner asserts that, as shown herein above in section 32a of the instant Office Action responding to the Applicant’s arguments on pages 8 - 11 of the remarks, the proposals of Liu et al. correspond to a subset of two-dimensional frames of video data. Furthermore, the Examiner asserts that Liu et al. disclose processing m-dimensional feature representations to identify frames relevant to any action and estimate time intervals for action candidates, see at least page 3 paragraph 0038 - page 4 paragraph 0040, page 4 paragraph 0047 - page 5 paragraph 0050, page 5 paragraph 0056, page 7 paragraph 0069 and page 11 paragraphs 0118 - 0119 of Liu et al. Lastly, the Examiner asserts that Liu et al. do not teach away from the claimed pair-wise relation function because they do not criticize, discredit or otherwise discourage the use of the claimed pair-wise relation function. Therefore, the Examiner asserts that Liu et al. do not teach away from the claimed invention.
On pages 12 - 14 of the remarks the Applicant’s Representative argues that Escorcia et al. do not teach or suggest wherein the “pair-wise relation function (including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing the at least a pair-wise relation weight for the at least the pair of the proposals).” The Applicant’s Representative argues that the disclosure by Escorcia et al. of comparing possible action locations, nodes, between frames to determine a possible action location in a first frame that has a greatest similarity to a possible action location in a second frame and setting a value of an edge between possible action locations in the first and second frames that have the greatest similarity to one (1) does not disclose the aforementioned disputed claim limitation(s) and that it also shows that the weight in Escorcia et al. “is not provided in representing a pair-wise relation weight for at least pairs of the proposals.” 
The Examiner respectfully disagrees. 
The Examiner asserts that Escorcia et al. disclose the aforementioned disputed claim limitation(s), see at least figures 5, 8 - 9B and 11, page 6 paragraphs 0063 - 0066, page 7 paragraphs 0072 - 0073 and 0075 - 0077, page 8 paragraphs 0080 - 0084 and page 9 paragraphs 0089 - 0090 of Escorcia et al. wherein it is disclosed that “a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, that “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, that “FIGS. 9A and 9B illustrate examples of associating similar action proposals based on an affinity maximization”, that the “associated action proposals may be used to generate the action proposals for the sequence of frames. As shown in FIG. 9A, each possible action location of a first frame (t) corresponds to a node of first frame nodes 902 in a graph 900. Frame nodes 902, 904, 906 may correspond to a possible action location based on an actor detected by an actor detector or based on a best match region determined from a deformation invariant expansion (e.g., expansion and matching). Each of the first frame nodes 902 is associated with one or more second frame nodes 904 of a second frame (t+1)”, that each “edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 
9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, that, for example, “third frame node B may have the greatest similarity to second frame node A. Therefore, a second edge 908B between the third frame node B and the second frame node A is set to one. The other edges 908 to third frame node A and third frame node C may be set to zero”, that the “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes. That is, an affinity between a pair of boxes from consecutive frames may be determined based on an appearance comparison, a location comparison, and/or motion models. The action proposals of the sequence of frames is determined by maximizing a global affinity of the network”, that “ci is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and that to “associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. 
The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.” The Examiner asserts that, as shown herein above, Escorcia et al. disclose associating similar action proposals based on affinity maximization by calculating a confidence value (scalar value) defining the similarity (e.g., affinity) between pairs of proposals, disclose that a cosine similarity between features obtained from the bounding boxes can be utilized as the similarity between pairs of proposals, and disclose identifying the most similar proposals over time. Furthermore, the Examiner asserts that at least the confidence value(s) defining the similarity (e.g., affinity) between pairs of proposals of Escorcia et al. represent the at least a pair-wise relation weight for the at least the pair of the proposals, at least because Escorcia et al. disclose that the confidence value(s) define the similarity (e.g., affinity) between pairs of proposals, see at least page 8 paragraphs 0080 - 0084 of Escorcia et al. Moreover, the Examiner asserts that the equation utilized by Escorcia et al. to determine their confidence value(s) defining the similarity (e.g., affinity) between pairs of proposals corresponds to the claimed pair-wise relation function. Therefore, the Examiner asserts that Escorcia et al. disclose the aforementioned disputed claim limitation(s) and that the combination of Liu et al. in view of Escorcia et al. discloses and suggests amended claims 1, 12 and 14.
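For illustration only, the edge assignment described in the quoted passage of Escorcia et al. (the edge from a node to its most similar node in the next frame is set to one and the remaining edges are set to zero) may be sketched as below. This is a simplified greedy sketch under stated assumptions: it is not the global affinity maximization of equation 1 of Escorcia et al., the input is assumed to be a precomputed matrix of scalar affinities c_ij, and the function name assign_edges is hypothetical.

```python
from typing import List


def assign_edges(similarity: List[List[float]]) -> List[List[int]]:
    """Greedy best-match edge assignment between two consecutive frames.

    similarity[i][j] is assumed to hold a scalar affinity c_ij between
    node i (a possible action location) in frame t and node j in frame t+1.
    For each node i, the edge to its most similar node j is set to 1 and
    every other edge leaving node i is set to 0. Ties go to the lowest j.
    """
    edges: List[List[int]] = []
    for row in similarity:
        if not row:
            raise ValueError("each node in frame t needs at least one candidate in frame t+1")
        best_j = max(range(len(row)), key=lambda j: row[j])
        edges.append([1 if j == best_j else 0 for j in range(len(row))])
    return edges


if __name__ == "__main__":
    # Two nodes in frame t, three candidate nodes in frame t+1.
    c = [[0.9, 0.2, 0.1],
         [0.3, 0.8, 0.4]]
    print(assign_edges(c))  # [[1, 0, 0], [0, 1, 0]]
```

The sketch shows only that each scalar pair-wise value c_ij determines whether a pair of proposals is connected; a global formulation would instead select edges jointly across all frames.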

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 5, 10 - 12, 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1.

-	With regards to claim 1, Liu et al. disclose a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals including candidate regions for temporal action localization in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and calculating values for at least an attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing at least a weight for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. 
For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing at least a pair-wise relation weight for at least the pair of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals including candidate regions for temporal action localization in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg. 1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg. 6 ¶ 0061 and 0066) and calculating values for at least a pair-wise relation function for relating the proposals, (Escorcia et al., Figs. 5, 8 - 9B & 11, Pg. 6 ¶ 0065, Pg.
7 ¶ 0071 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function, including similarities between at least a pair of features of the proposals of the candidate regions, calculates a scalar value representing at least a pair-wise relation weight for at least the pair of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. 
The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings and suggestions of Liu et al. 
to aggregate relevant proposals and to perform temporally weighted average pooling of proposals based on their determined relevance or importance, see at least page 3 paragraph 0036 - page 4 paragraph 0040 and page 4 paragraphs 0042 - 0048 of Liu et al. Moreover, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that calculating a similarity between pairs of proposals can help in generating accurate action proposals especially in situations wherein a potential action location was lost during tracking and/or wherein the proposals are noisy, see at least page 6 paragraphs 0061 and 0066 - 0068, page 7 paragraph 0074 and page 8 paragraphs 0083 - 0086 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that a pair-wise relation function would be utilized to calculate values that represent pair-wise relation weights for pairs of proposals so as to enable proposals that are relevant to each other to be identified and connected thereby improving the ability of the base device to determine temporal locations with the greatest likelihood of action. Therefore, it would have been obvious to combine Liu et al. with Escorcia et al. to obtain the invention as specified in claim 1.
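For illustration only, the proposal scoring quoted from Liu et al. above (a score for class c given by the weighted average T-CAM of all frames within a proposal [tstart;tend]) may be sketched as follows. Equation (7) of Liu et al. is not reproduced here; normalizing by the sum of the attention weights is an assumption of this sketch, and the function name proposal_score is hypothetical.

```python
from typing import Sequence


def proposal_score(tcam: Sequence[float], attention: Sequence[float],
                   t_start: int, t_end: int) -> float:
    """Attention-weighted average of per-frame class scores over a proposal.

    tcam[t] is a per-frame activation for one class c, attention[t] is a
    non-negative scalar weight for frame t, and the proposal spans the
    inclusive frame interval [t_start, t_end]. Normalizing by the sum of
    the weights is an assumption of this sketch.
    """
    if not (0 <= t_start <= t_end < min(len(tcam), len(attention))):
        raise ValueError("invalid proposal interval")
    frames = range(t_start, t_end + 1)
    total_weight = sum(attention[t] for t in frames)
    if total_weight == 0:
        return 0.0  # no attended frames inside the proposal
    return sum(attention[t] * tcam[t] for t in frames) / total_weight


if __name__ == "__main__":
    tcam = [0.0, 0.5, 1.0, 0.0]
    attention = [0.0, 1.0, 3.0, 0.0]
    print(proposal_score(tcam, attention, 1, 2))  # 0.875
```

Each attention value here is a per-frame scalar weight; the sketch does not relate one proposal to another, which is the feature the rejection attributes to Escorcia et al.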

-	With regards to claim 5, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a cosine similarity function. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a cosine similarity function. (Escorcia et al., Pg. 8 ¶ 0083 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) 
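As a purely illustrative sketch, a pair-wise relation function comprising a cosine similarity function, of the kind recited in claim 5 and mentioned in the quoted passage of Escorcia et al., may compute one scalar weight for every pair of proposals from fixed-length per-proposal feature vectors. This code is not drawn from either reference; the function names cosine_similarity and pairwise_relation_weights, and the convention of returning zero for a zero-length vector, are assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar cosine similarity between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors relate to nothing
    return dot / (norm_a * norm_b)


def pairwise_relation_weights(features: List[Sequence[float]]) -> List[List[float]]:
    """Matrix of scalar pair-wise weights w[i][j] = cos(f_i, f_j) over all proposal pairs."""
    return [[cosine_similarity(fi, fj) for fj in features] for fi in features]


if __name__ == "__main__":
    feats = [[3.0, 4.0], [4.0, -3.0]]
    print(pairwise_relation_weights(feats))  # [[1.0, 0.0], [0.0, 1.0]]
```

Each entry of the resulting matrix is a single scalar computed from the features of one pair of proposals.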

-	With regards to claim 10, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as embodied as a set of machine-readable instructions in a non-transitory memory device. (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) 

-	With regards to claim 11, Liu et al. in view of Escorcia et al. disclose the method of claim 1. ([See analysis of claim 1 provided herein above.]) Liu et al. disclose a computer product comprising a non-transitory memory device having stored therein a set of machine-readable instructions permitting a processor to execute (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) the method of claim 1. ([The Examiner asserts that Liu et al. in view of Escorcia et al. disclose the method of claim 1, see analysis of claim 1 provided herein above.]) 

-	With regards to claim 12, Liu et al. disclose an apparatus, (Liu et al., Figs. 7A - 7C, Pg. 1 ¶ 0008 - Pg. 2 ¶ 0009, Pg. 7 ¶ 0070 - 0074, Pg. 7 ¶ 0076 - Pg. 8 ¶ 0082, Pg. 8 ¶ 0086, Pg. 8 ¶ 0088 - Pg. 9 ¶ 0093, Pg. 11 ¶ 0125) comprising: a processor; (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072, 0074 and 0078, Pg. 8 ¶ 0082 and 0086) and a memory accessible by the processor, (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) wherein the memory stores a set of machine-readable instructions permitting the processor to execute (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals being candidate regions for temporal action in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and calculating values for an attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing a weight for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 
3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals being candidate regions for temporal action in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg.
1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg. 6 ¶ 0061 and 0066) and calculating values for a pair-wise relation function for relating the proposals, (Escorcia et al., Figs. 5, 8 - 9B & 11, Pg. 6 ¶ 0065, Pg. 7 ¶ 0071 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function calculates a scalar value representing a pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 
9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. 
by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings and suggestions of Liu et al. to aggregate relevant proposals and to perform temporally weighted average pooling of proposals based on their determined relevance or importance, see at least page 3 paragraph 0036 - page 4 paragraph 0040 and page 4 paragraphs 0042 - 0048 of Liu et al. Moreover, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that calculating a similarity between pairs of proposals can help in generating accurate action proposals especially in situations wherein a potential action location was lost during tracking and/or wherein the proposals are noisy, see at least page 6 paragraphs 0061 and 0066 - 0068, page 7 paragraph 0074 and page 8 paragraphs 0083 - 0086 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that a pair-wise relation function would be utilized to calculate values that represent pair-wise relation weights for pairs of proposals so as to enable proposals that are relevant to each other to be identified and connected thereby improving the ability of the base device to determine temporal locations with the greatest likelihood of action. Therefore, it would have been obvious to combine Liu et al. with Escorcia et al. to obtain the invention as specified in claim 12. 

-	With regards to claim 14, Liu et al. disclose a module, embodied as a set of machine-readable instructions in a non-transitory medium (Liu et al., Figs. 2 & 7A - 7C, Pg. 1 ¶ 0008, Pg. 2 ¶ 0028, Pg. 3 ¶ 0038, Pg. 7 ¶ 0071 - 0072, 0074 and 0078, Pg. 8 ¶ 0082 and 0086) for causing a processor to implement (Liu et al., Figs. 7A - 7C, Pg. 1 ¶ 0008, Pg. 7 ¶ 0072, 0074 and 0078, Pg. 8 ¶ 0082, 0086 and 0089 - 0091, Pg. 9 ¶ 0093 - 0095) a method of temporal action localization in video data, (Liu et al., Abstract, Figs. 1 & 9 - 11, Pg. 1 ¶ 0002 and 0006, Pg. 2 ¶ 0028 - 0029) the method comprising: receiving a stream of video data; (Liu et al., Figs. 1, 2 & 9 - 11, Pg. 1 ¶ 0006, Pg. 2 ¶ 0029, Pg. 3 ¶ 0036, Pg. 9 ¶ 0098, Pg. 10 ¶ 0102, Pg. ¶ 0116) determining proposals in the video data stream, (Liu et al., Abstract, Fig. 1, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110) the proposals being candidate regions for temporal action in the video data stream; (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and calculating values for an attention function for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) wherein the attention function calculates a scalar value representing at least a weight for the proposals, (Liu et al., Fig. 2, Pg. 3 ¶ 0037 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 11 ¶ 0119 - 0124) as incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to include actions. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 
11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) Liu et al. fail to disclose explicitly a pair-wise relation function for relating the proposals, wherein the pair-wise relation function calculates a scalar value representing at least a pair-wise relation weight for pairs of the proposals. Pertaining to analogous art, Escorcia et al. disclose a method of temporal action localization in video data, (Escorcia et al., Pg. 1 ¶ 0002, 0004 and 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 8 ¶ 0087) the method comprising: receiving a stream of video data; (Escorcia et al., Abstract, Figs. 5, 6 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0058, Pg. 6 ¶ 0064, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0091) determining proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4, 5 & 11, Pg. 1 ¶ 0007, Pg. 2 ¶ 0029, Pg. 5 ¶ 0056 - 0058, Pg. 6 ¶ 0061 - 0066, Pg. 8 ¶ 0087 - Pg. 9 ¶ 0088) the proposals being candidate regions for temporal action in the video data stream; (Escorcia et al., Abstract, Figs. 4 - 6, 10 & 11, Pg. 1 ¶ 0002, 0004 and 0007, Pg. 5 ¶ 0057 - 0058, Pg.
6 ¶ 0061 and 0066) and calculating values for a pair-wise relation function for relating the proposals, (Escorcia et al., Figs. 5, 8 - 9B & 11, Pg. 6 ¶ 0065, Pg. 7 ¶ 0071 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.” and “the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame”]) wherein the pair-wise relation function calculates a scalar value representing at least a pair-wise relation weight for pairs of the proposals. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084 [“Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. 
The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected” and “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal.”]) Liu et al. and Escorcia et al. are combinable because they are both directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Liu et al. with the teachings of Escorcia et al. This modification would have been prompted in order to enhance the base device of Liu et al. with the well-known technique Escorcia et al. applied to a comparable device. Calculating values for a pair-wise relation function that represent pair-wise relation weights for pairs of proposals, as taught by Escorcia et al., would enhance the base device of Liu et al. by improving its ability to reliably generate accurate temporal action proposals since related proposals would be able to be identified and connected thereby enhancing the ability of the base device to correctly locate and classify temporal actions in video data. Furthermore, this modification would have been prompted by the teachings and suggestions of Liu et al. 
to aggregate relevant proposals and to perform temporally weighted average pooling of proposals based on their determined relevance or importance, see at least page 3 paragraph 0036 - page 4 paragraph 0040 and page 4 paragraphs 0042 - 0048 of Liu et al. Moreover, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that calculating a similarity between pairs of proposals can help in generating accurate action proposals especially in situations wherein a potential action location was lost during tracking and/or wherein the proposals are noisy, see at least page 6 paragraphs 0061 and 0066 - 0068, page 7 paragraph 0074 and page 8 paragraphs 0083 - 0086 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that a pair-wise relation function would be utilized to calculate values that represent pair-wise relation weights for pairs of proposals so as to enable proposals that are relevant to each other to be identified and connected thereby improving the ability of the base device to determine temporal locations with the greatest likelihood of action. Therefore, it would have been obvious to combine Liu et al. with Escorcia et al. to obtain the invention as specified in claim 14. 
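For illustration only, the following Python sketch shows one way a pair-wise relation function could map every pair of temporal proposals to a scalar relation weight. It uses temporal intersection over union, a one-dimensional analogue of the intersection-over-union box similarity quoted from Escorcia et al. above. The (start, end) proposal representation and the function names are assumptions and are not taken from either reference.

```python
def temporal_iou(a, b):
    """Temporal IoU of two proposals, each given as a (start, end) pair."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_relation_weights(proposals):
    """Return an N x N matrix whose (i, j) entry is the scalar weight relating proposals i and j."""
    return [[temporal_iou(p, q) for q in proposals] for p in proposals]


# Hypothetical proposals, in seconds.
proposals = [(0.0, 4.0), (2.0, 6.0), (10.0, 12.0)]
weights = pairwise_relation_weights(proposals)
# weights[0][1] is 1/3 (2 s of overlap over a 6 s union); weights[0][2] is 0.0 (disjoint).
```

A learned feature similarity could replace the IoU term without changing the structure: the function still takes a pair of proposals and returns one scalar.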

-	With regards to claim 18, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as embodied as a set of machine-readable instructions in the non-transitory medium including a non-transitory memory device, (Liu et al., Fig. 7A, Pg. 1 ¶ 0008, Pg. 2 ¶ 0028, Pg. 7 ¶ 0072 and 0078, Pg. 8 ¶ 0082 and 0086) and wherein the proposals comprise all proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) In addition, Escorcia et al. disclose wherein the proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) 

Claims 2, 3, 9 and 15 - 17 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claims 1 and 14 above, and further in view of Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin, “Temporal Action Detection with Structured Segment Networks”, arXiv, arXiv:1704.06228v2, 18 Sept. 2017, pages 1 - 10, herein referred to as “Zhao et al.”.

-	With regards to claim 2, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are likely to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) and a second stage of performing a classification on each of the generated proposals individually, (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. 
Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) wherein the generated proposals comprise all of the proposals in the video data stream. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Escorcia et al. disclose wherein the proposals comprise all proposals in the video data stream. (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) Escorcia et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Zhao et al. disclose the method as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are likely to contain actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually, (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) wherein the generated proposals comprise all of the proposals in the video data stream. (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3 - § 3.1, Pg. 5 § 4 Subsection “Inference with reordered computation” - § 5, Pg. 6 Fig. 3) Liu et al. in view of Escorcia et al. and Zhao et al. 
are combinable because they are all directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of Zhao et al. This modification would have been prompted in order to enhance the combined base device of Liu et al. in view of Escorcia et al. with the well-known technique of Zhao et al., as applied to a comparable device. Utilizing the two-stage temporal action localization processing comprising a second stage that performs boundary regression on each proposal, as taught by Zhao et al., would enhance the combined base device by improving its ability to accurately localize temporal actions within video data, since proposals (candidate regions for temporal actions) would first be extended and then have their start and end times refined via regression, enabling the temporal boundaries of detected actions to be precisely defined. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that boundary regression would be performed on each proposal so as to improve the ability of the combined base device to accurately determine the temporal boundaries of detected actions within the video data. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with Zhao et al. to obtain the invention as specified in claim 2. 
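A simplified sketch of per-proposal boundary regression follows. It assumes the common parameterization borrowed from R-CNN-style box regression, in which a regressor predicts a shift of the proposal center (normalized by the proposal length) and a log-scale change of the proposal length. Whether Zhao et al. use exactly these targets should be checked against the cited sections of that reference; the function names and values are hypothetical.

```python
import math


def encode(proposal, target):
    """Regression targets for a (start, end) proposal relative to a ground-truth (start, end) interval."""
    p_len = proposal[1] - proposal[0]
    t_len = target[1] - target[0]
    p_ctr = (proposal[0] + proposal[1]) / 2
    t_ctr = (target[0] + target[1]) / 2
    # Center shift in units of proposal length, and log ratio of the two lengths.
    return (t_ctr - p_ctr) / p_len, math.log(t_len / p_len)


def decode(proposal, offsets):
    """Refine a proposal's start and end times using predicted (center, log-length) offsets."""
    d_ctr, d_len = offsets
    p_len = proposal[1] - proposal[0]
    ctr = (proposal[0] + proposal[1]) / 2 + d_ctr * p_len
    length = p_len * math.exp(d_len)
    return ctr - length / 2, ctr + length / 2


proposal = (2.0, 6.0)          # hypothetical coarse proposal, in seconds
ground_truth = (3.0, 9.0)      # hypothetical annotated action boundaries
offsets = encode(proposal, ground_truth)   # (0.5, log(1.5))
refined = decode(proposal, offsets)        # approximately (3.0, 9.0)
```

In a trained detector the offsets would come from a regression head rather than from `encode`; the round trip here only shows that decoding the targets recovers the refined start and end times.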

-	With regards to claim 3, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the method of claim 2. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

-	With regards to claim 9, Liu et al. in view of Escorcia et al. disclose the method of claim 1, as implemented in a cloud service, (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. 
Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals individually. Pertaining to analogous art, Zhao et al. disclose the method as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually. (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) Liu et al. in view of Escorcia et al. and Zhao et al. are combinable because they are all directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of Zhao et al. This modification would have been prompted in order to enhance the combined base device of Liu et al. in view of Escorcia et al. with the well-known technique of Zhao et al., as applied to a comparable device. 
Utilizing the two-stage temporal action localization processing comprising a second stage that performs boundary regression on each proposal, as taught by Zhao et al., would enhance the combined base device by improving its ability to accurately localize temporal actions within video data, since proposals (candidate regions for temporal actions) would first be extended and then have their start and end times refined via regression, enabling the temporal boundaries of detected actions to be precisely defined. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that boundary regression would be performed on each proposal so as to improve the ability of the combined base device to accurately determine the temporal boundaries of detected actions within the video data. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with Zhao et al. to obtain the invention as specified in claim 9. 
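The proposal scoring and per-class non-maximum suppression quoted from Liu et al. can be approximated as below. This is a sketch, not Liu et al.'s Equation (7): it assumes each frame carries a class activation value and a non-negative weight (for example, an attention value), scores a proposal as the weighted average over its frames, and then applies greedy suppression with an assumed overlap threshold of 0.5. Proposals are half-open frame index ranges, and all names are hypothetical.

```python
def proposal_score(tcam, weights, start, end):
    """Weighted average of one class's per-frame activations over frames start..end-1."""
    total = sum(weights[t] for t in range(start, end))
    if total == 0:
        return 0.0
    return sum(weights[t] * tcam[t] for t in range(start, end)) / total


def temporal_iou(a, b):
    """Temporal IoU of two half-open frame ranges (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, scores, threshold=0.5):
    """Greedy NMS for one class: keep the best-scoring proposal, drop heavy overlaps, repeat."""
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(proposals[i], proposals[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


tcam = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1]   # hypothetical activations for one class
weights = [1.0] * len(tcam)             # uniform weights reduce to a plain mean
proposals = [(1, 4), (1, 5), (4, 6)]
scores = [proposal_score(tcam, weights, s, e) for s, e in proposals]
kept = nms(proposals, scores)           # [0, 2]: (1, 5) overlaps (1, 4) with IoU 0.75
```

Running `nms` separately for each class matches the quoted statement that suppression "can be performed independently" per class.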

-	With regards to claim 15, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are likely to contain actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. 
Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals. Pertaining to analogous art, Zhao et al. disclose the module as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are likely to contain actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually. (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) Liu et al. in view of Escorcia et al. and Zhao et al. are combinable because they are all directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of Zhao et al. This modification would have been prompted in order to enhance the combined base device of Liu et al. in view of Escorcia et al. with the well-known technique of Zhao et al., as applied to a comparable device. 
Utilizing the two-stage temporal action localization processing comprising a second stage that performs boundary regression on each proposal, as taught by Zhao et al., would enhance the combined base device by improving its ability to accurately localize temporal actions within video data, since proposals (candidate regions for temporal actions) would first be extended and then have their start and end times refined via regression, enabling the temporal boundaries of detected actions to be precisely defined. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that boundary regression would be performed on each proposal so as to improve the ability of the combined base device to accurately determine the temporal boundaries of detected actions within the video data. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with Zhao et al. to obtain the invention as specified in claim 15. 

-	With regards to claim 16, Liu et al. in view of Escorcia et al. in view of Zhao et al. disclose the module of claim 15. Liu et al. fail to disclose explicitly wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). Pertaining to analogous art, Zhao et al. disclose wherein the two-stage temporal action localization processing comprises a Structured Segment Network (SSN). (Zhao et al., Pg. 1 Abstract, Pg. 1 Fig. 1, Pg. 2 Left-Hand Column Third-Full Paragraph - Fourth-Full Paragraph, Pg. 3 § 3 ¶ 1, Pg. 3 Fig. 2, Pg. 5 § 4, Pg. 8 § 7) 

-	With regards to claim 17, Liu et al. in view of Escorcia et al. disclose the module of claim 14, as implemented in a cloud service, (Liu et al., Fig. 7A, Pg. 7 ¶ 0070 and 0076, Pg. 7 ¶ 0078 - Pg. 8 ¶ 0083, Pg. 9 ¶ 0092, Pg. 11 ¶ 0125) as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119 [“a network model (e.g., a deep neural network) can select a subset of frames useful for action recognition, where the loss function can measure classification error and sparsity of frame selection per video. For localization, Temporal Class Activation Mappings (T-CAMs) can be employed to generate one dimensional temporal action proposals from which target actions can be localized in a temporal domain”, “temporal proposals 150 can correspond to video segments that potentially enclose target actions”, “each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “For the RGB stream (e.g., 115 in FIG. 1), the smallest dimension of a frame was rescaled to 256 and a central crop of size 224x224 was performed. Other suitable input sizes could similarly be used”]) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals individually. Pertaining to analogous art, Zhao et al. disclose the module as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually. (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) Liu et al. in view of Escorcia et al. and Zhao et al. are combinable because they are all directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of Zhao et al. This modification would have been prompted in order to enhance the combined base device of Liu et al. in view of Escorcia et al. with the well-known technique of Zhao et al., as applied to a comparable device. 
Utilizing the two-stage temporal action localization processing comprising a second stage that performs boundary regression on each proposal, as taught by Zhao et al., would enhance the combined base device by improving its ability to accurately localize temporal actions within video data, since proposals (candidate regions for temporal actions) would first be extended and then have their start and end times refined via regression, enabling the temporal boundaries of detected actions to be precisely defined. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that boundary regression would be performed on each proposal so as to improve the ability of the combined base device to accurately determine the temporal boundaries of detected actions within the video data. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with Zhao et al. to obtain the invention as specified in claim 17. 

Claims 4, 6 - 8, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claims 1 and 14 above, and further in view of He et al. U.S. Publication No. 2019/0156210 A1.

-	With regards to claim 4, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals. (Escorcia et al., Figs. 8 & 11, Pg. 7 ¶ 0072 - 0073 and 0075, Pg. 8 ¶ 0080 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 0086, Pg. 14 ¶ 0090 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0095, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, like Escorcia et al., He et al. are also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art, and the substitution would likely yield predictable results, in that, in the combination, the pairwise function of He et al. would be utilized to determine the similarity (affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (affinity) between proposals may be determined in a variety of ways, see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values for relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 4.
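As a rough illustration of "a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation," the sketch below computes the cosine similarity (one of the measures quoted from Escorcia et al.) between every pair of proposal feature vectors and normalizes each row with a softmax, so that the relation weights for each proposal are positive and sum to one. The choice of cosine similarity and the function names are assumptions; He et al. may use a different similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relation_weights(features):
    """Row i holds softmax over j of cosine(features[i], features[j])."""
    return [softmax([cosine(fi, fj) for fj in features]) for fi in features]


features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical proposal features
w = relation_weights(features)
# Row 0: proposals 0 and 1 (identical features) each get e/(2e+1); proposal 2 gets 1/(2e+1).
```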

-	With regards to claim 6, Liu et al. in view of Escorcia et al. disclose the method of claim 1. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a dot product of two embedding feature vectors. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a dot product of two embedding feature vectors. (He et al., Pg. 13 ¶ 0078 - 0080, Pg. 14 ¶ 0082 and 0086, Pg. 14 ¶ 0088 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0094 - 0096, Pg. 16 ¶ 0100, Pg. 23 ¶ 0137 - 0139) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, like Escorcia et al., He et al. are also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art, and the substitution would likely yield predictable results, in that, in the combination, the pairwise function of He et al. would be utilized to determine the similarity (affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (affinity) between proposals may be determined in a variety of ways, see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values for relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 6. 
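The "dot product of two embedding feature vectors" can be illustrated as follows. Each input feature is first projected by a linear embedding, written here as the hypothetical matrices `W_theta` and `W_phi`, and the relation value is the dot product of the two embeddings. Because the two embeddings differ, the relation is generally not symmetric. This is a minimal sketch; the embedding sizes and matrix values are assumptions, not taken from He et al.

```python
def embed(W, x):
    """Linear embedding: multiply matrix W (list of rows) by feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def embedded_dot(x_i, x_j, W_theta, W_phi):
    """Pair-wise relation value theta(x_i) . phi(x_j), returned as a single scalar."""
    theta = embed(W_theta, x_i)
    phi = embed(W_phi, x_j)
    return sum(a * b for a, b in zip(theta, phi))


# Hypothetical 3-d features projected into a 2-d embedding subspace.
W_theta = [[1, 0, 0], [0, 1, 0]]
W_phi = [[1, 0, 0], [0, 0, 1]]
x_i, x_j = [1, 2, 3], [4, 5, 6]
f_ij = embedded_dot(x_i, x_j, W_theta, W_phi)  # (1, 2) . (4, 6) = 16
f_ji = embedded_dot(x_j, x_i, W_theta, W_phi)  # (4, 5) . (1, 3) = 19
```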

-	With regards to claim 7, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a self-attention mechanism. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0036 - Pg. 4 ¶ 0045, Pg. 4 ¶ 0047 - 0048, Pg. 11 ¶ 0119 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a self-attention mechanism. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a self-attention mechanism. (He et al., Pg. 13 ¶ 0078 - 0080, Pg. 14 ¶ 0082 and 0086 - 0090, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, like Escorcia et al., He et al. are also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art, and the substitution would likely yield predictable results, in that, in the combination, the pairwise function of He et al. would be utilized to determine the similarity (affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (affinity) between proposals may be determined in a variety of ways, see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values for relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 7. 
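A self-attention form of the pair-wise relation function can be sketched as scaled dot-product attention over the set of proposal features. Each proposal's output is a softmax-weighted average of all (value-projected) proposal features, with the weights given by the pair-wise relation values. The 1/sqrt(d) scaling follows the common Transformer formulation and is an assumption here rather than a detail drawn from He et al.; the projection matrices and function names are hypothetical.

```python
import math


def project(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, W_q, W_k, W_v):
    """One output per proposal: sum over j of w_ij * v_j, with w_ij = softmax_j(q_i . k_j / sqrt(d))."""
    q = [project(W_q, f) for f in features]
    k = [project(W_k, f) for f in features]
    v = [project(W_v, f) for f in features]
    d = len(q[0])
    outputs = []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # hypothetical proposal features
out = self_attention(features, identity, identity, identity)
# With identity projections each output is a convex combination of the inputs,
# so its components sum to 1 here, and out[0] leans toward [1, 0].
```

The weighted-average structure of each output row is what allows a proposal's refined feature to draw on all related proposals in the video.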

-	With regards to claim 8, Liu et al. in view of Escorcia et al. disclose the method of claim 1, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function is implemented in a fully-connected (fc) layer, wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, Escorcia et al. disclose wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 6 ¶ 0065 - 0066, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, “Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 
9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes. That is, an affinity between a pair of boxes from consecutive frames may be determined based on an appearance comparison, a location comparison, and/or motion models. The action proposals of the sequence of frames is determined by maximizing a global affinity of the network”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected”, “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal” and “To associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. 
The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function is implemented in a fully-connected (fc) layer, wherein the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function is implemented in a fully-connected (fc) layer, (He et al., Pg. 14 ¶ 0082 - 0088 and 0090, Pg. 15 ¶ 0092 - 0097, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit” and “non-local block 500 takes an input 501 denoted by x, processes it with an embedding 502 denoted by θ, another embedding 503 denoted by ϕ, and a unary function 505 denoted by g. The results from embeddings 502 and 503 are further processed by a pairwise function 504 denoted by f. The results from the pairwise function 504 and the unary function 505 go through a matrix multiplication 506 denoted by ⊗. The result from the matrix multiplication 506 is processed by a 1×1×1 convolution 507. The input 501 and the result from the 1×1×1 convolution 507 are processed by an element-wise sum 508 denoted by ⊕, which leads to an output 509 denoted by z.” The Examiner asserts that the pairwise function of He et al. based on the disclosed concatenation form corresponds to the claimed limitation at least because the instant specification discloses a substantially similar process for implementing the pair-wise relation function in an fc layer; see at least page 3 paragraphs 0041 - 0043 of the instant application’s corresponding patent application publication.]) wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0087 - Pg. 15 ¶ 0096, Pg. 16 ¶ 0100 [“a pairwise function f may compute a scalar between i and all j. The scalar may represent a relationship such as affinity”]) the function transforms the input features to an embedding subspace, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0078 - 0079, Pg. 14 ¶ 0082 and 0087 - 0090, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit” and “non-local block 500 takes an input 501 denoted by x, processes it with an embedding 502 denoted by θ, another embedding 503 denoted by ϕ, and a unary function 505 denoted by g. The results from embeddings 502 and 503 are further processed by a pairwise function 504 denoted by f”]) output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. (He et al., Figs. 8 & 9, Pg. 12 ¶ 0067 and 0071, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0089 - Pg. 15 ¶ 0096, Pg. 15 ¶ 0099, Pg. 16 ¶ 0102 [“A self-attention module computes the response at a position in a sequence (e.g., a sentence) by attending to all positions and taking their weighted average in an embedding space.”]) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of inputs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. 
utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the pairwise function of He et al. would be utilized to determine the similarity (i.e., affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (i.e., affinity) between proposals may be determined in a variety of different ways; see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 8. 
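For illustration only, the concatenation-form pairwise function quoted from He et al. (equation (5), with the normalization factor C(x)=N) may be sketched in plain Python as follows. This is a hedged sketch under stated assumptions, not He et al.'s implementation and not the Applicant's: the embeddings θ and φ and the unary function g are assumed to be bias-free linear (fc) projections, and the names `linear`, `concat_pairwise`, and `relation_output`, as well as the identity weights in the usage lines, are hypothetical. The pairwise function takes a pair of features and returns one scalar (a single projection by wf applied to the concatenated embeddings, followed by ReLU), and each output feature combines all embedded input proposal features using those scalars as weights.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Hypothetical bias-free fc projection: returns weights @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def concat_pairwise(xi: Vector, xj: Vector, w_theta: Matrix, w_phi: Matrix, w_f: Vector) -> float:
    """Concatenation form f(xi, xj) = ReLU(w_f^T [theta(xi), phi(xj)]).

    A pair of feature vectors goes in; one non-negative scalar comes out.
    """
    joint = linear(w_theta, xi) + linear(w_phi, xj)  # list "+" concatenates the two embeddings
    return max(0.0, sum(w * v for w, v in zip(w_f, joint)))


def relation_output(features: List[Vector], w_theta: Matrix, w_phi: Matrix,
                    w_g: Matrix, w_f: Vector) -> List[Vector]:
    """y_i = (1/N) * sum_j f(x_i, x_j) * g(x_j), i.e., normalization C(x) = N."""
    n = len(features)
    g_feats = [linear(w_g, x) for x in features]  # unary function g applied to every proposal
    dim = len(g_feats[0])
    outputs = []
    for xi in features:
        weights = [concat_pairwise(xi, xj, w_theta, w_phi, w_f) for xj in features]
        outputs.append([sum(weights[j] * g_feats[j][k] for j in range(n)) / n
                        for k in range(dim)])
    return outputs


# Usage with illustrative (hypothetical) identity embeddings and an all-ones w_f.
identity = [[1.0, 0.0], [0.0, 1.0]]
proposals = [[1.0, 0.0], [0.0, 2.0]]
print(relation_output(proposals, identity, identity, identity, [1.0, 1.0, 1.0, 1.0]))
# prints [[1.0, 3.0], [1.5, 4.0]]
```

With C(x)=N, as in the quoted passage, the pairwise weights are divided by the number of proposals and need not sum to one; a softmax normalization would instead yield a convex weighted average.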

-	With regards to claim 19, Liu et al. in view of Escorcia et al. disclose the module of claim 14. Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. Pertaining to analogous art, Escorcia et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals. (Escorcia et al., Figs. 8 & 11, Pg. 7 ¶ 0072 - 0073 and 0075, Pg. 8 ¶ 0080 - 0084 [“cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a calculation of a similarity followed by a softmax operation. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a calculation of a similarity between two features of pairs of the proposals followed by a softmax operation. (He et al., Pg. 12 ¶ 0071, Pg. 13 ¶ 0074 - 0075 and 0078 - 0080, Pg. 14 ¶ 0082 - 0086, Pg. 14 ¶ 0090 - Pg. 15 ¶ 0091, Pg. 15 ¶ 0095, Pg. 18 ¶ 0113) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of inputs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the pairwise function of He et al. would be utilized to determine the similarity (i.e., affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (i.e., affinity) between proposals may be determined in a variety of different ways; see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 19. 
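As an illustrative sketch of the limitation recited in claim 19 (a similarity between two proposal features followed by a softmax operation), the code below assumes a plain dot product as the similarity measure; Escorcia et al. mention cosine similarity among other options, so this choice is an assumption, and all function names are hypothetical. After the softmax, each row of pairwise relation weights sums to one, so the output feature for a proposal becomes a weighted average of all input proposal features.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Assumed similarity measure: plain dot product of two proposal features."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Softmax; subtracting the max first is a stability step that leaves the result unchanged."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_relation_weights(features: List[Vector]) -> List[List[float]]:
    """Row i holds softmax over j of similarity(x_i, x_j); each row sums to 1."""
    return [softmax([dot(xi, xj) for xj in features]) for xi in features]


def weighted_average(features: List[Vector], weights: List[List[float]]) -> List[Vector]:
    """Output feature for proposal i: sum over j of weights[i][j] * x_j."""
    dim = len(features[0])
    return [[sum(w[j] * features[j][k] for j in range(len(features))) for k in range(dim)]
            for w in weights]


proposals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = pairwise_relation_weights(proposals)
print([round(sum(row), 6) for row in w])  # prints [1.0, 1.0, 1.0]
outputs = weighted_average(proposals, w)
```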

-	With regards to claim 20, Liu et al. in view of Escorcia et al. disclose the module of claim 14, further comprising: a fully-connected (fc) layer. (Liu et al., Figs. 1 & 2, Pg. 3 ¶ 0038, Pg. 11 ¶ 0118 - 0124) Liu et al. fail to disclose explicitly wherein the pair-wise relation function comprises a fully-connected (fc) layer, wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, Escorcia et al. disclose wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 6 ¶ 0065 - 0066, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, “Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 
9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes. That is, an affinity between a pair of boxes from consecutive frames may be determined based on an appearance comparison, a location comparison, and/or motion models. The action proposals of the sequence of frames is determined by maximizing a global affinity of the network”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected”, “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal” and “To associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. 
The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.”]) Escorcia et al. fail to disclose explicitly wherein the pair-wise relation function comprises a fully-connected (fc) layer, wherein the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, He et al. disclose wherein the pair-wise relation function comprises a fully-connected (fc) layer, (He et al., Pg. 14 ¶ 0082 - 0088 and 0090, Pg. 15 ¶ 0092 - 0097, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit” and “non-local block 500 takes an input 501 denoted by x, processes it with an embedding 502 denoted by θ, another embedding 503 denoted by ϕ, and a unary function 505 denoted by g. The results from embeddings 502 and 503 are further processed by a pairwise function 504 denoted by f. The results from the pairwise function 504 and the unary function 505 go through a matrix multiplication 506 denoted by ⊗. The result from the matrix multiplication 506 is processed by a 1×1×1 convolution 507. The input 501 and the result from the 1×1×1 convolution 507 are processed by an element-wise sum 508 denoted by ⊕, which leads to an output 509 denoted by z.” The Examiner asserts that the pairwise function of He et al. based on the disclosed concatenation form corresponds to the claimed limitation at least because the instant specification discloses a substantially similar process for implementing the pair-wise relation function in an fc layer; see at least page 3 paragraphs 0041 - 0043 of the instant application’s corresponding patent application publication.]) wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0087 - Pg. 15 ¶ 0096, Pg. 16 ¶ 0100 [“a pairwise function f may compute a scalar between i and all j. The scalar may represent a relationship such as affinity”]) the function transforms the input features to an embedding subspace, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0078 - 0079, Pg. 14 ¶ 0082 and 0087 - 0090, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit” and “non-local block 500 takes an input 501 denoted by x, processes it with an embedding 502 denoted by θ, another embedding 503 denoted by ϕ, and a unary function 505 denoted by g. The results from embeddings 502 and 503 are further processed by a pairwise function 504 denoted by f”]) output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. (He et al., Figs. 8 & 9, Pg. 12 ¶ 0067 and 0071, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0089 - Pg. 15 ¶ 0096, Pg. 15 ¶ 0099, Pg. 16 ¶ 0102 [“A self-attention module computes the response at a position in a sequence (e.g., a sentence) by attending to all positions and taking their weighted average in an embedding space.”]) Liu et al. in view of Escorcia et al. and He et al. are combinable because they are all directed towards temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of inputs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. 
utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the pairwise function of He et al. would be utilized to determine the similarity (i.e., affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (i.e., affinity) between proposals may be determined in a variety of different ways; see at least page 7 paragraphs 0071 - 0076 and page 8 paragraphs 0082 - 0084 of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with He et al. to obtain the invention as specified in claim 20. 

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. U.S. Publication No. 2020/0272823 A1 in view of Escorcia et al. U.S. Publication No. 2019/0108400 A1 as applied to claim 12 above, and further in view of Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin, “Temporal Action Detection with Structured Segment Networks”, arXiv, arXiv:1704.06228v2, 18 Sept. 2017, pages 1 - 10, herein referred to as “Zhao et al.”, and He et al. U.S. Publication No. 2019/0156210 A1.

-	With regards to claim 13, Liu et al. in view of Escorcia et al. disclose the apparatus of claim 12, wherein the method is incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0040 - 0045, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 10 ¶ 0109 - 0110 [“temporal proposals 150 can correspond to video segments that potentially enclose target actions”]) and a second stage of performing a classification on each of the generated proposals individually, (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 10 ¶ 0110 - 0111 [“Then, each proposal 150, defined by [tstart;tend], can be given a score for each class c, given by the weighted average T-CAM of all the frames within the proposal, as given by Equation (7)” and “This value corresponds to the temporal proposal score in each stream for class c. Finally, non-maximum suppression among temporal proposals of each class can be performed independently to remove highly overlapped detections.”]) wherein the generated proposals comprise all proposals in the video data stream, (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - 0030, Pg. 3 ¶ 0036, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0049, Pg. 7 ¶ 0069, Pg. 9 ¶ 0100, Pg. 10 ¶ 0108 - 0111) as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028 - Pg. 3 ¶ 0031, Pg. 3 ¶ 0034 - 0036, Pg. 4 ¶ 0043 - Pg. 5 ¶ 0049, Pg. 5 ¶ 0056, Pg. 9 ¶ 0100, Pg. 10 ¶ 0109 - 0110, Pg. 11 ¶ 0118 - 0119) and a second stage of performing a classification on each of the generated proposals individually. (Liu et al., Abstract, Figs. 1 & 2, Pg. 2 ¶ 0028, Pg. 3 ¶ 0036 - 0038, Pg. 4 ¶ 0040, Pg. 4 ¶ 0047 - Pg. 5 ¶ 0050, Pg. 9 ¶ 0100, Pg. 
10 ¶ 0110 - 0111) Liu et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals, wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, Escorcia et al. disclose wherein the generated proposals comprise all proposals in the video data stream, (Escorcia et al., Abstract, Figs. 4 - 8, 10 & 11, Pg. 1 ¶ 0002 and 0007, Pg. 2 ¶ 0029, Pg. 3 ¶ 0033, Pg. 5 ¶ 0056 - 0059, Pg. 6 ¶ 0061 - 0067, Pg. 7 ¶ 0072, Pg. 8 ¶ 0080 and 0082, Pg. 8 ¶ 0085 - Pg. 9 ¶ 0089) wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight. (Escorcia et al., Figs. 8 - 9B & 11, Pg. 6 ¶ 0065 - 0066, Pg. 7 ¶ 0072 - 0073 and 0075 - 0077, Pg. 8 ¶ 0080 - 0084, Pg. 9 ¶ 0088 - 0090 [“a possible action location 800 (e.g., box proposal) is detected at a current frame t. Based on the possible action location 800 of the current frame t, possible action location samples 802 are generated for a number of consecutive frames”, “the number of consecutive frames is not limited to only a subsequent frame from the current frame. Rather, any number of frames may be used, such that a best match region is determined for each frame from frame t+1 to frame t+n”, “Each edge 908 between the frame nodes 902, 904, 906 represents a similarity between connected nodes (e.g., actor boxes). Given the number of frame nodes 902, 904, 906 in each frame, aspects of the present disclosure identify the most similar nodes over time to generate the action proposals. For example, as shown in a graph 950 of FIG. 
9B, based on a comparison between the first frame nodes 902 and the second frame nodes 904, an affinity maximization module may determine that the first frame node A of the first frame nodes 902 has a greatest similarity to the second frame node A of the second frame nodes 904”, “similarity may be determined based on a comparison of bounding box locations or a comparison of visual features between two bounding boxes. That is, an affinity between a pair of boxes from consecutive frames may be determined based on an appearance comparison, a location comparison, and/or motion models. The action proposals of the sequence of frames is determined by maximizing a global affinity of the network”, “ci is a confidence (e.g., level of certainty) of a detection I at frame t. The confidence is determined by the object detector or a matching confidence. cij defines the similarity (e.g., affinity) between a node i (e.g., detection i) at frame t and a node j at frame t+1. The similarity may be determined from the similarity of the bounding boxes, the spatial difference between the bounding box locations (e.g., an intersection over union), a cosine similarity between features obtained from the bounding boxes, etc.”, “When xji or xij is one, node i and node j should be connected, when xji or xij is zero, node i and node j should not be connected”, “In equation 1, x is a confidence value determining the probability that a node (xi) or an edge (xij) belongs to the proposal” and “To associate a most similar possible action location, the machine based vision system may compare possible action locations in a first frame to possible action locations in a second subsequent frame. The comparison may compare a learned similarity between possible action locations in the first frame and possible action locations in the second subsequent frame. 
The learned similarity may be a learned semantic visual feature similarity between possible action locations in the first frame and possible action locations in the second subsequent frame.”]) Escorcia et al. fail to disclose explicitly performing a boundary regression on each of the generated proposals, wherein the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, Zhao et al. disclose wherein the method is incorporated into a two-stage temporal action localization processing comprising a first stage of generating proposals which are likely to contain actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually, (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) wherein the generated proposals comprise all proposals in the video data stream, (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third Full Paragraph, Pg. 3 § 3 - § 3.1, Pg. 5 § 4 Subsection “Inference with reordered computation” - § 5, Pg. 6 Fig. 3) as incorporated into the two-stage temporal action localization processing comprising the first stage of generating proposals which are to include actions (Zhao et al., Pg. 1 Abstract, Pg. 3 § 3 - § 3.1, Pg. 3 Fig. 2, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3) and a second stage of performing a classification and a boundary regression on each of the generated proposals individually. (Zhao et al., Pg. 1 Abstract, Pg. 2 Left-Hand Column Third-Full Paragraph, Pg. 3 § 3, Pg. 3 Fig. 2, Pg. 4 § 3.2 - Pg. 5 § 3.4, Pgs. 5 - 6 § 5, Pg. 6 Fig. 3, Pg. 
7 § 6.2 Subsection “Classifier Design” - Subsection “Location Regression & Multi-Task Learning”, Pg. 8 § 7) Zhao et al. fail to disclose explicitly wherein the function transforms the input features to an embedding subspace, output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. Pertaining to analogous art, He et al. disclose wherein the function takes a pair of features as input and outputs a scalar, representing the pairwise relation weight, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0087 - Pg. 15 ¶ 0096, Pg. 16 ¶ 0100 [“a pairwise function f may compute a scalar between i and all j. The scalar may represent a relationship such as affinity”]) the function transforms the input features to an embedding subspace, (He et al., Figs. 5 - 9, Pg. 13 ¶ 0078 - 0079, Pg. 14 ¶ 0082 and 0087 - 0090, Pg. 15 ¶ 0092 - 0096, Pg. 16 ¶ 0100 [“another choice of the pairwise function f may be based on a concatenation form, which is formulated as: f(xi,xj)=ReLU(wf^T[θ(xi),Φ(xj)]) (5) As used herein, [∙,∙] may denote concatenation and wf may indicate a weight vector that projects the concatenated vector to a scaler. In particular embodiments, the normalization factor may be set as C(x)=N. As used herein, ReLU may indicate a function of a rectified linear unit” and “non-local block 500 takes an input 501 denoted by x, processes it with an embedding 502 denoted by θ, another embedding 503 denoted by ϕ, and a unary function 505 denoted by g. The results from embeddings 502 and 503 are further processed by a pairwise function 504 denoted by f”]) output features for a proposal is viewed as a weighted average of all input proposal features in a sub-space. (He et al., Figs. 8 & 9, Pg. 12 ¶ 0067 and 0071, Pg. 13 ¶ 0075 and 0078 - 0079, Pg. 14 ¶ 0082, Pg. 14 ¶ 0089 - Pg. 15 ¶ 0096, Pg. 15 ¶ 0099, Pg. 16 ¶ 0102 [“A self-attention module computes the response at a position in a sequence (e.g., a sentence) by attending to all positions and taking their weighted average in an embedding space.”]) Liu et al. in view of Escorcia et al. and Zhao et al. are combinable because they are all directed towards temporal action localization and classification in video data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. with the teachings of Zhao et al. This modification would have been prompted in order to enhance the combined base device of Liu et al. in view of Escorcia et al. with the well-known technique of Zhao et al. as applied to a comparable device. 
Utilizing the two-stage temporal action localization processing comprising a second stage that performs boundary regression on each proposal, as taught by Zhao et al., would enhance the combined base device by improving its ability to accurately localize temporal actions within video data, since proposals (candidate regions for temporal actions) would first be extended and then have their start and end times refined via regression so as to enable the temporal boundaries of detected actions to be precisely defined. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that boundary regression would be performed on each proposal so as to improve the ability of the combined base device to accurately determine the temporal boundaries of detected actions within the video data. In addition, Liu et al. in view of Escorcia et al. in view of Zhao et al. and He et al. are combinable because they are all directed toward temporal action localization and classification in video data and, similar to Escorcia et al., He et al. is also directed towards utilizing a pairwise relation function to calculate values representing pairwise relation weights between temporally related pairs of input. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of Liu et al. in view of Escorcia et al. in view of Zhao et al. with the teachings of He et al. This modification would have been prompted in order to substitute the pairwise function of He et al. for the pair-wise relation function of Escorcia et al. The pairwise function of He et al. could be substituted in place of the pair-wise relation function of Escorcia et al. utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the pairwise function of He et al. would be utilized to determine the similarity (affinity) between pairs of proposals. Furthermore, this modification would have been prompted by the teachings and suggestions of Escorcia et al. that the similarity (affinity) between proposals may be determined in a variety of different fashions; see at least page 7, paragraphs 0071 - 0076, and page 8, paragraphs 0082 - 0084, of Escorcia et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the pairwise function of He et al. would be utilized to calculate values for relating pairs of proposals. Therefore, it would have been obvious to combine Liu et al. in view of Escorcia et al. with Zhao et al. and He et al. to obtain the invention as specified in claim 13.
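For illustration only, the two-stage processing attributed to Zhao et al. in the rejection of claim 13 (a proposal is first extended, and its start and end times are then refined via regression) can be sketched as follows. This is a hypothetical sketch under stated assumptions: the extension ratio of one half of the proposal duration on each side and the center/log-length offset parameterization are assumptions, not taken from the cited passages, and the regression offsets, which a trained model would predict from features of the extended segment, are supplied directly. The function names are hypothetical.

```python
import math
from typing import Tuple

Segment = Tuple[float, float]  # (start time, end time), e.g. in seconds


def extend(proposal: Segment, ratio: float = 0.5) -> Segment:
    """Stage 1 (assumed): widen the proposal by ratio * duration on each side
    to obtain a context window around it."""
    start, end = proposal
    duration = end - start
    return (start - ratio * duration, end + ratio * duration)


def regress_boundaries(proposal: Segment, delta_center: float,
                       delta_log_length: float) -> Segment:
    """Stage 2 (assumed parameterization): shift the center by a fraction of
    the duration and rescale the duration in log space."""
    start, end = proposal
    center = (start + end) / 2.0
    duration = end - start
    new_center = center + delta_center * duration
    new_duration = duration * math.exp(delta_log_length)
    return (new_center - new_duration / 2.0, new_center + new_duration / 2.0)


proposal = (10.0, 20.0)
context = extend(proposal)  # (5.0, 25.0): window a model would score (model not shown)
refined = regress_boundaries(proposal, delta_center=0.1, delta_log_length=0.0)
# refined -> (11.0, 21.0): same duration, center shifted by 10% of the duration
```

Predicting offsets relative to the proposal, rather than absolute times, keeps the regression targets on a comparable scale for short and long proposals; this is a common design choice and is assumed here for illustration.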

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
a.	Kadav et al. U.S. Publication No. 2019/0019037 A1, which is directed towards systems and methods utilizing neural networks for learning object interactions to perform video understanding tasks such as action recognition.
b.	Song et al. U.S. Publication No. 2017/0243082 A1, which is directed towards methods and systems for training a neural network to learn an image embedding function based on a pairwise similarity measure between images. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC RUSH whose telephone number is (571) 270-3017. The examiner can normally be reached 9am - 5pm Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ERIC RUSH/Primary Examiner, Art Unit 2667