DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 5/17/2018, the claims filed on 4/20/2022, and the interview conducted on 7/20/2022.  

EXAMINER'S AMENDMENT

An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Nilesh Amin (Reg. No. 58407) on 7/20/2022.

The application has been amended as follows: 

Please amend the claim set, filed on 4/20/2022, with the following amendments:

1.	(Currently Amended) A system, comprising:
 	a memory that stores computer executable components; and
 	a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise:
	a motion component that extracts, using a neural network model, a motion vector from a plurality of adaptive receptive fields in a deformable convolution layer of the neural network model, wherein the motion vector is extracted based on computing a difference between a first adaptive receptive field from the plurality of adaptive receptive fields at a first time frame and a second adaptive receptive field from the plurality of adaptive receptive fields at a second time frame; and
	an action detection component that generates, using the neural network model, a spatio-temporal feature by concatenating the motion vector with a spatial feature extracted from the deformable convolution layer, wherein the neural network model is a single stream model.

2.	(Original) The system of claim 1, wherein the spatio-temporal feature is a vector that characterizes a fine-grained action associated with the spatial feature. 

3.	(Original) The system of claim 1, wherein the neural network model is trained end-to-end. 

4.	(Original) The system of claim 1, wherein the neural network model comprises a plurality of deformable convolution layers, wherein the deformable convolution layer is comprised within the plurality of deformable convolution layers, and wherein the motion vector is extracted from the plurality of deformable convolution layers.

5.	(Cancelled)

6.	(Original) The system of claim 1, further comprising a loss component that computes a motion loss from an aggregation of a plurality of motion vectors extracted by the motion component, wherein the motion loss is a regularization that enforces a consistency of learned motion characterized by the plurality of motion vectors over a period of time.

7.	(Original) The system of claim 6, wherein the loss component further computes a class loss from a second aggregation of a plurality of spatial features extracted by the action detection component, wherein the class loss is cross-entropy loss that enforces a correctness of predicted labels generated by the neural network model.

8.	(Cancelled)

9.	(Currently Amended) The system of claim 1, wherein the motion component extracts the motion vector by computing [[a]] the difference 

10.	(Original) The system of claim 1, wherein the action detection component generates the spatio-temporal feature in a cloud computing environment.

11.	(Currently Amended) A computer-implemented method, comprising:
	extracting, by a system operatively coupled to a processor, using a neural network model, a motion vector from a plurality of adaptive receptive fields in a deformable convolution layer of the neural network model, wherein the extracting comprises computing a difference between a first adaptive receptive field from the plurality of adaptive receptive fields at a first time frame and a second adaptive receptive field from the plurality of adaptive receptive fields at a second time frame; and
	generating, by the system, using the neural network model, a spatio-temporal feature by concatenating the motion vector with a spatial feature extracted from the deformable convolution layer, wherein the neural network model is a single stream model.

12.	(Original) The computer-implemented method of claim 11, wherein the spatio-temporal feature is a vector that characterizes a fine-grained action associated with the spatial feature. 

13.	(Cancelled)

14.	(Original) The computer-implemented method of claim 11, further comprising computing, by the system, a motion loss from an aggregation of a plurality of motion vectors, wherein the motion loss is a regularization that enforces a consistency of learned motion characterized by the plurality of motion vectors over a period of time.

15.	(Original) The computer-implemented method of claim 14, further comprising computing, by the system, a class loss from a second aggregation of a plurality of spatial features, wherein the class loss is cross-entropy loss that enforces a correctness of predicted labels generated by the neural network model. 

16.	(Previously Presented) The computer-implemented method of claim 11, wherein the neural network model is trained end-to-end. 

17.	(Currently Amended) A computer program product that facilitates detecting fine-grained action based on motion in receptive fields, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
	extract, using a neural network model, a motion vector from a plurality of adaptive receptive fields in a deformable convolution layer of the neural network model based on computing a difference between a first adaptive receptive field from the plurality of adaptive receptive fields at a first time frame and a second adaptive receptive field from the plurality of adaptive receptive fields at a second time frame; and
	generate, using the neural network model, a spatio-temporal feature by concatenating the motion vector with a spatial feature extracted from the deformable convolution layer, wherein the neural network model is a single stream model. 

18.	(Original) The computer program product of claim 17, wherein the spatio-temporal feature is a vector that characterizes the fine-grained action associated with the spatial feature.

19.	(Currently Amended) The computer program product of claim 17, wherein the program instructions further cause the processor to extract the motion vector by computing [[a]] the difference 

20.	(Original) The computer program product of claim 17, wherein the neural network model is trained end-to-end. 

21.	(Currently Amended) The computer program product of claim 17, wherein the program instructions further cause the processor to compute[[ing]] a motion loss from an aggregation of a plurality of motion vectors, wherein the motion loss is a regularization that enforces a consistency of learned motion characterized by the plurality of motion vectors over a period of time.

22.	(New) The computer program product of claim 21, wherein the program instructions further cause the processor to compute a class loss from a second aggregation of a plurality of spatial features, and wherein the class loss is cross-entropy loss that enforces a correctness of predicted labels generated by the neural network model.

23.	(New) The computer program product of claim 17, wherein the spatio-temporal feature is generated in a cloud computing environment.
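
For illustration only, and not as part of the amended claim set, the two losses recited in claims 6-7, 14-15, and 21-22 can be sketched as follows. The claims do not give formulas, so the forms below are assumptions: the motion loss is taken to be the mean squared change between consecutive motion vectors, and the class loss is a standard softmax cross-entropy over per-class scores. The function names `motion_loss` and `class_loss` and the `logits` input are hypothetical.

```python
import math
from typing import Sequence


def motion_loss(motions: Sequence[Sequence[float]]) -> float:
    """Assumed regularizer: mean squared change between consecutive motion vectors.

    A small value means the learned motion is consistent over the time period.
    """
    total = 0.0
    count = 0
    for prev, curr in zip(motions, motions[1:]):
        for a, b in zip(prev, curr):
            total += (b - a) ** 2
            count += 1
    # Fewer than two motion vectors gives no pairs to compare.
    return total / count if count else 0.0


def class_loss(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]
```

Under these assumptions, a training objective could combine the two terms as a weighted sum; the weighting is likewise not specified in the claims.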

Reasons for Allowance

Claims 1-4, 6-7, 9-12, and 14-23, in view of the Examiner’s Amendment above, are allowed.

The following is an examiner’s statement of reasons for allowance: None of the prior art of record, either alone or in combination, teaches the limitations of the claims, particularly:

Claims 1, 11, and 17

Extracting, using a neural network model, a motion vector from a plurality of adaptive receptive fields in a deformable convolution layer of the neural network model, wherein the motion vector is extracted based on computing a difference between a first adaptive receptive field from the plurality of adaptive receptive fields at a first time frame and a second adaptive receptive field from the plurality of adaptive receptive fields at a second time frame, and wherein the neural network is a single stream model, all taught in the context of the remaining elements of the independent claims and when considered as a whole, is not taught by the prior art.  
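
For illustration only, the limitation above can be sketched as a difference of receptive fields followed by a concatenation. This is a minimal sketch under stated assumptions, not the applicant's implementation: each adaptive receptive field is assumed to be represented as a list of (dy, dx) sampling offsets produced by the deformable convolution layer, and the names `motion_vector` and `spatio_temporal_feature` are hypothetical.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[float, float]  # assumed (dy, dx) sampling offset


def motion_vector(field_t0: Sequence[Offset], field_t1: Sequence[Offset]) -> List[float]:
    """Flattened per-point difference between the receptive fields at two time frames."""
    if len(field_t0) != len(field_t1):
        raise ValueError("receptive fields must have the same number of sampling points")
    motion: List[float] = []
    for (y0, x0), (y1, x1) in zip(field_t0, field_t1):
        motion.extend([y1 - y0, x1 - x0])
    return motion


def spatio_temporal_feature(spatial: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Concatenate the spatial feature with the motion vector."""
    return list(spatial) + list(motion)


# Example: two sampling points observed at a first and a second time frame.
first_frame = [(0.0, 0.0), (1.0, 1.0)]
second_frame = [(0.5, 0.0), (1.0, 2.0)]
motion = motion_vector(first_frame, second_frame)
feature = spatio_temporal_feature([0.25, 0.75], motion)
```

Because both the spatial feature and the offsets come from the same layer in this sketch, no separate optical-flow stream is needed, which is consistent with the single stream model recited in the claims.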

The closest prior art of record, Yang et al. (US 20170220854 A1), discloses extracting motion vectors from receptive fields in a convolutional layer of a neural network, but fails to disclose extracting, using a neural network model, a motion vector from a plurality of adaptive receptive fields in a deformable convolution layer of the neural network model, wherein the motion vector is extracted based on computing a difference between a first adaptive receptive field from the plurality of adaptive receptive fields at a first time frame and a second adaptive receptive field from the plurality of adaptive receptive fields at a second time frame, and wherein the neural network is a single stream model, all taught in the context of the remaining elements of the independent claims and when considered as a whole, as claimed.

Accordingly, the 35 USC § 103 rejection of the claims is withdrawn.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
                                                                                                                                                                                                
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover, whose telephone number is (303) 297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127