DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1-20 are allowed.

Statement of Reasons for Allowance
The following is an Examiner’s statement of reasons for allowance:
          
With respect to the allowed independent claim 1:
The primary closest prior art, Somanath et al. (US 201703724479, hereinafter “Somanath”), teaches:
“A method of generating multiple masks for an object depicted in a video, the method including one or more processing devices performing operations (Techniques are provided for segmentation of objects, in videos comprising a sequence of color and depth image frames. A methodology implementing the techniques according to an embodiment includes receiving image frames, see abstract), comprising: receiving (i) a video file having multiple frames in which an object is at least partially visible (method 900 for object segmentation commences by receiving, at operation 910, a plurality of image frames that include a reference frame. The reference frame may be the first image frame in the video sequence, Para. [0058]) and (ii) a first mask corresponding to a first frame of the multiple frames, wherein the first mask indicates a first location of the object in the first frame (at operation 920, a user specified mask is received. The mask outlines a region in the reference image frame that contains the object to be segmented. The mask thus provides a relatively coarse approximation of the location and boundary of the object which will be used as a starting point in the segmentation process, Para. [0059]), extracting, from the first frame, a location feature map indicating the first location of the object, by applying an initializer subnetwork to the first frame and the first mask, wherein the initializer subnetwork is trained to determine the first location of the object based on the first mask (The mask cost is based on pixel location within the active area mask MASK1 812. Pixels that are further away from the boundary of MASK1 (e.g., deeper within the active area mask), have a higher likelihood of being associated with the object of interest and therefore a lower cost (e.g., closer to 0), Paras. [0050]-[0053])”. 
The secondary closest prior art, Tao et al. (US 20190147602, hereinafter “Tao”), teaches:
“extracting, from a second frame of the multiple frames, an image feature map indicating attributes of the second frame, by applying an encoder subnetwork to the second frame, wherein the encoder subnetwork is trained to determine the attributes of the second frame based on the second frame (As shown in FIG. 4, a target 406 is tracked by a target tracking system (e.g., artificial neural network tracker) over a sequence of frames (e.g., first frame 402 to n.sup.th frame 404). The target 406 is an object in a frame. Each target 406 may be given a unique object ID so the target may be tracked through subsequent frames. As shown in FIG. 4, the target 406 may be localized with a bounding box 408, Fig. 4 and Paras. [0061]-[0065])”. 
However, the closest prior art references, Somanath and Tao, whether taken alone or in combination, do not teach or suggest the following novel features:
“the method comprising extracting a difference feature map that includes one or more image features indicating a second location of the object in the second frame, by applying a convolutional subnetwork to the location feature map and to the image feature map, wherein the convolutional subnetwork is trained to determine the second location in the second frame based on (i) memory information indicated by the location feature map that is received as a hidden state by the convolutional LSTM subnetwork via a memory input and (ii) image information indicated by the image feature map that is received by the convolutional LSTM subnetwork via an additional input and generating, based on the difference feature map, a second mask indicating the second location of the object in the second frame, by applying a decoder subnetwork to the difference feature map”, in combination with the other limitations of claim 1.
Dependent claims 2-7 are also allowable because of their dependency on claim 1.
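For illustration only, the data flow recited in the allowed features of claim 1 can be sketched as follows. This is a minimal, untrained NumPy sketch and not the applicant's implementation: the function names, the channel count, the 3x3 kernels, the random weights, the zero initial cell state, and the 0.5 threshold are all assumptions made for the example.

```python
import numpy as np

def conv2d(x, w):
    """Same-padded 2D convolution. x: (C_in, H, W); w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h, c, w):
    """One convolutional LSTM step.
    x: image feature map (the "additional input");
    h: hidden state (the "memory input"); c: cell state."""
    gates = conv2d(np.concatenate([x, h], axis=0), w)
    i, f, o, g = np.split(gates, 4, axis=0)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next

def make_params(features=8, seed=0):
    """Hypothetical random (untrained) weights for the four subnetworks."""
    rng = np.random.default_rng(seed)
    def init(*shape):
        return 0.1 * rng.standard_normal(shape)
    return {
        "init": init(features, 4, 3, 3),             # initializer: RGB frame + mask
        "enc": init(features, 3, 3, 3),              # encoder: RGB frame
        "lstm": init(4 * features, 2 * features, 3, 3),
        "dec": init(1, features, 3, 3),              # decoder: one mask channel
    }

def segment_second_frame(frame1, mask1, frame2, params):
    # Initializer subnetwork: first frame + first mask -> location feature map.
    loc = np.tanh(conv2d(np.concatenate([frame1, mask1], axis=0), params["init"]))
    # Encoder subnetwork: second frame -> image feature map.
    img = np.tanh(conv2d(frame2, params["enc"]))
    # ConvLSTM subnetwork: the location feature map enters as the hidden state,
    # the image feature map as the additional input. The cell state starts at
    # zero here, which is an assumption of this sketch.
    diff, _ = conv_lstm_step(img, loc, np.zeros_like(loc), params["lstm"])
    # Decoder subnetwork: difference feature map -> second mask.
    return sigmoid(conv2d(diff, params["dec"]))[0] > 0.5

rng = np.random.default_rng(1)
frame1 = rng.random((3, 16, 16))
frame2 = rng.random((3, 16, 16))
mask1 = np.zeros((1, 16, 16))
mask1[0, 4:12, 4:12] = 1.0
mask2 = segment_second_frame(frame1, mask1, frame2, make_params())
print(mask2.shape)
```

Because the weights are random, the resulting mask is not meaningful; the sketch only shows which tensor feeds which subnetwork.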

With respect to the allowed independent claim 8:
The primary closest prior art, Somanath et al. (US 201703724479, hereinafter “Somanath”), teaches:
“A non-transitory computer-readable medium embodying program code for generating multiple masks for an object depicted in a video, the program code comprising instructions which, when executed by a processor, cause the processor to perform operations comprising (Techniques are provided for segmentation of objects, in videos comprising a sequence of color and depth image frames. A methodology implementing the techniques according to an embodiment includes receiving image frames, see abstract), comprising: receiving (i) a video file having multiple frames in which an object is at least partially visible (method 900 for object segmentation commences by receiving, at operation 910, a plurality of image frames that include a reference frame. The reference frame may be the first image frame in the video sequence, Para. [0058]) and (ii) a first mask corresponding to a first frame of the multiple frames, wherein the first mask indicates a first location of the object in the first frame (at operation 920, a user specified mask is received. The mask outlines a region in the reference image frame that contains the object to be segmented. The mask thus provides a relatively coarse approximation of the location and boundary of the object which will be used as a starting point in the segmentation process, Para. [0059]), extracting, from the first frame, a location feature map indicating the first location of the object, by applying an initializer subnetwork to the first frame and the first mask, wherein the initializer subnetwork is trained to determine the first location of the object based on the first mask (The mask cost is based on pixel location within the active area mask MASK1 812. Pixels that are further away from the boundary of MASK1 (e.g., deeper within the active area mask), have a higher likelihood of being associated with the object of interest and therefore a lower cost (e.g., closer to 0), Paras. [0050]-[0053])”. 
The secondary closest prior art, Tao et al. (US 20190147602, hereinafter “Tao”), teaches:
“extracting, from a second frame of the multiple frames, an image feature map indicating attributes of the second frame, by applying an encoder subnetwork to the second frame, wherein the encoder subnetwork is trained to determine the attributes of the second frame based on the second frame (As shown in FIG. 4, a target 406 is tracked by a target tracking system (e.g., artificial neural network tracker) over a sequence of frames (e.g., first frame 402 to n.sup.th frame 404). The target 406 is an object in a frame. Each target 406 may be given a unique object ID so the target may be tracked through subsequent frames. As shown in FIG. 4, the target 406 may be localized with a bounding box 408, Fig. 4 and Paras. [0061]-[0065])”. 
However, the closest prior art references, Somanath and Tao, whether taken alone or in combination, do not teach or suggest the following novel features:
“A non-transitory computer-readable medium comprising extracting a difference feature map that includes one or more image features indicating a second location of the object in the second frame, by applying a convolutional subnetwork to the location feature map and to the image feature map, wherein the convolutional subnetwork is trained to determine the second location in the second frame based on (i) memory information indicated by the location feature map that is received as a hidden state by the convolutional LSTM subnetwork via a memory input and (ii) image information indicated by the image feature map that is received by the convolutional LSTM subnetwork via an additional input and generating, based on the difference feature map, a second mask indicating the second location of the object in the second frame, by applying a decoder subnetwork to the difference feature map”, in combination with the other limitations of claim 8.
Dependent claims 9-13 are also allowable because of their dependency on claim 8.

With respect to the allowed independent claim 14:
The primary closest prior art, Somanath et al. (US 201703724479, hereinafter “Somanath”), teaches:
“An object segmentation system for generating a group of masks for an object depicted in a video (Techniques are provided for segmentation of objects, in videos comprising a sequence of color and depth image frames. A methodology implementing the techniques according to an embodiment includes receiving image frames, see abstract), the object segmentation system comprising: a memory device storing instructions which, when executed by a processor, implement a mask extraction subnetwork, the mask extraction subnetwork including an encoder subnetwork and a decoder subnetwork (platform 1010 may comprise any combination of a processor 1020, a memory 1030, object segmentation circuit 120, a network interface 1040, an input/output (I/O) system 1050, a depth camera 104, a display element 114, a user interface 108 and a storage system 1070, Para. [0063]), a means for receiving (i) a video file having multiple frames in which an object is at least partially visible (method 900 for object segmentation commences by receiving, at operation 910, a plurality of image frames that include a reference frame. The reference frame may be the first image frame in the video sequence, Para. [0058]) and (ii) a first mask corresponding to a first frame of the multiple frames, wherein the first mask indicates a first location of the object in the first frame (at operation 920, a user specified mask is received. The mask outlines a region in the reference image frame that contains the object to be segmented. The mask thus provides a relatively coarse approximation of the location and boundary of the object which will be used as a starting point in the segmentation process, Para. [0059]), a means for extracting, from the first frame, a location feature map indicating the first location of the object, by applying an initializer subnetwork to the first frame and the first mask, wherein the initializer subnetwork is trained to determine the first location of the object based on the first mask (The mask cost is based on pixel location within the active area mask MASK1 812. Pixels that are further away from the boundary of MASK1 (e.g., deeper within the active area mask), have a higher likelihood of being associated with the object of interest and therefore a lower cost (e.g., closer to 0), Paras. [0050]-[0053])”.
The secondary closest prior art, Tao et al. (US 20190147602, hereinafter “Tao”), teaches: 
“a means for extracting, from a second frame of the multiple frames, an image feature map indicating attributes of the second frame, by applying an encoder subnetwork to the second frame, wherein the encoder subnetwork is trained to determine the attributes of the second frame based on the second frame (As shown in FIG. 4, a target 406 is tracked by a target tracking system (e.g., artificial neural network tracker) over a sequence of frames (e.g., first frame 402 to n.sup.th frame 404). The target 406 is an object in a frame. Each target 406 may be given a unique object ID so the target may be tracked through subsequent frames. As shown in FIG. 4, the target 406 may be localized with a bounding box 408, Fig. 4 and Paras. [0061]-[0065])”.
However, the closest prior art references, Somanath and Tao, whether taken alone or in combination, do not teach or suggest the following novel features:
“a convolutional long-short term memory (LSTM) subnetwork, a means for extracting a difference feature map that includes one or more image features indicating a second location of the object, by applying a convolutional subnetwork to the location feature map and to the image feature map, wherein the convolutional subnetwork is trained to determine the second location based on (i) memory information indicated by the location feature map that is received as a hidden state by the convolutional LSTM subnetwork via a memory input and (ii) image information indicated by the image feature map that is received by the convolutional LSTM subnetwork via an additional input; a means for generating, based on the difference feature map, a second mask indicating the second location of the object, by applying a decoder subnetwork to the difference feature map”, in combination with the other limitations of claim 14.
Dependent claims 15-20 are also allowable because of their dependency on claim 14.

 	Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GOLAM SOROWAR whose telephone number is (571)270-3761. The examiner can normally be reached Mon-Fri: 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached at (571) 272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.