Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview and email with Thomas Matikainen on 12/8/2021 and 12/13/2021.
The application has been amended as follows: 

Listing of the Claims:
1.	(Currently amended) A method with video segmentation, comprising:
acquiring, over time, a video sequence comprising a plurality of image frames, the plurality of image frames including a second image frame corresponding to a time t of the video sequence and a first image frame corresponding to a time t-1 before the time t;
extracting a second feature vector from the second image frame;
generating second hidden state information corresponding to the second image frame, based on first hidden state information corresponding to the first image frame and second fusion information in which the second feature vector is fused with information related to the second image frame stored in a memory;
generating a second segmentation mask corresponding to the second image frame, based on an output vector corresponding to the second hidden state information; and
outputting the second segmentation mask,
wherein the generating of the second hidden state information comprises:
determining a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
reading the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

2.	(Canceled)

3.	(Currently amended) The method of claim 1, wherein the reading of the information related to the second image frame from the memory comprises:
reading the information related to the second image frame from the memory in response to the relation between the second feature vector and the first feature vector being higher than a preset standard.

4.	(Currently amended) The method of claim 1, wherein the generating of the second hidden state information comprises:
generating the second fusion information by fusing the second feature vector with the information related to the second image frame; and
generating the second hidden state information corresponding to the second image frame, based on the second fusion information and the first hidden state information.

5.	(Currently amended) A method with video segmentation, comprising:
acquiring, over time, a video sequence comprising a plurality of image frames, the plurality of image frames including a second image frame corresponding to a time t of the video sequence and a first image frame corresponding to a time t-1 before the time t;
extracting a second feature vector from the second image frame;
generating second hidden state information corresponding to the second image frame, based on first hidden state information corresponding to the first image frame and second fusion information in which the second feature vector is fused with information related to the second image frame stored in a memory;

outputting the second segmentation mask; and
storing the second hidden state information in the memory, based on a dissimilarity between hidden state information, including the first hidden state information, stored in the memory and the second feature vector,
wherein the generating of the second hidden state information comprises:
determining a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
reading the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

6.	(Original) The method of claim 5, wherein the storing of the second hidden state information in the memory comprises:
determining the dissimilarity between the hidden state information stored in the memory and the second feature vector; and
storing the second hidden state information in the memory, based on a result of comparing the dissimilarity to a preset reference value.

7.	(Original) The method of claim 6, wherein the determining of the dissimilarity comprises either one of:
determining the dissimilarity based on a similarity distance between the hidden state information stored in the memory and the second feature vector; and
determining the dissimilarity based on an entropy-based correlation between the hidden state information stored in the memory and the second feature vector.

8.	(Original) The method of claim 1, further comprising:
storing the second segmentation mask in the memory.

9.	(Original) The method of claim 8, further comprising, in response to reception of a third image frame, among the plurality of image frames, corresponding to a time t+1 after the time t:
extracting a third feature vector from an image in which the third image frame and the second segmentation mask are combined;
generating third hidden state information corresponding to the third image frame, based on the second hidden state information and third fusion information in which the third feature vector is fused with information related to the third image frame stored in the memory;
generating a third segmentation mask corresponding to the third image frame, based on an output vector corresponding to the third hidden state information; and
outputting the third segmentation mask.

10.	(Original) A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

11.	(Currently amended) An apparatus with video segmentation, comprising:
a communication interface configured to acquire, over time, a video sequence comprising a plurality of image frames, the plurality of image frames including a second image frame corresponding to a time t of the video sequence and a first image frame corresponding to a time t-1 before the time t; and
a processor configured to:
extract a second feature vector from the second image frame;
generate second hidden state information corresponding to the second image frame, based on first hidden state information corresponding to the first image frame and second fusion information in which the second feature vector is fused with information related to the second image frame stored in a memory;
generate a second segmentation mask corresponding to the second image frame, based on an output vector corresponding to the second hidden state information; and
output the second segmentation mask,
wherein the memory is configured to store a first feature vector corresponding to at least one object included in the second image frame, and
wherein the processor is further configured to:
determine a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
read the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

12.	(Canceled)

13.	(Currently amended) The apparatus of claim 11, wherein the processor is further configured to 

14.	(Currently amended) The apparatus of claim 11, wherein the processor is further configured to generate the second fusion information by fusing the second feature vector with the information related to the second image frame, and generate the second hidden state information corresponding to the second image frame, based on the second fusion information and the first hidden state information.

15.	(Currently amended) An apparatus with video segmentation, comprising:
a communication interface configured to acquire, over time, a video sequence comprising a plurality of image frames, the plurality of image frames including a second image frame corresponding to a time t of the video sequence and a first image frame corresponding to a time t-1 before the time t; and
a processor configured to:
extract a second feature vector from the second image frame;
generate second hidden state information corresponding to the second image frame, based on first hidden state information corresponding to the first image frame and second fusion information in which the second feature vector is fused with information related to the second image frame stored in a memory;
generate a second segmentation mask corresponding to the second image frame, based on an output vector corresponding to the second hidden state information; 

store the second hidden state information in the memory based on a dissimilarity between hidden state information, including the first hidden state information, stored in the memory and the second feature vector,
wherein the memory is configured to store a first feature vector corresponding to at least one object included in the second image frame, and
wherein the processor is further configured to:
determine a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
read the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

16.	(Original) The apparatus of claim 15, wherein the processor is further configured to determine the dissimilarity between the hidden state information stored in the memory and the second feature vector, and store the second hidden state information in the memory based on a result of comparing the dissimilarity to a preset reference value.

17.	(Original) The apparatus of claim 16, wherein the processor is further configured to:
determine the dissimilarity based on a similarity distance between the hidden state information stored in the memory and the second feature vector, or
determine the dissimilarity based on an entropy-based correlation between the hidden state information stored in the memory and the second feature vector.

18.	(Original) The apparatus of claim 11, wherein the processor is further configured to store the second segmentation mask in the memory.

19.	(Original) The apparatus of claim 18, wherein the processor is further configured to, in response to the communication interface receiving a third image frame, among the plurality of image frames, corresponding to a time t+1 after the time t:
combine the third image frame with the second segmentation mask;

generate third hidden state information corresponding to the third image frame based on the second hidden state information corresponding to the second image frame and third fusion information in which the third feature vector is fused with information related to the third image frame stored in the memory;
generate a third segmentation mask corresponding to the third image frame, based on an output vector corresponding to the third hidden state information; and
output the third segmentation mask.

20.	(Currently amended) An apparatus with video segmentation, comprising:
a communication interface configured to acquire, over time, a video sequence comprising a plurality of image frames, the plurality of image frames including a second image frame corresponding to a time t of the video sequence and a first image frame corresponding to a time t-1 before the time t;
an encoder configured to extract a second feature vector from the second image frame;
a memory configured to store information related to the second image frame;
a recurrent neural network (RNN) configured to generate second hidden state information corresponding to the second image frame, based on first hidden state information corresponding to the first image frame and second fusion information in which the second feature vector is fused with the stored information related to the second image frame, the information related to the second image frame being generated based on one or more of the image frames corresponding to a time before the time t; and
a decoder configured to generate a second segmentation mask corresponding to the second image frame, based on an output vector corresponding to the second hidden state information, and output the second segmentation mask,
wherein the memory is configured to store a first feature vector corresponding to at least one object included in the second image frame, and
wherein the encoder is configured to:
determine a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
read the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

21.	(Original) The apparatus of claim 20, wherein the encoder comprises a convolutional neural network (CNN)-based ResNet or VGG network.

22.	(Currently amended) The apparatus of claim 20, wherein the memory is further configured to store a first feature vector corresponding to at least one object included in the second image frame, and
wherein the RNN is further configured to read the stored information related to the second image frame from the memory, in response to determining that the relation between the second feature vector and the first feature vector is higher than a preset standard.

23.	(Currently amended) An apparatus with video segmentation, comprising:
one or more processors configured to:
extract a second feature vector from a second image frame, among a plurality of image frames acquired from a video sequence, wherein the second image frame corresponds to a selected time in the time sequence;
generate second fusion information by fusing the second feature vector with information related to the second image frame stored in a memory;
generate second hidden state information corresponding to the second image frame, based on first hidden state information and the second fusion information, wherein the first hidden state information corresponds to a first image frame, among the plurality of image frames, corresponding to a time before the selected time in the time sequence;
generate a second segmentation mask corresponding to the second image frame, based on the second hidden state information; and
output the second segmentation mask,
wherein the memory is configured to store a first feature vector corresponding to at least one object included in the second image frame, and
wherein the one or more processors are further configured to:
determine a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and
read the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.

24.	(Original) The apparatus of claim 23, wherein the one or more processors are further configured to read the information related to the second image frame from the memory, in response to determining that the second feature vector is similar to or overlaps with a first feature vector that is stored in the memory and corresponds to at least one object included in the second image frame.

25.	(Original) The apparatus of claim 23, wherein the one or more processors are further configured to:
compare a dissimilarity between hidden state information stored in the memory and the second feature vector to a preset reference value; and
store the second hidden state information in the memory, in response to a result of the comparing being that the dissimilarity is greater than the preset reference value.

Allowable Subject Matter
Claims 1, 3-11, and 13-25 are allowed.
The following is an examiner’s statement of reasons for allowance: Applicant’s arguments are persuasive, and the proposed amendments are sufficient to overcome the previous rejections.
Xu, AHMED and KARANAM do not expressly teach the newly added limitations, “wherein the generating of the second hidden state information comprises: determining a relation between the second feature vector and a first feature vector stored in the memory and corresponding to at least one object included in the second image frame using an attention mechanism; and reading the information related to the second image frame from the memory, based on the relation between the second feature vector and the first feature vector.” Xu does not teach “determining a relation between the second feature vector .
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL C. CHANG whose telephone number is (571)270-1277. The examiner can normally be reached Monday-Thursday and Alternate Fridays 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan S. Park can be reached on (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit 





/DANIEL C CHANG/Examiner, Art Unit 2669    
/CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669