DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments (12/29/20 Remarks: page 7, line 12 - page 9, line 16) have been fully considered but they are not persuasive.
Re claim 1, Applicant argues (12/29/20 Remarks: page 7, line 12 - page 8, line 22, particularly page 8, lines 19-22) that Medioni does not disclose that the first output is generated by a pre-trained convolutional neural network according to a frame of a target video.
The teachings of Medioni describe using a pre-trained convolutional neural network which receives a video input (Medioni column 2, lines 17-21, CNN with pre-trained weights for performing detection within video (i.e. series of frames) content) and producing an output according to this input (Medioni column 2, lines 54-62, receiving video segment inputs and producing a first set of feature maps).
Re claim 1, Applicant argues (12/29/20 Remarks: page 8, lines 23-27) that the second set of feature maps of Medioni cannot be interpreted as the sub-feature maps recited in claim 
As noted above, the teachings of Medioni describe using a pre-trained convolutional neural network which receives a video input (Medioni column 2, lines 17-21, CNN with pre-trained weights for performing detection within video (i.e. series of frames) content) and producing an output (Medioni column 2, lines 54-62, receiving video segment inputs and producing a first set of feature maps as the output). Medioni further teaches that a second set of feature maps is generated by convolution of the first set of feature maps (Medioni column 3, lines 28-30, convolve first output (i.e. feature maps, see above) to produce second feature maps).
Re claim 1, Applicant argues (12/29/20 Remarks: page 8, line 28 - page 9, line 2) that the third set of feature maps of Medioni cannot be interpreted as the second feature maps recited in claim 1, which are produced by convolving the sub-feature maps using adaptive kernels.
The teachings of Medioni describe that a third set of feature maps is generated by convolution of the second set of feature maps (Medioni column 3, lines 30-32, convolve second feature maps to produce third feature maps). Medioni further teaches networks using adaptive kernels (Medioni column 8, lines 13-18, self-optimization of filters within a convolutional neural network).
Re claim 1, Applicant argues (12/29/20 Remarks: page 9, lines 3-5) that training of convolutional neural networks using video segments of different numbers of video frames cannot be interpreted as training adaptive convolution kernels frame by frame.
However, while the several video segments of Medioni may have different numbers of frames, each is (by the standard understanding in the art of a "video segment") composed of individual frames in series. It is unclear how the provision of training data as a series of frames is patentably distinct from the recitation of “training, frame by frame”.
Re claims 2, 16, & 19, Applicant argues (12/29/20 Remarks: page 9, lines 13-16) that these claims are allowable by virtue of their dependency from claim 1 and because they recite additional patentable limitations.
Applicant's arguments re claim 1 are addressed above.
Re the alleged recitation of additional patentable limitations, the argument fails to comply with 37 CFR 1.111(b) because it amounts to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
Claim Rejections - 35 USC § 102
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 16, & 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Medioni (US 9836853).
Claim 1: A method for adapting a pre-trained Convolutional Neural Network (CNN) (Medioni column 2, lines 18-21, convolutional neural network initialized with pre-trained weights) to a target video, comprising:
transforming a first feature map into a plurality of sub-feature maps, wherein the first feature map is generated by the pre-trained CNN according to a frame of the target video (Medioni column 3, lines 28-32, transform and generation of a set of feature maps);
convolving each of the sub-feature maps with one of a plurality of adaptive convolution kernels, respectively, to output a plurality of second feature maps with improved adaptability (Medioni column 3, lines 28-32, transform and generation of an additional set of feature maps); and
training, frame by frame, the adaptive convolution kernels (Medioni column 8, lines 13-18, training of neural network using video frames).
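For illustration only, and not as a characterization of the claimed method or of the Medioni reference, the three steps recited in claim 1 can be sketched as below. The sketch assumes that the transform is a 1x1 convolution (a weighted sum of input channels), that each adaptive kernel is a small square filter applied with zero padding, and that training is a plain gradient step on a squared-error loss against a per-frame target map. All function names, parameters, and the loss are hypothetical and do not come from the application or the reference.

```python
def conv2d_same(fmap, kernel):
    """Zero-padded 2-D cross-correlation of one map with one odd-sized square kernel."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        s += fmap[yy][xx] * kernel[dy][dx]
            out[y][x] = s
    return out


def transform(first_map, mix):
    """Assumed transform: each sub-feature map is a weighted sum of the input channels."""
    h, w = len(first_map[0]), len(first_map[0][0])
    return [[[sum(wt * ch[y][x] for wt, ch in zip(weights, first_map))
              for x in range(w)] for y in range(h)]
            for weights in mix]


def kernel_grad(sub_map, err, k):
    """Gradient of 0.5 * sum(err**2) with respect to a k-by-k kernel (err = output - target)."""
    h, w = len(sub_map), len(sub_map[0])
    r = k // 2
    grad = [[0.0] * k for _ in range(k)]
    for dy in range(k):
        for dx in range(k):
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        grad[dy][dx] += err[y][x] * sub_map[yy][xx]
    return grad


def train_on_frame(first_map, target, mix, kernels, lr):
    """Update every adaptive kernel in place for one frame; return the loss before the update."""
    loss = 0.0
    for sub, kern in zip(transform(first_map, mix), kernels):
        out = conv2d_same(sub, kern)  # one "second feature map" per sub-feature map
        err = [[o - t for o, t in zip(orow, trow)] for orow, trow in zip(out, target)]
        loss += 0.5 * sum(e * e for row in err for e in row)
        grad = kernel_grad(sub, err, len(kern))
        for dy, grow in enumerate(grad):
            for dx, g in enumerate(grow):
                kern[dy][dx] -= lr * g
    return loss


def adapt_frame_by_frame(frames, targets, mix, kernels, lr=0.005):
    """Visit the frames in order, performing one kernel update per frame."""
    return [train_on_frame(f, t, mix, kernels, lr) for f, t in zip(frames, targets)]
```

Here `frames` stands in for the first feature maps that a pre-trained network would produce for successive frames, each given as a list of 2-D channels. Only the per-sub-map kernels are updated; the mixing weights in `mix` are held fixed, which is a simplification made for this sketch.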
Claim 2: The method of claim 1 (see above), wherein the transforming and the convolving are implemented in an adaptive CNN comprising:
a first convolution layer, linked to the pre-trained CNN and configured to transform the first feature map into the plurality of sub-feature maps (Medioni column 3, lines 28-32, transform and generation of a set of feature maps by a first layer); and
a second convolution layer, linked to the first convolution layer and configured to convolve each of the sub-feature maps with one of the adaptive convolution kernels (Medioni column 3, lines 28-32, transform and generation of an additional set of feature maps by a second layer), respectively.
Claim 16: A system for adapting a pre-trained CNN to a target video, comprising:
a memory that stores executable components (Medioni column 6, lines 12-14, storage 12 couples to processor 11); and
a processor electrically coupled to the memory (Medioni column 6, lines 12-14, processor 11) to execute the method of claim 1 (see above).
Claim 19: A non-transitory computer readable storage medium for storing computer readable instructions executable by a processor (Medioni column 6, lines 12-14, storage 12 couples to processor 11) to perform the method of claim 1 (see above).
Allowable Subject Matter
Claims 7-15, 17-18, & 20 are allowed.
Claims 3-6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Re claim 3 (and dependent claim 4), the art of record does not teach or suggest the recited training sample feed, repeated back-propagation, parameter grouping, and optimizing in conjunction with the recited convolutional neural network feature map generation and network training arrangement.
Re claim 5, the art of record does not teach or suggest the recited different loss criteria arrangement in conjunction with the recited convolutional neural network feature map generation and network training arrangement. 
Re claim 6, the art of record does not teach or suggest the recited mask layer correlation reduction arrangement in conjunction with the recited convolutional neural network feature map generation and network training arrangement.
Re claims 7 & 17 (and dependent claims 8-15, 18, & 20), the art of record does not teach or suggest the recited region of interest determination; forward feed of the region of interest through a convolutional neural network; first, second, and third location determination; and first, second, and third scale estimation in conjunction with the recited convolutional neural network feature map generation and network training arrangement.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yaorong, Kristan, and Chen disclose examples of object tracking using convolutional neural networks.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning the contents of this communication or earlier communications from the examiner should be directed to Stephen M. Brinich at 571-272-7430 (voice) or 571-273-7430 (fax).
Any inquiry relating to the status of this application, entry of papers into this application, or any other inquiries of a general nature concerning application processing should be directed to the Tech Center 2600 Customer Service Center at 571-272-2600 or to the USPTO Contact Center at 800-786-9199 or 571-272-1000.
The examiner can normally be reached on weekdays 7:30-4:00 Eastern Time.
If attempts to contact the examiner and the Customer Service Center are unsuccessful, supervisor Claire Wang can be contacted at 571-270-1051.
Hand-carried correspondence may be delivered to the Customer Service Window, located at the Randolph Building, 401 Dulany Street, Alexandria, VA 22314.
/S. M. B./
Examiner, Art Unit 2663
/CLAIRE X WANG/
Supervisory Patent Examiner, Art Unit 2663