DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10 August 2021 has been entered.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claims 17-19 in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159.  See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 12 of U.S. Patent No. 9,830,709 B2 in view of Rodriguez-Serrano et al. (US 2017/0083792 A1) (hereinafter, Rodriguez).
Although the claims at issue are not identical, they are not patentably distinct from each other because:
Claim 1 of the instant application and claim 12 of U.S. Patent No. 9,830,709 B2 recite common subject matter.
The elements of claim 1 are fully anticipated by claim 12 of U.S. Patent No. 9,830,709 B2.
Moreover, claim 12 of the US patent discloses an artificial neural network (ANN) of a computing device, the ANN comprising: 
a representation network configured to:
receive a target object from a first frame of a sequence of frames and a search region corresponding to an expected location of the target object in a subsequent frame of the sequence of frames based on a location of the target object in the first frame, the first fram and 
extract a target region feature map of the target object from the first frame and a search region feature map of an expected location of the search region from the second frame (claim 11, limitations 2 and 3);
a cross-correlation layer configured to:
convolve the extracted target region feature map with the extracted search region feature map to determine a cross correlation map (claim 11, limitation 4).
However, claim 12 of the US patent does not disclose receiving the extracted target region feature map and the extracted search region feature map;
a predicting layer configured to:
receive the cross-correlation map; and predict coordinates, indicated by a multidimensional vector, of the target region in the subsequent frame based on the convolution; 
a loss layer configured to: 
compare the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target representation of the target object from the subsequent frame; and
back propagate the loss value into the ANN to update a plurality of filter weights of the representation function.
Rodriguez discloses receiving the extracted target region feature map and the extracted search region feature map (Rodriguez discloses “a query image 12 [being] received” at Fig. 2-S108 and ¶0044);
a predicting layer configured to:
receive the cross-correlation map; and predict coordinates of the target object in the second frame based on the cross-correlation map (Rodriguez discloses “a similarity [being] 
a loss layer (Rodriguez discloses a process of optimizing the ranking loss at Fig. 4-104 and ¶¶0086-0088) configured to:
compare the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the second frame (Rodriguez discloses “[optimizing] the DDD task, the trained CNN (with the fully-connected layers removed) may be learned to optimize the ranking loss” at Fig. 4-104 and ¶0086); and
back propagate the loss value into the ANN to update a plurality of filter weights of the representation function (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the CNN of Rodriguez to the subject matter of claim 12 of the US patent.
The suggestion/motivation would have been to “provide a method of object localization which takes spatial information into account effectively in generating a global representation of the image while also providing a compact representation” (Rodriguez; ¶0009).
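For illustration only, the cross-correlation and coordinate-prediction limitations discussed above may be sketched as follows. The sketch is not drawn from the instant application, U.S. Patent No. 9,830,709 B2, or Rodriguez. The function names cross_correlate and predict_coordinates, the single-channel two-dimensional feature maps, and the peak-response prediction rule are assumptions made solely for this example.

```python
# Illustrative sketch only; all names and the peak-response rule are assumptions.

def cross_correlate(target, search):
    """Slide the target feature map over the search feature map.

    Both maps are lists of lists (single channel). The result has one
    response per position at which the target fits entirely inside the
    search map ("valid" mode).
    """
    th, tw = len(target), len(target[0])
    sh, sw = len(search), len(search[0])
    corr_map = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            response = sum(
                target[a][b] * search[i + a][j + b]
                for a in range(th)
                for b in range(tw)
            )
            row.append(response)
        corr_map.append(row)
    return corr_map


def predict_coordinates(corr_map):
    """Return the (row, col) of the first strongest response in the map."""
    best_value = corr_map[0][0]
    best_position = (0, 0)
    for i, row in enumerate(corr_map):
        for j, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_position = (i, j)
    return best_position


# Target pattern (a 2x2 diagonal) placed in the search map at row 1, column 2.
target_map = [[1, 0],
              [0, 1]]
search_map = [[0, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]]

correlation = cross_correlate(target_map, search_map)
coordinates = predict_coordinates(correlation)
```

In this example the target pattern sits in the search region at row 1, column 2, which is where the cross-correlation map peaks. A trained predicting layer could use a different rule to map the cross-correlation map to coordinates.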

Claim 9 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 2 of U.S. Patent No. 9,830,709 B2 in view of Rodriguez-Serrano et al. (US 2017/0083792 A1) (hereinafter, Rodriguez).

Claim 9 of the instant application and claim 2 of U.S. Patent No. 9,830,709 B2 recite common subject matter.
The elements of claim 9 are fully anticipated by claim 2 of U.S. Patent No. 9,830,709 B2.
Moreover, claim 2 of the US patent discloses a method comprising: 
receiving a target object from a first frame of a sequence of frames and a search region corresponding to an expected location of the target object in a second frame of the sequence of frames based on a location of the target object in the first frame, the first frame and the second frame being consecutive frames in the sequence of frames, the first frame and the second frame corresponding to different images (claim 2); and
extracting a target region feature map of a target object from a first frame and a search region feature map of the search region from the second frame (claim 1, limitations 1-2); 
convolving the extracted target region feature map with the extracted search region feature map to determine a cross-correlation map (claim 1, limitation 3).
However, claim 2 of the US patent does not disclose receiving the extracted target region feature map and the extracted search region feature map;
predicting coordinates of the target region in the second frame based on the cross-correlation map; 
comparing the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the second frame; and 
back propagating the loss value into an artificial neural network to update a plurality of filter weights of a representation function.

predicting coordinates of the target region in the subsequent frame based on the convolving (Rodriguez discloses “a similarity [being] computed between the query image representation 46 and the dataset image representations 48 (optionally projected into the new feature space) to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space)” at Fig. 2-S112 and ¶0046);
comparing the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the subsequent frame (Rodriguez discloses “[optimizing] the DDD task, the trained CNN (with the fully-connected layers removed) may be learned to optimize the ranking loss” at Fig. 4-104 and ¶0086); and
back propagating the loss value into an artificial neural network to update a plurality of filter weights of a representation function (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the CNN of Rodriguez to the subject matter of claim 2 of the US patent.

Claim 17 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 27 of U.S. Patent No. 9,830,709 B2 in view of Rodriguez-Serrano et al. (US 2017/0083792 A1) (hereinafter, Rodriguez).
Although the claims at issue are not identical, they are not patentably distinct from each other because:
Claim 17 of the instant application and claim 27 of U.S. Patent No. 9,830,709 B2 recite common subject matter.
The elements of claim 17 are fully anticipated by claim 27 of U.S. Patent No. 9,830,709 B2.
Moreover, claim 27 of the US patent discloses a method comprising: 
means for receiving a target object from a first frame of a sequence of frames and a search region corresponding to an expected location of the target object in a subsequent frame of the sequence of frames based on a location of the target object in the first frame, the first frame and the second frame being consecutive frames in the sequence of frames, the first frame and the second frame corresponding to different images (claim 27); 
means for extracting a target region feature map of a target object from a first frame and a search region feature map of the search region from the second frame (claim 26, limitations 1-2); and
means for convolving the extracted target region feature map with the extracted search region feature map to determine a cross-correlation map (claim 26, limitation 3).
However, claim 27 of the US patent does not disclose means for receiving the extracted target region feature map and the extracted search region feature map; 

means for comparing the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the subsequent frame; and 
means for back propagating the loss value into an artificial neural network to update a plurality of filter weights of a representation function.
Rodriguez discloses means for receiving the extracted target region feature map and the extracted search region feature map (Rodriguez discloses “set 40 of annotated images [being] provided. Each annotated image 38 is annotated with a bounding box which identifies a location of an object of interest” at Fig. 2-S104 and ¶0042);
means for predicting coordinates of the target region in the subsequent frame based on the convolving (Rodriguez discloses “a similarity [being] computed between the query image representation 46 and the dataset image representations 48 (optionally projected into the new feature space) to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space)” at Fig. 2-S112 and ¶0046);
means for comparing the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the subsequent frame (Rodriguez discloses “[optimizing] the DDD task, the trained CNN (with the fully-connected layers removed) may be learned to optimize the ranking loss” at Fig. 4-104 and ¶0086); and
means for back propagating the loss value into an artificial neural network to update a plurality of filter weights of a representation function (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the CNN of Rodriguez to the subject matter of claim 27 of the US patent.
The suggestion/motivation would have been to “provide a method of object localization which takes spatial information into account effectively in generating a global representation of the image while also providing a compact representation” (Rodriguez; ¶0009).

Claim 20 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 22 of U.S. Patent No. 9,830,709 B2 in view of Rodriguez-Serrano et al. (US 2017/0083792 A1) (hereinafter, Rodriguez).
Although the claims at issue are not identical, they are not patentably distinct from each other because:
Claim 20 of the instant application and claim 22 of U.S. Patent No. 9,830,709 B2 recite common subject matter.
The elements of claim 20 are fully anticipated by claim 22 of U.S. Patent No. 9,830,709 B2.
Moreover, claim 22 of the US patent discloses a non-transitory computer-readable medium having program code recorded thereon, the program code being executed by a processor of a neural computing device and comprising:
program code to receive a target object from a first frame of a sequence of frames and a search region corresponding to an expected location of the target object in a subsequent frame of the sequence of frames based on a location of the target object in the first frame, the first frame and the second frame being consecutive frames in the sequence of frames, the first frame and the second frame corresponding to different images (claim 22); 

program code to convolve the extracted target region feature map with the extracted search region feature map to determine a cross-correlation map (claim 21, limitation 3).
However, claim 22 of the US patent does not disclose program code to receive the extracted target region feature map and the extracted search region feature map; 
program code to predict coordinates of the target region in the second frame based on the cross-correlation map;
program code to compare the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target feature map of the target object from the subsequent frame; and 
program code to back propagate the loss value into an artificial neural network to update a plurality of filter weights of a feature map function.
Rodriguez discloses program code to receive the extracted target region feature map and the extracted search region feature map (Rodriguez discloses “set 40 of annotated images [being] provided. Each annotated image 38 is annotated with a bounding box which identifies a location of an object of interest” at Fig. 2-S104 and ¶0042);
program code to predict coordinates of the target region in the subsequent frame based on the convolving (Rodriguez discloses “a similarity [being] computed between the query image representation 46 and the dataset image representations 48 (optionally projected into the new feature space) to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space)” at Fig. 2-S112 and ¶0046);
program code to compare the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth 
program code to back propagate the loss value into an artificial neural network to update a plurality of filter weights of a representation function (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the CNN of Rodriguez to the subject matter of claim 22 of the US patent.
The suggestion/motivation would have been to “provide a method of object localization which takes spatial information into account effectively in generating a global representation of the image while also providing a compact representation” (Rodriguez; ¶0009).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (US 2016/0189009 A1) in view of Rodriguez-Serrano et al. (US 2017/0083792 A1) (hereinafter, Rodriguez).

a representation network configured to:
receive a target object from a first frame of a sequence of frames and a search region corresponding to an expected location of the target object in a second frame of the sequence of frames based on a location of the target object in the first frame, the first frame and the second frame being consecutive frames in the sequence of frames, the first frame and the second frame corresponding to different images (Tran discloses that “The video 380 can include, or be represented by, a set of images (i.e., video image frames, still frames, etc.). In this example, the video 380 can include Frame 1 382, Frame 2 384, and all other frames through Frame N 386. If the video 380 is recorded at 24 frames per second and if Frame N 386 is the 48th frame, for example, then a normal playback length of the video 380 should be two seconds. In some implementations, the convolutional neural network 302 can be configured to receive as input video content having 64 frames …” at Figs. 3-380, 382, 384, 386 and ¶¶0042-0043); and
extract a target region representation of the target object from the first frame and a search region representation of the search region from the second frame (Tran discloses that “based on the one or more outputs from the convolutional neural network 302, the plurality of video feature descriptors for the video 380 (e.g., scene descriptors 350, object descriptors 360, action descriptors 370, etc.) can be determined. In this example, the inputted video 380 can correspond to a recording of a car driving in a park. As such, the scene descriptors 350 can, for instance, indicate significant likelihoods that an outdoor scene and a park scene are recognized in the video 380 but lower likelihoods that an indoor scene and an office scene are recognized in the video 380. The object descriptors 360 can, for example, indicate significant likelihoods that a tree object and a car object are recognized but lower likelihoods for a cat object and a table object. Also, in this example, the action descriptors 370 can indicate a significant likelihood that a driving action is recognized in the video 380 but lower likelihoods for a smiling action, a jumping action, and a walking object. It should be noted that this example scenario 300 and its specific details are provided for illustrative purposes. Many variations are possible” at Fig. 3 and ¶0048).
However, Tran does not disclose the remaining claim limitations.
Instead, Rodriguez discloses a cross-correlation layer configured to:

convolve the extracted target region with the extracted search region to determine a cross-correlation map (Rodriguez discloses “the query image 12 [being] input to the neural network model 42 at the output of the selected layer of the model is used to generate a representation 46 of the query image, in a similar manner to the annotated images” at Fig. 2-S110 and ¶0045);
a predicting layer configured to:
receive the cross-correlation map (Rodriguez discloses “a similarity [being] computed between the query image representation 46 and the dataset image representations 48 (optionally projected into the new feature space) to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space)” at Fig. 2-S112 and ¶0046); and 
predict coordinates of the target object in the second frame based on the cross-correlation map (Rodriguez discloses “a similarity [being] computed between the query image representation 46 and the dataset image representations 48 (optionally projected into the new feature space) to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space)” at Fig. 2-S112 and ¶0046); and 
a loss layer (Rodriguez discloses a process of optimizing the ranking loss at Fig. 4-104 and ¶¶0086-0088) configured to:
compare the cross-correlation map with a ground truth cross-correlation map to determine a loss value, the ground truth cross-correlation map based on a ground truth target object from the second frame (Rodriguez discloses “[optimizing] the DDD task, the trained CNN 
back propagate the loss value into the ANN to update a plurality of filter weights of a representation function (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the CNN process of Rodriguez to Tran’s neural network.
The suggestion/motivation would have been to “provide a method of object localization which takes spatial information into account effectively in generating a global representation of the image while also providing a compact representation” (Rodriguez; ¶0009).
b.	Regarding claim 2, the combination applied in claim 1 discloses in which the cross-correlation layer is further configured to convolve the ground truth target feature map with the search region feature map to determine the ground truth cross-correlation map (Rodriguez discloses “[t]he CNN 43, when used for conventional computer vision tasks, [receiving] as input the (R,G,B) pixel values of an image of standardized size. The first layers 70, 72, 74, 76, 78 are a concatenation of several convolutional layers sometimes followed by max-pooling layers 86, 88, 90. The exemplary model includes three max-pooling layers although fewer or more max-pooling layers may be included in the model, such as 1, 2, 4, or 5 max-pooling layers. The last, fully connected layers 80, 82, 84 are essentially a fully-connected multi-layer perceptron. The first 80 of such layers is connected to all the neurons of the preceding max-pooling layer 90. The standard output 92 of the neural network is a soft-max encoding of the task, so for a 1000-
c.	Regarding claim 3, the combination applied in claim 1 discloses in which the loss layer comprises a pixel-wise least square errors (L2) loss function or a structured base loss function (Rodriguez discloses the derivatives of the ranking loss at Fig. 4 and ¶0087).
d.	Regarding claim 4, the combination applied in claim 1 discloses in which the cross-correlation layer is further configured to determine the cross-correlation map during a forward pass of the ANN (Rodriguez discloses “[t]he CNN 43, when used for conventional computer vision tasks, [receiving] as input the (R,G,B) pixel values of an image of standardized size. The first layers 70, 72, 74, 76, 78 are a concatenation of several convolutional layers sometimes followed by max-pooling layers 86, 88, 90. The exemplary model includes three max-pooling layers although fewer or more max-pooling layers may be included in the model, such as 1, 2, 4, or 5 max-pooling layers. The last, fully connected layers 80, 82, 84 are essentially a fully-connected multi-layer perceptron. The first 80 of such layers is connected to all the neurons of the preceding max-pooling layer 90. The standard output 92 of the neural network is a soft-max encoding of the task, so for a 1000-class classification problem, the output has 1000 neurons which constitute a prediction for each class” at Figs. 3 and 4 and ¶¶0058-0064).
e.	Regarding claim 5, the combination applied in claim 1 discloses in which, during a backward pass, the loss value for the plurality of filter weights is back propagated through the extracted target region feature map and the loss value for an input feature map is back propagated through the extracted search region feature map (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).

g.	Regarding claim 7, the combination applied in claim 1 discloses in which the ground truth cross-correlation map comprises a label (Rodriguez discloses “the CNN … [being] trained by end-to-end learning of the parameters of the neural network using a set of training images labeled by class” at ¶0041).
h.	Regarding claim 8, the combination applied in claim 1 discloses in which the representation network is further configured to convolve a target image with the representation function to extract the target region feature map (Rodriguez discloses generating a representation of each of the annotated images, each of which is provided with a bounding box that identifies a location of an object of interest, by using a neural network that was previously generated or trained at Figs. 2-S102, S104 and S106 and ¶¶0040-0043).
i.	Regarding claims 9-16, claims 9-16 are analogous and correspond to claims 1-8, respectively. See rejection of claims 1-8 for further explanation.
j.	Regarding claim 17, claim 17 is analogous and corresponds to claim 9. See rejection of claim 9 for further explanation.
k.	Regarding claim 18, claim 18 is analogous and corresponds to claim 2. See rejection of claim 2 for further explanation.
l.	Regarding claim 19, Rodriguez discloses further comprising means for back propagating, during a backward pass:
the loss value for the plurality of filter weights through the extracted target region feature map (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated 
the loss value for an input feature map through the extracted search region feature map (Rodriguez discloses “[t]he vector of these derivatives of the loss [being] backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the loss, and using the new loss to update the weights of C5, then to C4, and so on, in the conventional manner” at Fig. 4 and ¶0088).
m.	Regarding claim 20, Tran discloses a non-transitory computer-readable medium having program code recorded thereon for tracking a target across a sequence of frames using a representation function of an artificial neural network (ANN), the program code being executed by a processor of a neural computing device (Tran discloses that “the processes and features described herein are implemented as a series of executable modules run by the computer system 800, individually or collectively in a distributed computing environment. The foregoing modules may be realized by hardware, executable modules stored on a computer-readable medium (or machine-readable medium), or a combination of both. For example, the modules may comprise a plurality or series of instructions to be executed by a processor in a hardware system, such as the processor 802. Initially, the series of instructions may be stored on a storage device, such as the mass storage 818. However, the series of instructions can be stored on any suitable computer readable storage medium. Furthermore, the series of instructions need not be stored locally, and could be received from a remote storage device, such as a server on a network, via the network interface 816. The instructions are copied from the storage device, such as the mass storage 818, into the system memory 814 and then accessed and executed by the processor 802. In various implementations, a module or modules can be executed by a processor or multiple processors in one or multiple locations, such as multiple servers in a parallel processing environment …” at ¶¶0091-0092).
Moreover, the remaining claim limitations are analogous to and correspond to the claim limitations of claim 9. See rejection of claim 9 for further explanation.
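For illustration only, the loss and backward-pass limitations addressed above (a pixel-wise L2 loss between the cross-correlation map and a ground truth cross-correlation map, and back propagation through both the target region feature map and the search region feature map) may be sketched as follows. The sketch is not drawn from the instant application, Tran, or Rodriguez. The function names cross_correlate, l2_loss, and backward, the single-channel feature maps, and the step size of 0.1 are assumptions made solely for this example.

```python
# Illustrative sketch only; all names and the step size are assumptions.

def cross_correlate(target, search):
    """Valid-mode cross-correlation of a target map with a search map."""
    th, tw = len(target), len(target[0])
    sh, sw = len(search), len(search[0])
    return [
        [
            sum(target[a][b] * search[i + a][j + b]
                for a in range(th) for b in range(tw))
            for j in range(sw - tw + 1)
        ]
        for i in range(sh - th + 1)
    ]


def l2_loss(predicted, truth):
    """Pixel-wise sum of squared differences between two maps."""
    return sum(
        (p - t) ** 2
        for p_row, t_row in zip(predicted, truth)
        for p, t in zip(p_row, t_row)
    )


def backward(target, search, predicted, truth):
    """Gradients of the L2 loss with respect to both inputs.

    grad_target plays the role of the gradient for the filter weights;
    grad_search plays the role of the gradient for the input feature map.
    """
    th, tw = len(target), len(target[0])
    grad_target = [[0.0] * tw for _ in range(th)]
    grad_search = [[0.0] * len(search[0]) for _ in search]
    for i, row in enumerate(predicted):
        for j, p in enumerate(row):
            g = 2.0 * (p - truth[i][j])  # dLoss / dpredicted[i][j]
            for a in range(th):
                for b in range(tw):
                    grad_target[a][b] += g * search[i + a][j + b]
                    grad_search[i + a][j + b] += g * target[a][b]
    return grad_target, grad_search


target_map = [[1.0]]
search_map = [[1.0, 2.0]]
truth_map = [[0.0, 2.0]]  # hypothetical ground truth cross-correlation map

predicted_map = cross_correlate(target_map, search_map)
loss_before = l2_loss(predicted_map, truth_map)
grad_target, grad_search = backward(target_map, search_map, predicted_map, truth_map)

# One gradient-descent step on the target-side weights (step size assumed).
step = 0.1
updated_target = [
    [w - step * g for w, g in zip(w_row, g_row)]
    for w_row, g_row in zip(target_map, grad_target)
]
loss_after = l2_loss(cross_correlate(updated_target, search_map), truth_map)
```

In this toy case a single update of the target-side weights lowers the loss, which is the effect the back propagation limitation describes. The ranking loss of Rodriguez is a different loss function and is not modeled here.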


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN W LEE whose telephone number is (571)272-9554.  The examiner can normally be reached on Mon-Fri 8:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NAY MAUNG, can be reached at 571-272-7882.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOHN W LEE/
Primary Examiner, Art Unit 2664