DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in the application.

Claim Objections
Claim 18 (line 2): “image features to the” should be “image features corresponding to the”.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 4-5, 8-9 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bertinetto et al. (Bertinetto, L., et al., “Fully-Convolutional Siamese Networks for Object Tracking”, in ECCV 2016 Workshops, Part II, LNCS 9914, pp. 850–865, 2016, hereafter Bertinetto).

As per claim 1, Bertinetto teaches: a storage device storing a set of instructions; and one or more processors in communication with the storage device, wherein when executing the set of instructions, the one or more processors are configured to (the above features are inherent in a computer system): 
obtain a target part of reference image features of a reference image (Fig. 1 exemplar image “z” being a target part of a reference image (see also Fig. 2 top row rectangles being target parts of respective images; page 850 first para. following section 1 “Introduction”: “the object is identified solely by a rectangle in the first frame”); Fig. 1  “6X6X128” being corresponding features); 
obtain a target part of target image features of a target image, wherein the target part of the target image features is selected from the target image features based on the target part of the reference image features (Fig. 1 search image “x” being a target image; Fig. 1 “22X22X128” being corresponding features of “x”; Fig. 1 shows a fully-convolutional Siamese network for obtaining a block similar to the query image “z” within a search image “x”. The score map (17X17X1) comprises points (e.g. the red and blue pixels) representing target parts of the target image that are similar to the target part of the reference image. See page 851 last para., page 852 first para. and subsection 2.1); 
determine, based on the target part of the reference image features and the target part of the target image features, whether the target image is similar to the reference image; and
mark, upon a determination that the target image is similar to the reference image, the target image as a similar image of the reference image (page 851 last para.: “We propose to learn a function f(z, x) that compares an exemplar image z to a candidate image x of the same size and returns a high score if the two images depict the same object and a low score otherwise. To find the position of the object in a new image, we can then exhaustively test all possible locations and choose the candidate with the maximum similarity to the past appearance of the object”).

As per claim 2, dependent upon claim 1, Bertinetto teaches to obtain the target part of the reference image features of the reference image, the one or more processors are configured to: obtain a target region of the reference image; and obtain the target part of the reference image features corresponding to the target region of the reference image (Fig. 2 “z” being a target region of the reference image, “6X6X128” being the corresponding features).

As per claim 4, dependent upon claim 2, Bertinetto teaches to obtain the target part of the target image features of the target image, the one or more processors are configured to: 
generate a score map based on the target part of the reference image features and the target image features, wherein the score map includes a plurality of points, each point corresponding to a score; 

identify a target region of the target image based on the target block of the score map; and 
obtain the target part of the target image features corresponding to the target region of the target image (Fig. 2 “17X17X1” being score map including a plurality of points; Fig. 1 caption: “In this example, the red and blue pixels in the score map contain the similarities for the corresponding sub-windows”; page 851 last para.: “We propose to learn a function f(z, x) that compares an exemplar image z to a candidate image x of the same size and returns a high score if the two images depict the same object and a low score otherwise. To find the position of the object in a new image, we can then exhaustively test all possible locations and choose the candidate with the maximum similarity to the past appearance of the object”).

As per claim 5, dependent upon claim 4, Bertinetto teaches to generate the score map based on the target part of the reference image features and the target image features, the one or more processors are configured to: conduct a convolution calculation to the target part of the reference image features and the target image features (Fig. 1 the operation “*” between two outputs from the two branches; page 851 3rd para.: “A further contribution is a novel Siamese architecture that is fully-convolutional with respect to the search image: dense and efficient sliding-window evaluation is achieved with a bilinear layer that computes the cross-correlation of its two inputs”).
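For illustration only, the sliding-window cross-correlation cited above can be sketched as follows. This is a minimal sketch, not Bertinetto's implementation: the function names, the constant-valued feature maps, and the reduction of the channel count from 128 to 2 are assumptions made here to keep the example small. The spatial sizes (6X6 and 22X22) are those quoted from Fig. 1.

```python
# Illustrative sketch only: toy values, not Bertinetto's implementation.

def cross_correlate(exemplar, search):
    """Slide `exemplar` over `search` (both H x W x C nested lists) and
    return a score map of shape (Hs - He + 1) x (Ws - We + 1)."""
    he, we = len(exemplar), len(exemplar[0])
    hs, ws = len(search), len(search[0])
    channels = len(exemplar[0][0])
    score_map = []
    for i in range(hs - he + 1):
        row = []
        for j in range(ws - we + 1):
            # Inner product of the exemplar features with one sub-window.
            total = 0.0
            for di in range(he):
                for dj in range(we):
                    for c in range(channels):
                        total += exemplar[di][dj][c] * search[i + di][j + dj][c]
            row.append(total)
        score_map.append(row)
    return score_map


def constant_features(height, width, channels, value=1.0):
    """Hypothetical helper: an H x W x C feature map filled with `value`."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


# Spatial sizes quoted from Fig. 1; channel count reduced (assumption)
# from 128 to 2 so the pure-Python loops stay fast.
z_features = constant_features(6, 6, 2)
x_features = constant_features(22, 22, 2)
scores = cross_correlate(z_features, x_features)
print(len(scores), len(scores[0]))  # 17 17
```

Each entry of the resulting 17X17 map is the similarity of the exemplar features to one sub-window of the search features, which is the sense in which the cited “*” operation produces a score map with a plurality of points.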

As per claim 8, dependent upon claim 4, Bertinetto teaches a size of the score map is the same as a size of the target image, each point of the score map corresponding to one or more pixels of the target image; and a size of the target block is the same as a size of the target region of the reference image, each point of the target block corresponding to one or more pixels of the reference image (Fig. 1: the score map is of dimension “17X17X1”. Note that the score map represents the cross-correlation between the target region of the reference image (i.e. “z”) and the sub-windows of the target image (shaded areas in the search image “x”). Further, the score map is generated by a convolution (sliding-window) of the target part of the reference image features over the target image features. Therefore, each pixel of the score map corresponds to multiple pixels in image z and in image x (say n1 and n2, respectively). The values of n1 and n2 are determined by the network parameters (see Table 1). Literally speaking, the size of the score map is not the same as the size of the target image, but after scaling and adding margin, the two sizes match (section 2.4 “Dataset Curation” discusses scaling and adding margin). In Fig. 1, the two shaded blocks in “x” are the same size as “z”).
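For illustration only, the correspondence between score-map points and image pixels can be sketched with simple arithmetic. The input sizes (127 and 255 pixels) and the total stride of 8 are taken from a reading of Bertinetto's Fig. 1 and Table 1 and are assumptions here, since this Office action does not recite them. The function names are hypothetical, and no padding is assumed.

```python
# Illustrative arithmetic only; sizes and stride are assumed from
# Bertinetto's Fig. 1 and Table 1, not stated in this Office action.

def score_map_size(search_size, exemplar_size, stride):
    """Number of sliding-window positions along one axis (no padding)."""
    return (search_size - exemplar_size) // stride + 1


def sub_window(row, col, exemplar_size, stride):
    """Pixel bounds (top, left, bottom, right) in the search image of the
    sub-window scored at score-map position (row, col)."""
    top, left = row * stride, col * stride
    return (top, left, top + exemplar_size, left + exemplar_size)


print(score_map_size(255, 127, 8))  # 17
print(sub_window(0, 0, 127, 8))     # (0, 0, 127, 127)
print(sub_window(16, 16, 127, 8))   # (128, 128, 255, 255)
```

Under these assumptions, every score-map point maps to a sub-window of the search image that has the same size as the exemplar, and the 17 positions per axis together span the full 255-pixel search image.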

As per claim 9, dependent upon claim 1, Bertinetto teaches that the target part of the reference image features corresponding to the target region of the reference image and the target image features of the target image are obtained based on a fully convolutional siamese neural network model (Fig. 1 “6X6X128” and “22X22X128” being the target part features corresponding to the target region of the reference image and 

Claim 20, an independent medium claim, is rejected as applied to system claim 1 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 12-15 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Bertinetto et al. (Bertinetto, L., et al., “Fully-Convolutional Siamese Networks for Object Tracking”, in ECCV 2016 Workshops, Part II, LNCS 9914, pp. 850–865, 2016, hereafter Bertinetto), in view of Sun et al. (US Publication 2017/0091952 A1, hereafter Sun).
As per claim 3, Bertinetto teaches determining a similarity between the target part of the reference image features and the target part of the target image features (see the rejection applied to claim 1), but does not teach a first threshold.

Taking the combined teachings of Bertinetto and Sun as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider using a threshold when measuring similarity between two image areas in order to determine the similarity based on a standard.

As per claim 12, an independent method claim, Bertinetto teaches every limitation as recited except for “a communication platform connected to a network”. 
Sun evidences that such a communication platform is well known and practiced (FIG. 12).
Taking the combined teachings of Bertinetto and Sun as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider including a communication platform connected to a network in a computing system in order for the system to receive input data from, and output results to, outside devices.

Claim 13, dependent upon claim 12, is rejected as applied to claim 2 above.

Claim 14, dependent upon claim 13, is rejected as applied to claim 4 above.

Claim 15, dependent upon claim 14, is rejected as applied to claim 5 above.

Claim 17, dependent upon claim 14, is rejected as applied to claim 8 above.

Claim 18, dependent upon claim 13, is rejected as applied to claim 9 above.

Claim 19, dependent upon claim 18, is rejected as applied to claim 10 below.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Bertinetto et al. (Bertinetto, L., et al., “Fully-Convolutional Siamese Networks for Object Tracking”, in ECCV 2016 Workshops, Part II, LNCS 9914, pp. 850–865, 2016, hereafter Bertinetto).
As per claim 10, dependent upon claim 9, Bertinetto teaches that the fully convolutional siamese neural network model is generated based on a training process (Bertinetto pages 853-854, section 2.2 “Training with Large Search Images”), the training process including: 
obtaining a plurality of sample images, each sample image relating to a same object;
obtaining a preliminary fully convolutional siamese neural network; 
for the each sample image, obtaining a region of the sample image as a first input of the preliminary fully convolutional siamese neural network; and 
obtaining the sample image as a second input of the preliminary fully convolutional siamese neural network; and 


As per claim 11, dependent upon claim 10, Bertinetto teaches the training process further includes: 
for the each sample image, generating first sample image features based on the first input (Fig. 2 showing first input and generated first sample image features); 
generating second sample image features based on the second input (Fig. 2 showing second input and generated second sample image features); and
generating a sample score map based on the first sample image features and the second sample image features (Fig. 2 showing a “17X17X1” score map); and 
.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Bertinetto et al. (Bertinetto, L., et al., “Fully-Convolutional Siamese Networks for Object Tracking”, in ECCV 2016 Workshops, Part II, LNCS 9914, pp. 850–865, 2016, hereafter Bertinetto), as applied above to claim 4, in view of MORIMOTO et al. (US Publication 2008/0292189 A1, hereafter MORIMOTO).
As per claim 6, Bertinetto teaches:
obtaining one or more blocks of the score map, each block corresponding to the target region of the reference image; and 
designating the block with a maximum of the scores as the target block (Fig. 2 and caption; page 851 last para.: “To find the position of the object in a new image, we can then exhaustively test all possible locations and choose the candidate with the maximum similarity to the past appearance of the object”; page 853 2nd para.: “The position of the maximum score relative to the centre of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame”).
Bertinetto, however, does not teach for each of the one or more blocks, determining a summation of the scores corresponding to the points in the block.
MORIMOTO discloses an apparatus for determining if an input image is similar to a reference image (Abstract). Specifically, MORIMOTO divides the input image data 
Taking the combined teachings of Bertinetto and MORIMOTO as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider calculating a summation of similarity scores in order to obtain a similarity score that accurately represents an area with a plurality of regions, each with respective similarity scores.  
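For illustration only, the block-wise summation discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `best_block`, the 4X4 toy score map, and the 2X2 block size are hypothetical and are not taken from Bertinetto or MORIMOTO.

```python
# Illustrative sketch only; toy data, hypothetical names.

def best_block(score_map, block_h, block_w):
    """Sum the scores inside every block_h x block_w block of `score_map`
    and return ((row, col), summed_score) for the block with the largest sum."""
    rows, cols = len(score_map), len(score_map[0])
    best_sum, best_pos = None, None
    for i in range(rows - block_h + 1):
        for j in range(cols - block_w + 1):
            block_sum = sum(
                score_map[i + di][j + dj]
                for di in range(block_h)
                for dj in range(block_w)
            )
            if best_sum is None or block_sum > best_sum:
                best_sum, best_pos = block_sum, (i, j)
    return best_pos, best_sum


toy_scores = [
    [0, 1, 0, 0],
    [1, 5, 4, 0],
    [0, 3, 6, 1],
    [0, 0, 1, 0],
]
print(best_block(toy_scores, 2, 2))  # ((1, 1), 18)
```

Summing over a block, rather than taking a single maximum point, lets one score represent an area made up of several points, which is the motivation stated in the combination above.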

As per claim 7, dependent upon claim 6, Bertinetto in view of MORIMOTO teaches determining whether the maximum summation of the scores is greater than a second threshold; and upon a determination that the maximum summation of the scores is greater than the second threshold, designate the block with the maximum summation of the scores as the target block (The combination of Bertinetto and MORIMOTO renders obvious determining the block with the maximum summation of the scores as the target block. See the rejection applied to claim 6. Further, it is noted that MORIMOTO determines the similarity of two images by comparing the summed similarity score with the second threshold. It is inherent that when the summed similarity score is greater than the second threshold, there exists similarity).
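For illustration only, the threshold test recited in claim 7 can be sketched as follows. The function name, the dictionary of per-block sums, and the threshold values are hypothetical and are not taken from the cited references.

```python
# Illustrative sketch only; names and numbers are hypothetical.

def select_target_block(block_sums, threshold):
    """`block_sums` maps a block position to its summed score. Return the
    position with the maximum sum if that sum exceeds `threshold`;
    otherwise return None (no block is designated)."""
    if not block_sums:
        return None
    position = max(block_sums, key=block_sums.get)
    return position if block_sums[position] > threshold else None


sums = {(0, 0): 7.0, (1, 1): 18.0, (2, 1): 10.0}
print(select_target_block(sums, 15.0))  # (1, 1)
print(select_target_block(sums, 20.0))  # None
```

The second call shows the case in which the best block is still not similar enough, so no target block is designated.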

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Bertinetto et al. (Bertinetto, L., et al., “Fully-Convolutional Siamese Networks for Object Tracking”, .

Claim 16, dependent upon claim 14, is rejected as applied to claim 6 above.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XUEMEI G CHEN whose telephone number is (571)270-3480.  The examiner can normally be reached on Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private 
/XUEMEI G CHEN/ Primary Examiner, Art Unit 2664