DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Information Disclosure Statement
The information disclosure statement (IDS), submitted on 06/08/2021, is being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159.  See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 4-6, 9-11, and 14-15 are rejected on the ground of nonstatutory double patenting as being unpatentable over the related claims of U.S. Patent 11,082,634 B2. Although the conflicting claims are not identical, they are not patentably distinct from each other because the claims of U.S. Patent 11,082,634 B2 are sufficiently similar to the instant claims to meet the limitations claimed in the instant application. Table 1 compares the instant claims with the claims of U.S. Patent 11,082,634 B2.
Table 1: Comparison of claims in the instant Application 17/360,169 vs. the U.S. Patent 11,082,634 B2.
Instant Application 17/360,169 (first in each pair) | U.S. Patent 11,082,634 B2 (second in each pair)
1. A person tracking apparatus comprising: 
one or more non-transitory storage devices configured to store instructions; and 
one or more processors configured to execute the instructions to: 
receive a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and 
cause a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, 
the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, 
the first thumbnail image being a part of the first video, and the second thumbnail image being a part of the second video, and 
each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video.
1. A person tracking system comprising:
one or more non-transitory storage devices configured to store instructions; and one or more processors configured by the instructions to:
receive a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and
cause a display device to display thumbnail images and the third video in a same display, based on one or more monitored persons appearing in the third video; wherein the thumbnail images include a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video, each of the thumbnail images indicating a person whose similarity to a monitored person appearing in the third video is greater than a threshold value, and wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction, includes both of a head and a part of a body of the person of standing posture.
4. The person tracking apparatus according to claim 1, wherein the one or more non-transitory storage devices are configured to: store the first video and the second video previously captured by the first camera and the second camera.  
2. The person tracking system according to claim 1, wherein the one or more non-transitory storage devices are configured to:
store the first video and the second video previously captured by the first camera and the second camera.
5. The person tracking apparatus according to claim 1, wherein the one or more processors are further configured to: calculate a similarity between persons indicated by each of the thumbnail images and the one or more monitored persons appearing in the third video.  
4. The person tracking system according to claim 3, wherein the one or more processors are further configured to:
calculate a similarity between each of the persons indicated by each of the thumbnail images and the one or more monitored persons displayed in the third video,
wherein the window includes the thumbnail images of persons having a similarity to the one or more monitored persons that is greater than the threshold value.
6. A method comprising: 
receiving a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and 
causing a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, 
the first thumbnail image being a part of the first video, and the second thumbnail image being a part of the second video, and 
each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video. 
7. A person tracking method comprising:
receiving a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and causing a display device to display thumbnail images and the third video in a same display, based on one or more monitored persons appearing in the third video; wherein the thumbnail images include a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video, each of the thumbnail images indicating a person whose similarity to a monitored person appearing in the third video is greater than a threshold value, and wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction, includes both of a head and a part of a body of the person of standing posture.
9. The method according to claim 6, further comprising: storing the first video and the second video previously captured by the first camera and the second camera.
8. The person tracking method according to claim 7, further comprising:
storing the first video and the second video previously captured by the first camera and the second camera.
10. The method according to claim 6, further comprising: calculating a similarity between persons indicated by each of the thumbnail images and the one or more monitored persons appearing in the third video.  
10. The person tracking method according to claim 9, further comprising:
calculating a similarity between each of the persons indicated by each of the thumbnail images and the one or more monitored persons displayed in the third video, wherein the window includes the thumbnail images of persons having a similarity to the one or more monitored persons that is greater than the threshold value.
11. A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of a person tracking apparatus, cause the one or more processors to: receive a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and cause a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, and the second thumbnail image being a part of the second video, and each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video.
13. A non-transitory computer-readable storage medium storing a program that causes a computer to perform:
receiving a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and
causing a display device to display thumbnail images and the third video in a same display, based on one or more monitored persons appearing in the third video; wherein the thumbnail images include a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video, each of the thumbnail images indicating a person whose similarity to a monitored person appearing in the third video is greater than a threshold value, and wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction, includes both of a head and a part of a body of the person of standing posture.
14. The non-transitory computer-readable medium according to claim 11, wherein the one or more instructions further cause the one or more processors to: store the first video and the second video previously captured by the first camera and the second camera.
14. The non-transitory computer-readable storage medium according to claim 13, wherein the program further causes the computer to perform: storing the first video and the second video previously captured by the first camera and the second camera.
15. The non-transitory computer-readable medium according to claim 11, wherein the one or more instructions further cause the one or more processors to: calculate a similarity between persons indicated by each of the thumbnail images and the one or more monitored persons appearing in the third video.
16. The non-transitory computer-readable storage medium according to claim 15, wherein the program further causes the computer to perform: calculating a similarity between each of the persons indicated by each of the thumbnail images and the one or more monitored persons displayed in the third video, wherein the window includes the thumbnail images of persons having a similarity to the one or more monitored persons that greater than the threshold value.



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a): 
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention. 
The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112: 
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by the inventor of carrying out his invention.

The following is a quotation of 35 U.S.C. 112(b): 
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. 
The following is a quotation of pre-AIA  35 U.S.C. 112, second paragraph: 
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-19 are rejected under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, as failing to comply with the written description requirement. Claims 1, 6, and 11 recite the subject matter “the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video”. However, these features are not described in the specification. Hence, the limitation “the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video” is new matter that is not described in the application as originally filed. As a result, claims 1, 6, and 11, and their dependent claims, are rejected under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph. The new matter is required to be canceled from the claims (see MPEP § 608.04).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
 (a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA  35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA  35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA  35 U.S.C. 103(c) and potential pre-AIA  35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA  35 U.S.C. 103(a). 

Claims 1-2, 4-7, 9-12 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Saptharishi (US Patent Application Publication 2009/0245573 A1), (“Saptharishi”), in view of Ikeda et al. (US Patent Application Publication 2011/0007901 A1), (“Ikeda”).
Regarding claim 1, Saptharishi meets the claim limitations as follows.
A person tracking apparatus ((i.e. an object tracking system) [Saptharishi: para. 0015]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]) comprising:
one or more non-transitory storage devices (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] configured to store instructions (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135]; and
one or more processors ((i.e. a processor) [Saptharishi: claim 21]; (i.e. a computer) [Saptharishi: para. 0136]) configured to execute the instructions to (i.e. one or more software programs comprised of program instructions in source code, object code, executable code) [Saptharishi: para. 0135]:
receive a first video (i.e. receives the image data and is operable to detect objects appearing in one or more of the multiple images) [Saptharishi: para. 0024] captured by a first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]),
a second video captured by a second camera ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]),
and a third video captured by a third camera ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]); and
cause a display device (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039] to display thumbnail images (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027] and the third video in a same displayed screen ((i.e. a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras. The user may then notice companions of the person in question, and may select those companions and view their video histories. Because metadata corresponding to the object's appearance signature is linked in the database with video data corresponding to the location where it was detected, the image itself may be used as a selectable link for searching the database) [Saptharishi: para. 0124]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]),
based on one or more monitored persons appearing in the third video ((i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]);
the thumbnail images including a first thumbnail image based on the first video captured by the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125])
and a second thumbnail image based on the second video captured by the second camera which is different from the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]),
the first thumbnail image being a part of the first video ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]),
the second thumbnail image being a part of the second video ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]),
and each of the thumbnail images indicating a person ((i.e. images of an object captured by a camera system) [Saptharishi: para. 0027]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125])
determined to have a similarity to a monitored person appearing in the third video ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The camera system 100 may automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames) [Saptharishi: para. 0088]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
A person tracking apparatus comprising: one or more non-transitory storage devices configured to store instructions; and one or more processors configured to execute the instructions to: receive a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and cause a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video, and each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video.   
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
(i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B show more than two thumbnail images. Hence the thumbnail images can include the first thumbnail image and the second thumbnail image] based on the first video captured by the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226])
and a second thumbnail image (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B show more than two thumbnail images. Hence the thumbnail images can include the first thumbnail image and the second thumbnail image] based on the second video captured by the second camera which is different from the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]),
the first thumbnail image being a part of the first video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]),
the second thumbnail image being a part of the second video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]),
and each of the thumbnail images (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] indicating a person ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3]).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user's unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 2, Saptharishi meets the claim limitations as set forth in claim 1. 
The person tracking apparatus according to claim 1 ((i.e. an object tracking system) [Saptharishi: para. 0015]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]), wherein each of the thumbnail images includes both of a head (i.e. Real-time Face Detection) [Saptharishi: para. 0048] and a part of a body of the person of a standing posture ((i.e. people are standing) [Saptharishi: para. 0130]; (i.e. a person had been standing) [Saptharishi: para. 0130]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The person tracking apparatus according to claim 1, wherein each of the thumbnail images includes both of a head and a part of a body of the person of a standing posture.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user's unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 4, Saptharishi meets the claim limitations as set forth in claim 1. Saptharishi further meets the claim limitations as follows.
The person tracking system according to claim 1 (i.e. an object tracking system) [Saptharishi: para. 0015], wherein the one or more non-transitory storage devices (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] are configured to (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135]: store the first video and the second video (i.e. data can be stored) [Saptharishi: para. 0118] previously captured by the first camera and the second camera ((i.e. historical video from a variety of image capturing devices) [Saptharishi: para. 0116]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]; (i.e. The signatures or index elements stored in the metadata database 112 may facilitate searching a large database of objects quickly for a specific object because actual pixel information from video images does not need to be reprocessed. The object search module 214 may use the same match classifier 218 used for tracking and indexing to search for a specific object. The match classifier 218, together with the signatures of objects, enable object-based searches in both historical video and real-time video feeds) [Saptharishi: para. 0110]).

Regarding claim 5, Saptharishi meets the claim limitations as set forth in claim 1. Saptharishi further meets the claim limitations as follows.
The person tracking system according to claim 1 (i.e. an object tracking system) [Saptharishi: para. 0015], wherein the one or more processors are further configured to (i.e. wherein the processor is operable to) [Saptharishi: claim 22]: calculate a similarity between persons ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. The camera system 100 may automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]) indicated by each of the thumbnail images and the one or more monitored persons ((i.e. 
The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations)) [Saptharishi: para. 0042]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125] ; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) 
[Saptharishi: para. 0088])  appeared in the third video ((i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]).

Regarding claim 6, Saptharishi meets the claim limitations as follows.
A method (i.e. methods) [Saptharishi: para. 0120]  comprising:receiving a first video (i.e. receives the image data and is operable to detect objects appearing in one or more of the multiple images) [Saptharishi: para 0024] captured by a first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), a second video captured by a second camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), and a third video captured by a third camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]); and causing a display device (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039] to display thumbnail images (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027] and the third video in a same display screen ((i.e. a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras. 
The user may then notice companions of the person in question, and may select those companions and view their video histories. Because metadata corresponding to the object's appearance signature is linked in the database with video data corresponding to the location where it was detected, the image itself may be used as a selectable link for searching the database) [Saptharishi: para. 0124]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027] ; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]), based on one or more monitored persons appearing in the third video ((i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. 
As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125] ; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]); the thumbnail images including a first thumbnail image based on the first video captured by the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]) and a second thumbnail image based on the second video captured by the second camera which is different from the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), the first thumbnail image being a part of the first video ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 
0125]), the second thumbnail image being a part of the second video ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), and each of the thumbnail images indicating a person ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]) determined to have a similarity to a monitored person appearing in the third video  (((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). 
The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The camera system 100 may  automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]) (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. 
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
A method comprising: receiving a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and causing a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, and the second thumbnail image being a part of the second video, and each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
(i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B shows more than two thumbnail images.  Hence the thumbnail images can include the first thumbnail images and the second thumbnail image] based on the first video captured by the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images comes from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]) and a second thumbnail image (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B shows more than two thumbnail images.  Hence the thumbnail images can include the first thumbnail images and the second thumbnail image] based on the second video captured by the second camera which is different from the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images comes from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]), the first thumbnail image being a part of the first video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images comes from a video captured by a camera]; (i.e. 
the service is regarding photographs or video) [Ikeda: para. 0226]), the second thumbnail image being a part of the second video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images comes from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]), and each of the thumbnail images (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] indicating a person ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user's unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 7, Saptharishi meets the claim limitations as set forth in claim 6.
The method according to claim 6 (i.e. methods) [Saptharishi: para. 0120], wherein each of the thumbnail images includes both of a head (i.e. Real-time Face Detection) [Saptharishi: para. 0048] and a part of a body of the person of a standing posture ((i.e. people are standing) [Saptharishi: para. 0130]; (i.e. a person had been standing) [Saptharishi: para. 0130]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The method according to claim 6, wherein each of the thumbnail images includes both of a head and a part of a body of the person of a standing posture.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user's unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 9, Saptharishi meets the claim limitations as set forth in claim 6. Saptharishi further meets the claim limitations as follows.
The method according to claim 6 (i.e. methods) [Saptharishi: para. 0120] further comprising: storing the first video and the second video (i.e. data can be stored) [Saptharishi: para. 0118] previously captured by the first camera and the second camera ((i.e. historical video from a variety of image capturing devices) [Saptharishi: para. 0116]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]; (i.e. The signatures or index elements stored in the metadata database 112 may facilitate searching a large database of objects quickly for a specific object because actual pixel information from video images does not need to be reprocessed. The object search module 214 may use the same match classifier 218 used for tracking and indexing to search for a specific object. The match classifier 218, together with the signatures of objects, enable object-based searches in both historical video and real-time video feeds) [Saptharishi: para. 0110]).

Regarding claim 10, Saptharishi meets the claim limitations as set forth in claim 6. Saptharishi further meets the claim limitations as follows.
The method according to claim 11 (i.e. methods) [Saptharishi: para. 0120] further comprising:calculating a similarity between persons ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. The camera system 100 may  automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]) indicated by each of the thumbnail images and the one or more monitored persons ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. 
The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations)) [Saptharishi: para. 0042]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125] ; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088])  appeared in the third video ((i.e. 
the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]).

Regarding claim 11, Saptharishi meets the claim limitations as follows.
A non-transitory computer-readable medium (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] storing instructions, the instructions comprising (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135]: one or more instructions that, when executed by one or more processors of (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] a person tracking apparatus (i.e. an object tracking system) [Saptharishi: para. 0015], cause the one or more processors to (i.e. wherein the processor is operable to) [Saptharishi: claim 22]:receive a first video (i.e. receives the image data and is operable to detect objects appearing in one or more of the multiple images) [Saptharishi: para 0024] captured by a first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), a second video captured by a second camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), and a third video captured by a third camera ((i.e. 
images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]); and cause a display device (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039] to display thumbnail images (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027] and the third video in a same display screen ((i.e. a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras. The user may then notice companions of the person in question, and may select those companions and view their video histories. Because metadata corresponding to the object's appearance signature is linked in the database with video data corresponding to the location where it was detected, the image itself may be used as a selectable link for searching the database) [Saptharishi: para. 0124]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027] ; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]), based on one or more monitored persons appearing in the third video ((i.e. 
Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125] ; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]); the thumbnail images including a first thumbnail image based on the first video captured by the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. 
an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]) and a second thumbnail image based on the second video captured by the second camera which is different from the first camera ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), the first thumbnail image being a part of the first video ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), the second thumbnail image being a part of the second video ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]), and each of the thumbnail images indicating a person ((i.e. images of an object captured by a camera system) [Saptharishi: para 0027]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. 
As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]) determined to have a similarity to a monitored person appearing in the third video  (((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The camera system 100 may  automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]) (i.e. 
an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095]; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. 
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of a person tracking apparatus, cause the one or more processors to: receive a first video captured by a first camera, a second video captured by a second camera, and a third video captured by a third camera; and cause a display device to display thumbnail images and the third video in a same displayed screen, based on one or more monitored persons appearing in the third video, the thumbnail images including a first thumbnail image based on the first video captured by the first camera and a second thumbnail image based on the second video captured by the second camera which is different from the first camera, the first thumbnail image being a part of the first video, the second thumbnail image being a part of the second video, and each of the thumbnail images indicating a person determined to have a similarity to a monitored person appearing in the third video.   
However, in the same field of endeavor, Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
(i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B show more than two thumbnail images. Hence the thumbnail images can include the first thumbnail image and the second thumbnail image] based on the first video captured by the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]) and a second thumbnail image (i.e. If thumbnails are displayed in a list, thumbnails of all images including images not yet been uploaded can be displayed on the TV within a time period a general user can tolerate. The above is one of practical solutions) [Ikeda: para. 0190 – Note: Figs. 22A-B show more than two thumbnail images. Hence the thumbnail images can include the first thumbnail image and the second thumbnail image] based on the second video captured by the second camera which is different from the first camera ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]), the first thumbnail image being a part of the first video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. 
the service is regarding photographs or video) [Ikeda: para. 0226]), the second thumbnail image being a part of the second video ((i.e. the image capturing device 1 (camera) causes the TV 45 to display a list of thumbnails of images) [Ikeda: para. 0199 – Note: Ikeda teaches that the thumbnail images come from a video captured by a camera]; (i.e. the service is regarding photographs or video) [Ikeda: para. 0226]), and each of the thumbnail images (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3] indicating a person ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3]).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
One would have been motivated to do so because the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 12, Saptharishi meets the claim limitations as set forth in claim 11.
The non-transitory computer-readable medium according to claim 11 (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135],  wherein each of the thumbnail images includes both of a head (i.e. Real-time Face Detection) [Saptharishi: para. 0048] and a part of a body of the person of a standing posture ((i.e. people are standing) [Saptharishi: para. 0130]; (i.e. a person had been standing) [Saptharishi: para. 0130]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The non-transitory computer-readable medium according to claim 11, wherein each of the thumbnail images includes both of a head and a part of a body of the person of a standing posture.
However, in the same field of endeavor, Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
One would have been motivated to do so because the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 14, Saptharishi meets the claim limitations as set forth in claim 11. Saptharishi further meets the claim limitations as follows.
The non-transitory computer-readable medium according to claim 11 (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135],  wherein the one or more instructions (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] further cause the one or more processors to (i.e. wherein the processor is operable to) [Saptharishi: claim 22]: store the first video and the second video (i.e. data can be stored) [Saptharishi: para. 0118] previously captured by the first camera and the second camera ((i.e. historical video from a variety of image capturing devices) [Saptharishi: para. 0116]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras) [Saptharishi: para. 0125]; (i.e. The signatures or index elements stored in the metadata database 112 may facilitate searching a large database of objects quickly for a specific object because actual pixel information from video images does not need to be reprocessed. The object search module 214 may use the same match classifier 218 used for tracking and indexing to search for a specific object. The match classifier 218, together with the signatures of objects, enable object-based searches in both historical video and real-time video feeds) [Saptharishi: para. 0110]).

Regarding claim 15, Saptharishi meets the claim limitations as set forth in claim 11. Saptharishi further meets the claim limitations as follows.
The non-transitory computer-readable medium according to claim 11 (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135],  wherein the one or more instructions (i.e. one or more software programs comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied in compressed or uncompressed form on a computer-readable medium, which include storage devices) [Saptharishi: para. 0135] further cause the one or more processors to (i.e. wherein the processor is operable to) [Saptharishi: claim 22]:calculate a similarity between persons ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. The camera system 100 may  automatically recognize pedestrians leaving a parked car, and can compare them later to people entering the car. If a person entering a car is not from the original group who arrived in the car, security personnel may be alerted. Video clips are automatically sent with the alert, so that it is easy to review and quickly determine whether there is a problem. 
A security guard may then opt to either inform the car owner (if the car is registered by license plate number) or summon police.) [Saptharishi: para. 0126]) indicated by each of the thumbnail images and the one or more monitored persons ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations)) [Saptharishi: para. 0042]; (i.e. specific categories or classes (e.g., humans, vehicles, animals) of objects are tracked) [Saptharishi: para. 0095] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125] ; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. 
Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088])  appeared in the third video ((i.e. the rules engine 220 may trigger an alarm that is presented on the display 114 of the user interface if a human is detected in the field of view of one of the image capturing devices 102) [Saptharishi: para. 0039]; (i.e. displays an image of the first object on a display) [Saptharishi: para. 0027]).

Regarding claim 16, Saptharishi meets the claim limitations as set forth in claim 1. Saptharishi further meets the claim limitations as follows.
The person tracking system according to claim 1 (i.e. an object tracking system) [Saptharishi: para. 0015],  wherein each of the thumbnail images indicates the person determined to have a similarity value ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34];  (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. 
As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125])  that is greater than a threshold value ((i.e. The output of the match classifier 218 may correspond to a decision step value s(z) as described below. The decision step value s(z) may indicate whether the first and second object match, and may include a value corresponding to a confidence level in its decision) [Saptharishi: para. 0046]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The decision step value is compared (represented by block S06) to one or both of an acceptance threshold "τα and a rejection threshold "τr to determine whether two objects match, to reject the objects as a match) [Saptharishi: para. 0053]; (i.e. that threshold may be larger than a threshold) [Saptharishi: para. 0034]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The person tracking apparatus according to claim 1, wherein each of the thumbnail images indicates the person determined to have a similarity value that is greater than a threshold value.
However, in the same field of endeavor, Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
One would have been motivated to do so because the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].
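For illustration only, the threshold limitation recited in claim 16 (thumbnails shown only for persons whose similarity value is greater than a threshold value) can be sketched as below. This is a minimal sketch under stated assumptions: the names Candidate and select_thumbnails, the field names, and the numeric scale of the similarity value are hypothetical and are taken from neither Saptharishi, Ikeda, nor the application; in particular, the sketch does not reproduce the match classifier 218 or the decision step value s(z) of Saptharishi.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A person image cropped from one camera's video (hypothetical structure)."""
    camera_id: str      # which camera the source video came from
    thumbnail: str      # identifier of the cropped image, e.g. a file name
    similarity: float   # similarity to the monitored person (scale is an assumption)


def select_thumbnails(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Keep only candidates whose similarity is strictly greater than the threshold.

    The result is ordered from most to least similar; the ordering is a
    presentation choice of this sketch, not something the claim requires.
    """
    kept = [c for c in candidates if c.similarity > threshold]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)


if __name__ == "__main__":
    found = [
        Candidate("first_camera", "a.png", 0.91),
        Candidate("second_camera", "b.png", 0.40),
        Candidate("first_camera", "c.png", 0.75),
    ]
    for c in select_thumbnails(found, threshold=0.5):
        print(c.camera_id, c.thumbnail, c.similarity)
```

The strict comparison mirrors the claim wording "greater than a threshold value"; a candidate whose similarity exactly equals the threshold is excluded in this sketch.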

Regarding claim 17, Saptharishi meets the claim limitations as set forth in claim 1. Saptharishi further meets the claim limitations as follows.
The person tracking system according to claim 1 (i.e. an object tracking system) [Saptharishi: para. 0015],  wherein each of the thumbnail images indicates the person determined to have a similarity value ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34];  (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. 
As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125])  that is greater than a threshold value ((i.e. The decision step value is compared (represented by block S06) to one or both of an acceptance threshold "τα and a rejection threshold "τr to determine whether two objects match, to reject the objects as a match) [Saptharishi: para. 0053]; (i.e. that threshold may be larger than a threshold) [Saptharishi: para. 0034]), and wherein the similarity value is calculated based on similarity to the monitored person ((i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124] ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. The output of the match classifier 218 may correspond to a decision step value s(z) as described below. The decision step value s(z) may indicate whether the first and second object match, and may include a value corresponding to a confidence level in its decision) [Saptharishi: para. 0046]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The decision step value is compared (represented by block S06) to one or both of an acceptance threshold "τα and a rejection threshold "τr to determine whether two objects match, to reject the objects as a match) [Saptharishi: para. 0053]; (i.e. that threshold may be larger than a threshold) [Saptharishi: para. 0034]).  
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The person tracking apparatus according to claim 1, wherein each of the thumbnail images indicates the person determined to have a similarity value that is greater than a threshold value, and wherein the similarity value is calculated based on similarity to the monitored person.  
However, in the same field of endeavor, Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with those of Ikeda to display thumbnail images on the display.
One would have been motivated to do so because the combination of Saptharishi and Ikeda would shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 18, Saptharishi meets the claim limitations as set forth in claim 6. Saptharishi further meets the claim limitations as follows.
The method according to claim 6 (i.e. methods) [Saptharishi: para. 0120],  wherein each of the thumbnail images indicates the person determined to have a similarity value ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; ; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34];  (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person. 
As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125])  that is greater than a threshold value ((i.e. The output of the match classifier 218 may correspond to a decision step value s(z) as described below. The decision step value s(z) may indicate whether the first and second object match, and may include a value corresponding to a confidence level in its decision) [Saptharishi: para. 0046]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The decision step value is compared (represented by block S06) to one or both of an acceptance threshold "τα and a rejection threshold "τr to determine whether two objects match, to reject the objects as a match) [Saptharishi: para. 0053]; (i.e. that threshold may be larger than a threshold) [Saptharishi: para. 0034]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The method according to claim 6,  wherein each of the thumbnail images indicates the person determined to have a similarity value that is greater than a threshold value.
However, in the same field of endeavor, Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda will shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Regarding claim 19, Saptharishi meets the claim limitations as set forth in claim 11. Saptharishi further meets the claim limitations as follows.
The non-transitory computer-readable medium according to claim 11 (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135], wherein each of the thumbnail images indicates the person determined to have a similarity value ((i.e. The video analytics module 200 also includes a match classifier 218 connected to the object tracking module 206, the object indexing module 212, and the object search module 214. The match classifier 218 is operable to receive an input pattern z representing signatures of two objects and determine whether the signatures match (e.g., whether the signatures are sufficiently similar). The match classifier 218 may be used by the object tracking module 206, the object indexing module 212, and the object search module 214 to assist the modules with their various operations) [Saptharishi: para. 0042]; (i.e. the object of interest is a human and the selected feature corresponds to the face of the human) [Marman: claim 34]; (i.e. Tracking may be thought of as locating an object in each video frame or image, and establishing correspondences between moving objects across frames. Tracking may be performed within a single image capturing device 102 or across multiple image capturing devices 102. In general, the object tracking module 206 may use object motion between frames as a cue to tracking, while also relying on the match classifier 218 for tracking.) [Saptharishi: para. 0088]; (i.e. an individual person may be followed through a casino monitored by dozens of cameras with adjacent, overlapping fields of view, by just clicking on an image and instructing the system to track the image across all cameras. As soon as an appearance signature of the person is detected, the system automatically directs live video data from the corresponding camera to a monitor that allows security personnel to visually track the person.
As the person moves into the field of view of the next camera, the video feed is automatically switched so that it is not necessary for security personnel to switch back and forth between cameras to continue tracking the person's path) [Saptharishi: para. 0125]) that is greater than a threshold value ((i.e. The output of the match classifier 218 may correspond to a decision step value s(z) as described below. The decision step value s(z) may indicate whether the first and second object match, and may include a value corresponding to a confidence level in its decision) [Saptharishi: para. 0046]; (i.e. The decision step value s(z) is correlated with the match classifier's estimate as to how similar it thinks two objects are (e.g., match confidence)) [Saptharishi: para. 0083]; (i.e. The decision step value is compared (represented by block S06) to one or both of an acceptance threshold τa and a rejection threshold τr to determine whether two objects match, to reject the objects as a match) [Saptharishi: para. 0053]; (i.e. that threshold may be larger than a threshold) [Saptharishi: para. 0034]).
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The non-transitory computer-readable medium according to claim 11,  wherein each of the thumbnail images indicates the person determined to have a similarity value that is greater than a threshold value.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3])  
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda will shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].

Claims 3, 8, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Saptharishi (US Patent Application Publication 2009/0245573 A1), (“Saptharishi”), in view of Ikeda et al. (US Patent Application Publication 2011/0007901 A1), (“Ikeda”), and further in view of Marman et al. (US Patent Application Publication 2012/0062732 A1), (“Marman”).
Regarding claim 3, Saptharishi meets the claim limitations as set forth in claim 2.
The person tracking apparatus according to claim 2 ((i.e. an object tracking system) [Saptharishi: para. 0015]; (i.e. By establishing semantic links between video streams and objects detected, a video history can be created for a particular object. For instance, by selecting a human object, a user may automatically summon video clips showing where the person had been detected previously by other cameras) [Saptharishi: para. 0124]), wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The person tracking apparatus according to claim 2, wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3]) is longer in a vertical direction than as compared to a horizontal direction (i.e. Note: Fig. 22B shows that the thumbnail image is longer in the vertical direction than in the horizontal direction) [Ikeda: Fig. 22B].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda will shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].
In the same field of endeavor Marman further discloses the claim limitations and the deficient claim limitations, as follows:
wherein each of the thumbnail images ((i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows a row of thumbnail images that have the same size and the same shape in a window]) is longer in a vertical direction than as compared to a horizontal direction (i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows that the height of the region is greater than half of the width of the thumbnail image].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi and Ikeda with Marman to display multiple thumbnail images of the same person in different orientations.
Therefore, the combination of Saptharishi and Ikeda with Marman will enable users to recognize and track a person of interest [Marman: para. 0068].

Regarding claim 8, Saptharishi meets the claim limitations as set forth in claim 7. 
The method according to claim 7 (i.e. methods) [Saptharishi: para. 0120], wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The method according to claim 7, wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3]) is longer in a vertical direction than as compared to a horizontal direction (i.e. Note: Fig. 22B shows that the thumbnail image is longer in the vertical direction than in the horizontal direction) [Ikeda: Fig. 22B].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda will shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].
In the same field of endeavor Marman further discloses the claim limitations and the deficient claim limitations, as follows:
wherein each of the thumbnail images ((i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows a row of thumbnail images that have the same size and the same shape in a window]) is longer in a vertical direction than as compared to a horizontal direction (i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows that the height of the region is greater than half of the width of the thumbnail image].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi and Ikeda with Marman to display multiple thumbnail images of the same person in different orientations.
Therefore, the combination of Saptharishi and Ikeda with Marman will enable users to recognize and track a person of interest [Marman: para. 0068].

Regarding claim 13, Saptharishi meets the claim limitations as set forth in claim 12. 
The non-transitory computer-readable medium according to claim 12 (i.e. a computer-readable medium, which include storage devices) [Saptharishi: para. 0135],  wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
Saptharishi does not explicitly disclose the following claim limitations (Emphasis Added).
The non-transitory computer-readable medium according to claim 12, wherein each of the thumbnail images is longer in a vertical direction than as compared to a horizontal direction.
However, in the same field of endeavor Ikeda further discloses the claim limitations and the deficient claim limitations as follows:
wherein each of the thumbnail images ((i.e. a person to which the user desires to show) [Ikeda: para. 0517; Figs. 5-6]; (i.e. displays, on the display unit 110, thumbnails or the like of images in the image data) [Ikeda: para. 0180; Fig. 3]) is longer in a vertical direction than as compared to a horizontal direction (i.e. Note: Fig. 22B shows that the thumbnail image is longer in the vertical direction than in the horizontal direction) [Ikeda: Fig. 22B].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi with Ikeda to display thumbnail images on the display.
Therefore, the combination of Saptharishi and Ikeda will shorten the upload time and reduce the user’s unpleasant feeling while waiting for full images to load [Ikeda: para. 0189].
In the same field of endeavor Marman further discloses the claim limitations and the deficient claim limitations, as follows:
wherein each of the thumbnail images ((i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows a row of thumbnail images that have the same size and the same shape in a window]) is longer in a vertical direction than as compared to a horizontal direction (i.e. FIG. 9 includes images 800 and 810 and additional photographic images 930, 940, 950, and 960 to demonstrate a combination of specific object zoom and group zoom that may be implemented by display management module 340) [Marman: para. 0068; Note: Fig. 9 shows that the height of the region is greater than half of the width of the thumbnail image].
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teachings of Saptharishi and Ikeda with Marman to display multiple thumbnail images of the same person in different orientations.
Therefore, the combination of Saptharishi and Ikeda with Marman will enable users to recognize and track a person of interest [Marman: para. 0068].

Reference Notice
The additional prior art listed in the Notice of References Cited, made of record and not relied upon, is considered pertinent to applicant's disclosure.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang, whose telephone number is (408) 918-7529. The examiner can normally be reached Monday through Thursday, 8:30 am to 5:00 pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath Perungavoor, can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/Philip P. Dang/Primary Examiner, Art Unit 2488