Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each such limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“a face pair module”, “a multiple face tracking module” and “a fine tuning module” in claim 1, 
“a face tracklet module” in claim 6, 
“a face tracklet module”, “a face pair module” and “a multiple face tracking module” in claim 7, and 
“a fine tuning module” in claim 9.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Examiner's Amendment/Statement
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Michael Dodd on 9/6/2022 at 11:25am PT. See interview summary for details.
The application has been amended as follows:
In claim 1, line 5, delete “a positive face and a negative face” and insert in its place --positive and negative face pairs--.
In claim 7, lines 9-10, delete “a positive face and a negative face” and insert in its place --positive and negative face pairs--.
In claim 13, lines 7-8, delete “a positive face and a negative face” and insert in its place --positive and negative face pairs--.

Allowable Subject Matter
Claims 1-20 are allowed subject to the above examiner’s amendment and the terminal disclaimer filed on 9/6/2022.
The following is an examiner’s statement of reasons for allowance:
The prior art, alone or in reasonable combination, fails to teach claims 1-6, which specifically comprise the following limitations (in consideration of the claim as a whole):
a multiple face tracking module configured to receive the face pairs from the face pair module and construct a trajectory model for an identified human face; and 
a fine tuning module connected between the neural network and the multiple face tracking module and configured to adaptively extract discriminative face features of the identified human face. 
The prior art, alone or in reasonable combination, fails to teach claims 7-12, which specifically comprise the following limitations (in consideration of the claim as a whole):
a face tracklet module configured to: … generate spatio-temporal constraints indicative of: faces in the face tracklet being the same person and faces in different positions in the frame being different persons;
the face pair module connected to a neural network and configured to generate face pairs from the spatio-temporal constraints including positive and negative face pairs; and 
a multiple face tracking module configured to receive face pairs from the face pair module and construct a trajectory model for an identified human face.
The prior art, alone or in reasonable combination, fails to teach claims 13-20, which specifically comprise the following limitations (in consideration of the claim as a whole):
generating spatio-temporal constraints indicative of: faces in the face tracklet being the same person and faces in different positions in the frame being different persons; 
deriving face pairs from the spatio-temporal constraints including positive and negative face pairs; and 
constructing a trajectory model for an identified human face. 
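For illustration only, the two spatio-temporal constraints recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation and not part of the claims: the `Face` record, its fields, and the function name `derive_face_pairs` are hypothetical, and "different positions in the frame" is approximated here by unequal bounding boxes within one frame.

```python
from itertools import combinations
from typing import NamedTuple


class Face(NamedTuple):
    """One detected face. All fields are hypothetical, for illustration only."""
    tracklet_id: int  # tracklet the detection was linked into
    frame: int        # index of the video frame
    box: tuple        # (x, y, w, h) position of the face in the frame


def derive_face_pairs(faces):
    """Return (positive_pairs, negative_pairs) from the two constraints."""
    positive, negative = [], []
    for a, b in combinations(faces, 2):
        if a.tracklet_id == b.tracklet_id:
            # Constraint 1: faces in the same tracklet are the same person.
            positive.append((a, b))
        elif a.frame == b.frame and a.box != b.box:
            # Constraint 2: faces at different positions in the same frame
            # are different persons.
            negative.append((a, b))
    return positive, negative


# Two tracklets, each observed in frames 0 and 1.
faces = [
    Face(0, 0, (10, 10, 5, 5)),
    Face(0, 1, (12, 10, 5, 5)),
    Face(1, 0, (50, 10, 5, 5)),
    Face(1, 1, (52, 10, 5, 5)),
]
positive_pairs, negative_pairs = derive_face_pairs(faces)
print(len(positive_pairs), len(negative_pairs))  # prints "2 2"
```

Pairs that satisfy neither constraint (different tracklets in different frames) are left unlabeled in this sketch, since the recited constraints say nothing about them.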
The closest prior art, Chen et al. ("An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks," 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), 2015, pp. 360-368) reveals a similar technique and system for detecting and tracking human faces and verifying human faces with positive and negative face pairs (see following Fig. 1), but fails to anticipate or render obvious, either singularly or in combination with the other cited references, the above limitations (as combined with the other claimed limitations).

[Image: Fig. 1 of Chen et al.; not reproduced]

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al., "An equalised global graphical model-based approach for multi-camera object tracking." arXiv preprint arXiv:1502.03532 (2015).


Li et al., "Deeptrack: Learning discriminative feature representations online for robust visual tracking." IEEE Transactions on Image Processing 25, no. 4 (2016): 1834-1848.


Ahmed et al., "An improved deep learning architecture for person re-identification." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3908-3916. 2015.


Ge, "Multi-target Data Association by Tracklets with Unsupervised Parameter Estimation." 2008, pp. 1-10.


Zhou et al., "Random field topic model for semantic region analysis in crowded scenes from tracklets." In CVPR 2011, pp. 3441-3448. IEEE, 2011.



Wojek et al., "Monocular visual scene understanding: Understanding multi-object traffic scenes." IEEE transactions on pattern analysis and machine intelligence 35, no. 4 (2012): 882-897.


Latapie et al. (US 20180033024 A1): In one embodiment, a method includes obtaining a plurality of tracklets, each of the plurality of tracklets including tracklet data representing a position of a respective one of a plurality of people at a plurality of times. The method includes generating a behavioral analytic metric based on the plurality of tracklets. The method includes generating a notification in response to determining that the behavioral analytic metric is greater than a threshold. (abstract)
Chen et al. (US 20170213089 A1): Solutions for object tracking problems are presented by gathering images using one or more cameras, processing the gathered images to generate a directed acyclic graph, using the directed acyclic graph to determine a path cover that achieves maximum weight and satisfies one or more positive or negative constraints, and using the path cover to solve the object tracking problem. A first set of solutions utilizes trellis graphs, a second set of solutions employs a greedy approach, and a third set of solutions uses search algorithms. (abstract)
Nevatia et al. (US 20110085702 A1): [0021] For tracking objects in image frames, such as obtained from a video clip, embodiments of the present disclosure can provide a low (first or initial) level of association, in which reliable tracklets are generated by linking detection responses in consecutive frames. A conservative two-threshold strategy can be used to prevent "unsafe" associations until more evidence is collected to reduce the ambiguity at higher levels of association, as described below.


Wang et al. (US 20170351905 A1): [0047] Even if data available for a task is limited, the deep neural network 250 is configured to generate enough training data as the deep neural network 250 is a Siamese deep neural network. For example, for an individual/object captured in an input image, the deep neural network 250 may generate a corresponding set of positive sample pairs and a corresponding set of negative sample pairs. For face verification, a positive sample pair may comprise a pair of facial images of the same individual/object, and a negative sample pair may comprise a pair of facial images of different individuals/objects.


Medioni et al. (US 8391548 B1): Tracking multiple targets can include making different observations based on multiple different frames of one or more digital video feeds, determining an initial cover based on the observations, performing one or more modifications to the initial cover to generate a final cover, and using the final cover to track multiple targets in the one or more digital video feeds. Performing one or more modifications to generate a final cover can include selecting one or more adjustments from a group that includes temporal cover adjustments and spatial cover adjustments, and can include using likelihood information indicative of similarities in motion and appearance to distinguish different targets in the frames. (abstract)



Schroeder et al. (US 20170161919 A1): [0039] It is desirable for an appearance based relocalization system generally to be invariant to changes in viewpoint, illumination, and scale. The deep metric learning network described above is suited to solving the problem of appearance-invariant relocalization. In one embodiment, the triplet convolutional neural network model embeds an image into a lower dimensional space where the system can measure meaningful distances between images. Through the careful selection of triplets, consisting of three images that form an anchor-positive pair of similar images and an anchor-negative pair of dissimilar images, the convolutional neural network can be trained for a variety of locations, including changing locations. 
Fazl Ersi et al. (US 20160379043 A1): [0055] In another particular embodiment, the representations of a set of training positive and negative pairs of faces (where a pair of faces is positive when the two faces belong to the same individual, and is negative, when the two faces belong to different individuals) are used to learn a model for distinguishing between similar and different faces, using the Support Vector Machine (SVM) learning method. The learned model is then used to compare the representation of a probe face against those of the gallery faces to recognize the identity of the probe face.
Roshtkhari et al. (US 20160335502 A1): A system and method are provided for tracking objects in a scene from a sequence of images captured by an imaging device. The method includes processing the sequence of images to generate sequential images at a plurality of hierarchical levels to generate a set of regions of interest; and, at each of the hierarchical levels: examining pairs of sequential images to link pixels into short tracklets; and grouping short tracklets that indicate similar motion patterns to generate representative tracklets. The representative tracklets are grouped to generate a tracking result for at least one object. (abstract)


Maggio et al. (US 20160019700 A1): A method for tracking a target in a sequence of images comprises: a step of detecting objects, a temporal association step, aiming to associate the objects detected in the current image with the objects detected in the previous image based on their respective positions, a step of determining a second target in the current image according to a search area determined for a previous image of the sequence, a step of determining the detected object that best corresponds to a dynamic of a final target, a step of updating the search area for the current image based on the position of the target that best corresponds to the dynamics of the final target, and a step of searching for the final target in the search area for the current image by comparing areas of the current image with a reference model representative of the final target. (abstract)


Shiratani (US 20190034800 A1): [0053] Then, based on the CNN feature data that has been input from the CNN 50 and based on the second correct answer label that has been input from the capsule endoscope image database 3, the SVM 52 determines a support vector coefficient such that a margin of a discrimination boundary between the positive example and the negative example becomes the maximum (Step S110). After the process at Step S110, the image recognition device 4 ends the process.
Ben Shitrit et al. (WO 2013072401 A2): A method for continuously tracking multiple people partitioned into groups while preserving identities under global appearance constraints, wherein people's trajectories may intersect, and wherein only sparse appearance information is available is disclosed. Individual trajectories for each group identity are obtained by solving a layered tracklet-based multi-commodity flow (MCNF) programming problem, wherein tracklets are connected parts of splitted trajectories, wherein each trajectory is split at positions which are in the neighborhood of another, wherein said neighborhood encompasses locations within a predefined distance. (abstract)


Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG NIU whose telephone number is (571)272-9592.  The examiner can normally be reached on Monday - Friday, 8am-5pm PT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached on (571) 272-7409.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG NIU/
Primary Examiner, Art Unit 2669