Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/22/2022 has been entered.
 
Response to Arguments
Applicant’s arguments submitted on 4/22/2022 have been fully considered, but are not persuasive.  Applicant argues that the prior art does not disclose all the claim limitations of the independent claims, specifically that Togashi does not disclose only generating a bounding box for detected faces.  Examiner respectfully disagrees.  
Previously cited Li discloses in a computer system having at least one processor and computer memory (see Li Abstract, and col. 7. ll. 44, and col. 7, ll. 50, and col. 8, ll. 27, and col. 8, ll. 67 to col. 9, ll. 3, where a computer programmed with computer code is used), a method for identifying individuals within a video comprising: accessing, from the computer memory, a video comprising a plurality of digital frames having an original capture resolution and capturing the movement of one or more unidentified individuals over a period of time (see Li col. 8, ll. 4-22, where “[a]s an additional example, the computer 102 requests a video clip, containing a set (meaning one or more) of frames or still images, from the web server 114”); dividing, in the computer system, the plurality of frames into a set of segments, wherein each segment digitally describes a part of a frame of the video (see Li col. 10, ll. 32-37, where “[i]n one implementation, the facial portion (also referred to herein as a facial window) is a rectangular area. In a further implementation, the facial window has a fixed size, such as 100x100 pixels, for different faces of different people”); and executing, in the computer system, a recognition algorithm configured to extract a feature vector representative of the detected face in the recognition bounding box wherein the feature vector is configured to be compared to a feature vector of a target individual's face (see Li col., ll. 25-38, where “[a]t 1308, for each facial feature in the detected face, the Software application calculates a matching score for each position (m, n) using the facial feature probability and each of the convolution values of the corresponding LBP feature templates”).
Previously cited Togashi discloses in the computer system, adjusting pixel resolution of each segment to a detection resolution; applying, in the computer system, a detection algorithm configured to detect a face of one or more unidentified individuals within the segment; in at least some segments, generating a detection bounding box at the detection resolution around each of at least a plurality of the detected faces, each bounding box having a plurality of vertices; for each of at least a plurality of the detection bounding boxes, generating a recognition bounding box by mapping each vertex of the detection bounding box from their relative locations within a segment to their proportionally equivalent locations in the original frame of the video, whereby the recognition bounding box is proportionately larger than the detection bounding box (see Togashi Fig. 1, and paras. 0091-0101, where the face is detected in the downsampled image, then mapped back to the original resolution – with proportional vertices on the bounding boxes – before normalization and recognition by the face recognizer).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the face detection technique of Togashi on the images of Li, because searching with additional face detector resolutions tuned to various expected resolutions would predictably improve the accuracy of face detection, thereby improving overall face recognition by reducing the number of faces that are not detected.  Performing face detection at more image resolutions would predictably catch faces that the other resolutions would miss, thereby improving the overall robustness of the face detection and recognition.
Specifically, Fig. 1 of Togashi shows a face detected in the lower resolution IMAGEx1/2.  A 24x24 window is used during face detection.  That window becomes a detection bounding box 24x24 only after a face is detected in the window.  If a face is not detected, the window is not a bounding box because nothing was detected inside the window.  After detection, a recognition bounding box 48x48 is determined at the same detection bounding box “corresponding area” with the same vertices but in the higher resolution IMAGEx1 (see Togashi para. 0098).  The resolution is then normalized to 60x66 and face recognition is performed.   For these reasons, the prior art at least renders obvious the independent claims.
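For illustration only, the vertex mapping discussed above can be sketched as follows. This is a minimal sketch and not code from Togashi or from the application: the function names and the window position (10, 20) are hypothetical, and the only values taken from the discussion above are the 24x24 detection window in IMAGEx1/2 and the 48x48 corresponding area in IMAGEx1.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def map_box_to_original(vertices: List[Point], scale: float) -> List[Point]:
    """Map each vertex from the detection resolution to the original resolution.

    `scale` is the downsampling factor applied before detection
    (0.5 for an image reduced to half size). Hypothetical helper.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [(x / scale, y / scale) for x, y in vertices]


def box_size(vertices: List[Point]) -> Tuple[float, float]:
    """Return (width, height) of the axis-aligned box spanned by the vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (max(xs) - min(xs), max(ys) - min(ys))


# A 24x24 detection bounding box in the half-resolution image.
# The top-left position (10, 20) is an assumed value for illustration.
detection_box = [(10, 20), (34, 20), (34, 44), (10, 44)]

# The same four vertices at their proportionally equivalent locations
# in the full-resolution image.
recognition_box = map_box_to_original(detection_box, scale=0.5)
```

Under these assumptions, the recognition bounding box covers the same corresponding area as the detection bounding box but is proportionately larger, because each vertex coordinate is divided by the same downsampling factor.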

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1, 2, 7, 16, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li et al., US 9,275,269 B1 (hereinafter referred to as “Li”) in view of Togashi, US 2008/0144941 A1 (hereinafter referred to as “Togashi”).

Regarding claim 1, Li discloses in a computer system having at least one processor and computer memory (see Li Abstract, and col. 7. ll. 44, and col. 7, ll. 50, and col. 8, ll. 27, and col. 8, ll. 67 to col. 9, ll. 3, where a computer programmed with computer code is used), a method for identifying individuals within a video comprising: accessing, from the computer memory, a video comprising a plurality of digital frames having an original capture resolution and capturing the movement of one or more unidentified individuals over a period of time (see Li col. 8, ll. 4-22, where “[a]s an additional example, the computer 102 requests a video clip, containing a set (meaning one or more) of frames or still images, from the web server 114”); dividing, in the computer system, the plurality of frames into a set of segments, wherein each segment digitally describes a part of a frame of the video (see Li col. 10, ll. 32-37, where “[i]n one implementation, the facial portion (also referred to herein as a facial window) is a rectangular area. In a further implementation, the facial window has a fixed size, such as 100x100 pixels, for different faces of different people”); and executing, in the computer system, a recognition algorithm configured to extract a feature vector representative of the detected face in the recognition bounding box wherein the feature vector is configured to be compared to a feature vector of a target individual's face (see Li col., ll. 25-38, where “[a]t 1308, for each facial feature in the detected face, the Software application calculates a matching score for each position (m, n) using the facial feature probability and each of the convolution values of the corresponding LBP feature templates”).
Li does not explicitly disclose in the computer system, adjusting pixel resolution of each segment to a detection resolution; applying, in the computer system, a detection algorithm configured to detect a face of one or more unidentified individuals within the segment; in at least some segments, generating a detection bounding box at the detection resolution around each of at least a plurality of the detected faces, each bounding box having a plurality of vertices; for each of at least a plurality of the detection bounding boxes, generating a recognition bounding box by mapping each vertex of the detection bounding box from their relative locations within a segment to their proportionally equivalent locations in the original frame of the video, whereby the recognition bounding box is proportionately larger than the detection bounding box.
However, Togashi discloses in the computer system, adjusting pixel resolution of each segment to a detection resolution; applying, in the computer system, a detection algorithm configured to detect a face of one or more unidentified individuals within the segment; in at least some segments, generating a detection bounding box at the detection resolution around each of at least a plurality of the detected faces, each bounding box having a plurality of vertices; for each of at least a plurality of the detection bounding boxes, generating a recognition bounding box by mapping each vertex of the detection bounding box from their relative locations within a segment to their proportionally equivalent locations in the original frame of the video, whereby the recognition bounding box is proportionately larger than the detection bounding box (see Togashi Fig. 1, and paras. 0091-0101, where the face is detected in the downsampled image, then mapped back to the original resolution – with proportional vertices on the bounding boxes – before normalization and recognition by the face recognizer).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the face detection technique of Togashi on the images of Li, because searching with additional face detector resolutions tuned to various expected resolutions would predictably improve the accuracy of face detection, thereby improving overall face recognition by reducing the number of faces that are not detected.  Performing face detection at more image resolutions would predictably catch faces that the other resolutions would miss, thereby improving the overall robustness of the face detection and recognition.

Regarding claim 2, Li discloses dividing the video into a set of frames, wherein each set of frames corresponds to a range of timestamps from the period of time during which the video was recorded (see Li Fig. 11A, and col. 18, ll. 35-41, where "[a]t 1102, the application further selects a set of representing frames or all frames from the video clip to derive a model” and “[a]t 1104, the Software application performs a process, such as the process 200, to detect a face and derive a final feature of the face from a first frame, for example, such as the first or second frame of the selected set of frames”); and wherein each segment includes a portion of the frame such that the proportion of a face with respect to the segment is larger relative to the proportion of the face with respect to the frame (see Li col. 18, ll. 41-44, where “the server application identifies the facial area or window within the first frame that contains the detected face” and “[f]or example, the facial window is in a rectangular or square shape”).

Regarding claim 7, Li discloses the step of executing the recognition algorithm further comprises: for each segment, extracting, through the use of a neural network executing in the computer system (see Li col. 14, ll. 34-36, where the software application is based on a multi-layer deep belief network (for example, a neural network) that uses two image features to derive a new image feature; and Li col. 14, ll. 55-59, where “Model training process is performed on a set of images to derive a final or recognition model for a certain face. Once the model is available, it is used to recognize a face within an image. The recognition process is further illustrated by reference to FIG. 4”), at least one feature vector describing at least one physical feature of each detected face within the segment (see Li col. 12, ll. 5-11, where “For each facial feature part, at 212, the software application concatenates the set of image features into a Subpart feature. For example, the set of image features is concatenated into an Mx1 or 1xM vector, where M is the number of image features in the set. At 214, the Software application concatenates the Mx1 or 1xM vectors of all the facial feature parts into a full feature for the face”).

Regarding claim 16, Li discloses a non-transitory computer readable storage medium comprising stored program code executable by at least one processor, the program code when executed causes the processor to (see Li Abstract, and col. 7. ll. 44, and col. 7, ll. 50, and col. 8, ll. 27, and col. 8, ll. 67 to col. 9, ll. 3, where a computer programmed with computer code is used): access, from computer memory, a video comprising a plurality of frames having an original capture resolution that capture the movement of one or more unidentified individuals over a period of time (see Li col. 8, ll. 4-22, where “[a]s an additional example, the computer 102 requests a video clip, containing a set (meaning one or more) of frames or still images, from the web server 114”); divide at least some of the plurality of frames of the video into one or more sets of segments, wherein each segment describes a part of a frame of the video (see Li col. 10, ll. 32-37, where “[i]n one implementation, the facial portion (also referred to herein as a facial window) is a rectangular area. In a further implementation, the facial window has a fixed size, such as 100x100 pixels, for different faces of different people”); and execute a recognition algorithm configured to extract a feature vector representative of the detected face in the recognition bounding box wherein the feature vector is configured to be compared to a feature vector of a target individual's face (see Li col., ll. 25-38, where “[a]t 1308, for each facial feature in the detected face, the Software application calculates a matching score for each position (m, n) using the facial feature probability and each of the convolution values of the corresponding LBP feature templates”).
Li does not explicitly disclose adjust, for each segment, pixel resolution of the segment to a smaller detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment; responsive to the detection algorithm detecting a face, generate a detection bounding box around each of at least a plurality of the detected faces, each bounding box having a plurality of vertices; generate, for each of at least a plurality of the bounding boxes, a recognition bounding box by mapping each vertex of the detection bounding box from their relative locations within a segment to their proportionally equivalent locations in the original frame of the video, whereby the recognition bounding box is proportionately larger than the detection bounding box.
However, Togashi discloses adjust, for each segment, pixel resolution of the segment to a smaller detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment; responsive to the detection algorithm detecting a face, generate a detection bounding box around each of at least a plurality of the detected faces, each bounding box having a plurality of vertices; generate, for each of at least a plurality of the bounding boxes, a recognition bounding box by mapping each vertex of the detection bounding box from their relative locations within a segment to their proportionally equivalent locations in the original frame of the video, whereby the recognition bounding box is proportionately larger than the detection bounding box (see Togashi Fig. 1, and paras. 0091-0101, where the face is detected in the downsampled image, then mapped back to the original resolution – with proportional vertices on the bounding boxes – before normalization and recognition by the face recognizer).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the face detection technique of Togashi on the images of Li, because searching with additional face detector resolutions tuned to various expected resolutions would predictably improve the accuracy of face detection, thereby improving overall face recognition by reducing the number of faces that are not detected.  Performing face detection at more image resolutions would predictably catch faces that the other resolutions would miss, thereby improving the overall robustness of the face detection and recognition.

Regarding claim 20, Li discloses a system comprising: an input-output interface, communicatively coupled to at least one processor for at least partly directing storage of data in and retrieval of data from computer memory; and a non-transitory computer readable storage medium comprising stored program code executable by the at least one processor, the program code when executed causing the processor to (see Li Abstract, and col. 7. ll. 44, and col. 7, ll. 50, and col. 8, ll. 27, and col. 8, ll. 67 to col. 9, ll. 3, where a computer programmed with computer code is used): access, from computer memory, a video comprising one or more frames of pixels capturing the movement of one or more unidentified individuals over a period of time, the frames having an original capture resolution (see Li col. 8, ll. 4-22, where “[a]s an additional example, the computer 102 requests a video clip, containing a set (meaning one or more) of frames or still images, from the web server 114”); divide at least some frames of the video into one or more sets of segments, wherein each segment describes a part of a frame of the video such that the size proportion of a face in the segment increases relative to the proportion of the face in the frame (see Li col. 10, ll. 32-37, where “[i]n one implementation, the facial portion (also referred to herein as a facial window) is a rectangular area. In a further implementation, the facial window has a fixed size, such as 100x100 pixels, for different faces of different people”); and execute a recognition algorithm configured to extract a feature vector representative of each detected face wherein each such feature vector is configured to be compared to a feature vector of a target individual's face (see Li col., ll. 25-38, where “[a]t 1308, for each facial feature in the detected face, the Software application calculates a matching score for each position (m, n) using the facial feature probability and each of the convolution values of the corresponding LBP feature templates”).
Li does not explicitly disclose adjust, for each segment, pixel resolution of the segment to a detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment; responsive to the detection algorithm detecting a face, map, for each detected face, their relative locations within a segment at detection resolution to their proportionally equivalent locations in the original frame of the video; and responsive to the mapping.
However, Togashi discloses adjust, for each segment, pixel resolution of the segment to a detection resolution such that a detection algorithm detects a face of one or more unidentified individuals within the segment; responsive to the detection algorithm detecting a face, map, for each detected face, their relative locations within a segment at detection resolution to their proportionally equivalent locations in the original frame of the video; and responsive to the mapping (see Togashi Fig. 1, and paras. 0091-0101, where the face is detected in the downsampled image, then mapped back to the original resolution – with proportional vertices on the bounding boxes – before normalization and recognition by the face recognizer).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the face detection technique of Togashi on the images of Li, because searching with additional face detector resolutions tuned to various expected resolutions would predictably improve the accuracy of face detection, thereby improving overall face recognition by reducing the number of faces that are not detected.  Performing face detection at more image resolutions would predictably catch faces that the other resolutions would miss, thereby improving the overall robustness of the face detection and recognition.

Claim(s) 6, 8, 11, 19, and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Togashi as applied to claim(s) 1, 16, and 20 above, and in further view of Luo et al., US 7,689,011 B2 (hereinafter referred to as “Luo”).

Regarding claim 6, Li does not explicitly disclose wherein the step of executing the recognition algorithm further comprises: identifying, in the computer system, the face within the bounding box based at least in part on one or more colors of pixels representing physical features of the detected face; for each background pixel within the bounding box, reducing the influence of each background pixel surrounding the face by normalizing the color of each background pixel within the bounding box; and extracting, in the computer system, the feature vector representative of the detected face.
However, Luo discloses wherein the step of executing the recognition algorithm further comprises: identifying, in the computer system, the face within the bounding box based at least in part on one or more colors of pixels representing physical features of the detected face (see Luo col. 5, ll. 46-50, where “[i]n general, any type of facial feature extraction process may be used to extract features from the detected face regions that are reported in the facial parameter values 42 that are output by the face detection processing component 12” and col. 5, ll. 64-67, where “[i]n some embodiments, each of the auxiliary identification regions is represented by a set of low-level image features (e.g., color features, texture features, shape and layout features) . . .”); for each background pixel within the bounding box, reducing the influence of each background pixel surrounding the face by normalizing the color of each background pixel within the bounding box (see Luo col. 8, ll. 40-44, where “[t]he color normalization processing component 82 color normalizes the auxiliary identification region and passes a color-normalized version 83 of the auxiliary identification region to the feature extraction processing component 16”); and extracting, in the computer system, the feature vector representative of the detected face (see Luo col. 8, ll. 8-11, where “[t]he set of auxiliary identification feature values that are computed for the auxiliary identification region 62 constitutes an auxiliary identification feature vector 70” and col. 9, ll. 29-34, where “[t]he color normalization processing component 82 passes the color-normalized auxiliary identification region 83 to the feature extraction processing component 16, which calculates features from the normalized auxiliary identification region in accordance with one or more of the auxiliary identification feature extraction methods described above”).
It would have been obvious to one of ordinary skill in the art at the time of filing to combine the method taught by Li and Togashi with the color normalization process of Luo.  The motivation for combining is that this would predictably ensure that all colors are evaluated equally across the spectrum, so that the correct facial feature is extracted.

Regarding claim 8, Li does not explicitly disclose further comprising: receiving in the computer system, from a user device, a query to compare one or more target individuals with one or more detected faces, the query comprising the feature vector of the face of each target individual wherein that feature vector describes physical features of the face of each target individual; and for each segment of the video, executing in the computer system a search for a feature vector associated with the detected face of an unidentified individual that matches the feature vector of the face of at least one target individual of the query by calculating the distance between the respective feature vectors.
However, Luo discloses further comprising: receiving in the computer system, from a user device, a query to compare one or more target individuals with one or more detected faces, the query comprising the feature vector of the face of each target individual wherein that feature vector describes physical features of the face of each target individual (see Luo col. 10, ll. 20-25, where “[a] respective distance feature vector is determined between the query image and each of the candidate images in the collection based on the features of the query image and the corresponding features of the candidate images (FIG.9, block 102)” and col. 5, ll. 50-55, where “Exemplary facial feature extraction processes include, but are not limited to: edge, line and curve based feature extraction methods; extraction methods based on templates that are designed to detect specific features points (e.g., the eyes and mouth) and structural matching methods”); and for each segment of the video, executing in the computer system a search for a feature vector associated with the detected face of an unidentified individual that matches the feature vector of the face of at least one target individual of the query by calculating the distance between the respective feature vectors (see Luo col. 10, ll. 4-8, where “[t]he candidate images are ranked based on the computed similarity measures (FIG. 8, block 94)” and “the candidate images typically are ranked in order from the candidate image most similar to the query image to the candidate image least similar to the query image” and col. 2, ll. 48-51, where the input image may correspond to any type of image, including an original image, for example, a video keyframe, a still image, or a scanned image, and where the image can be captured by an image sensor, for example, a digital video camera).
It would have been obvious to one of ordinary skill in the art at the time of filing to combine the method taught by Li and Togashi with the query process of Luo.  The motivation for combining is that this would predictably allow multiple image features to be used to find a matching target image that is of interest to the user.

Regarding claim 11, Li discloses further comprising the step of calculating a confidence level for a match wherein the confidence level for a match is inversely related to the determined distance between the feature vector of the face of the target individual and the extracted feature vector of the unidentified individual (see Li col. 16, ll. 45-60, where "[i]n one implementation, the image feature distances are ranked from the Smallest to the largest; and the K faces corresponding to the first K. Smallest image feature distances" via a K-nearest neighbor algorithm comprising a ranking score being the inverse of the image feature distance).
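For illustration only, a confidence level that is inversely related to feature-vector distance, combined with a K-nearest-neighbor ranking, can be sketched as follows. This is a minimal sketch and not Li's implementation: the function names, the Euclidean distance metric, and the 1 / (1 + d) form of the score are assumptions chosen so that a distance of zero yields a finite confidence of 1.0.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def confidence(distance: float) -> float:
    """Score that decreases as distance grows (assumed form: 1 / (1 + d))."""
    return 1.0 / (1.0 + distance)


def k_nearest(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank gallery faces from smallest to largest distance and keep the first K."""
    scored = [(name, euclidean_distance(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1])
    return [(name, confidence(dist)) for name, dist in scored[:k]]


# Hypothetical two-dimensional feature vectors for three known faces.
gallery = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
matches = k_nearest([0.0, 0.0], gallery, k=2)
```

In this sketch the closest gallery face receives the highest confidence, and faces outside the K smallest distances are dropped from the result.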

Regarding claim 19, Li does not explicitly disclose further comprising stored program code that when executed causes the processor to: distinguish, by the recognition algorithm, the detected face within the bounding box from background pixels based on one or more colors of pixels representing physical features of the detected face; for each background pixel within the bounding box, reduce the influence of the environment surrounding the detected face by normalizing the color of each background pixel within the bounding box; and extract the feature vector representative of the detected face.
However, Luo discloses further comprising stored program code that when executed causes the processor to: distinguish, by the recognition algorithm, the detected face within the bounding box from background pixels based on one or more colors of pixels representing physical features of the detected face (see Luo col. 5, ll. 46-50, where “[i]n general, any type of facial feature extraction process may be used to extract features from the detected face regions that are reported in the facial parameter values 42 that are output by the face detection processing component 12” and col. 5, ll. 64-67, where “[i]n some embodiments, each of the auxiliary identification regions is represented by a set of low-level image features (e.g., color features, texture features, shape and layout features) . . .”); for each background pixel within the bounding box, reduce the influence of the environment surrounding the detected face by normalizing the color of each background pixel within the bounding box (see Luo col. 8, ll. 40-44, where “[t]he color normalization processing component 82 color normalizes the auxiliary identification region and passes a color-normalized version 83 of the auxiliary identification region to the feature extraction processing component 16”); and extract the feature vector representative of the detected face (see Luo col. 8, ll. 8-11, where “[t]he set of auxiliary identification feature values that are computed for the auxiliary identification region 62 constitutes an auxiliary identification feature vector 70” and col. 9, ll. 29-34, where “[t]he color normalization processing component 82 passes the color-normalized auxiliary identification region 83 to the feature extraction processing component 16, which calculates features from the normalized auxiliary identification region in accordance with one or more of the auxiliary identification feature extraction methods described above”).
It would have been obvious to one of ordinary skill in the art at the time of filing to combine the method taught by Li and Togashi with the color normalization process of Luo.  The motivation for combining is that this would predictably ensure that all colors are evaluated equally across the spectrum, so that the correct facial feature is extracted.

Regarding claim 23, Li discloses wherein the stored program code further comprises program code that when executed causes the processor to: generate, by the detection algorithm, a bounding box encompassing the detected face, wherein the bounding box demarcates the detected face from a surrounding background environment recorded by the video (see Li col. 18, ll. 41-44, where “the server application identifies the facial area or window within the first frame that contains the detected face” and “[f]or example, the facial window is in a rectangular or square shape”). 
Li does not explicitly disclose distinguish the face within the bounding box from background pixels based on one or more colors of pixels representing physical features of the face; for each background pixel of the bounding box, reduce the influence of the environment surrounding the face by normalizing the color of each background pixel within the bounding box; and extract the feature vector representative of the face.
However, Luo discloses distinguish the face within the bounding box from background pixels based on one or more colors of pixels representing physical features of the face (see Luo col. 5, ll. 46-50, where “[i]n general, any type of facial feature extraction process may be used to extract features from the detected face regions that are reported in the facial parameter values 42 that are output by the face detection processing component 12” and col. 5, ll. 64-67, where “[i]n some embodiments, each of the auxiliary identification regions is represented by a set of low-level image features (e.g., color features, texture features, shape and layout features) . . .”); for each background pixel of the bounding box, reduce the influence of the environment surrounding the face by normalizing the color of each background pixel within the bounding box (see Luo col. 8, ll. 40-44, where “[t]he color normalization processing component 82 color normalizes the auxiliary identification region and passes a color-normalized version 83 of the auxiliary identification region to the feature extraction processing component 16”); and extract the feature vector representative of the face (see Luo col. 8, ll. 8-11, where “[t]he set of auxiliary identification feature values that are computed for the auxiliary identification region 62 constitutes an auxiliary identification feature vector 70” and col. 9, ll. 29-34, where “[t]he color normalization processing component 82 passes the color-normalized auxiliary identification region 83 to the feature extraction processing component 16, which calculates features from the normalized auxiliary identification region in accordance with one or more of the auxiliary identification feature extraction methods described above”).
It would have been obvious to one of ordinary skill in the art at the time of filing to combine the method taught by Li and Togashi with the color normalization process of Luo.  The motivation for combining is that this would predictably ensure that all colors are evaluated equally across the spectrum, so that the correct facial feature is extracted.
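For illustration only, one possible reading of the limitation "normalizing the color of each background pixel within the bounding box" can be sketched as follows. The sketch is not taken from Li, Togashi, Luo, or the application. The color test in is_face_pixel, the neutral fill value, and the pixel-grid representation are all assumptions chosen for brevity, and Luo's color normalization processing component 82 may operate differently.

```python
def is_face_pixel(rgb):
    """Hypothetical, crude color test standing in for any color-based
    split between face pixels and background pixels."""
    r, g, b = rgb
    return r > 95 and g > 40 and b > 20 and r > g and r > b


def normalize_background(box_pixels, fill=(128, 128, 128)):
    """Return a copy of the bounding-box pixel grid in which every pixel
    that fails the face color test is replaced by one neutral color.

    box_pixels is a list of rows; each row is a list of (r, g, b) tuples.
    The fill value is an assumption, not a value from any reference.
    """
    return [
        [px if is_face_pixel(px) else fill for px in row]
        for row in box_pixels
    ]
```

Under these assumptions, the pixels that pass the color test keep their original values, and every other pixel in the box is set to the same color, so that the surrounding environment contributes no color variation to a later feature extraction step.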

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Togashi as applied to claim(s) 1 above, and in further view of Wang et al., US 2016/006335 A1 (hereinafter referred to as “Wang”).

Regarding claim 13, Li discloses consecutive frames (see Li Fig. 11A, and col. 18, ll. 31-64, where features for faces are extracted from consecutive image frames in a video clip).
Li does not explicitly disclose further comprising: detecting in the computer system, within a plurality of frames, an unidentified individual, wherein the detections are based at least in part on the extracted feature vector for the detected face of the unidentified individual in each frame of the plurality; determining in the computer system, for pairs of consecutive frames, a distance between the extracted feature vectors; responsive to determining the distance to be within a threshold distance, generating, for pairs of consecutive frames, an updated feature vector representative of the detected face of an unidentified individual by aggregating the feature vectors from the pair of frames; and clustering in the computer system, across any plurality of frames of the video, representative feature vectors determined to be within a threshold distance.
However, Wang discloses further comprising: detecting in the computer system, within a plurality of frames, an unidentified individual, wherein the detections are based at least in part on the extracted feature vector for the detected face of the unidentified individual in each frame of the plurality; determining in the computer system, for pairs of consecutive frames, a distance between the extracted feature vectors; responsive to determining the distance to be within a threshold distance, generating, for pairs of consecutive frames, an updated feature vector representative of the detected face of an unidentified individual by aggregating the feature vectors from the pair of frames; and clustering in the computer system, across any plurality of frames of the video, representative feature vectors determined to be within a threshold distance (see Wang Figs. 4-8 and paras. 0041-0047 and 0064-0069, where facial feature vectors are combined and/or clustered based on a threshold distance between the feature vectors).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify the combination of Li and Togashi by applying, to the consecutive image frames of Li, the use of a threshold distance to determine and combine feature sets across multiple images as taught by Wang.  The motivation to combine is that this would predictably improve the accuracy of person detection by using multiple different fused feature sets of the individual in question.
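For illustration only, the claim 13 steps of comparing feature vectors of consecutive frames against a threshold distance, aggregating them, and clustering the resulting representative vectors can be sketched as follows. The sketch is not Wang's disclosed method and is not taken from the application. The Euclidean distance, the averaging used for aggregation, the greedy clustering, and all function names are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Element-wise mean, used here as the assumed aggregation."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate_consecutive(frame_vectors, threshold):
    """Merge runs of consecutive frames whose feature vectors are within
    the threshold distance; return one representative vector per run."""
    if not frame_vectors:
        return []
    representatives = []
    run = [frame_vectors[0]]
    for prev, cur in zip(frame_vectors, frame_vectors[1:]):
        if euclidean(prev, cur) <= threshold:
            run.append(cur)          # same face continues into this frame
        else:
            representatives.append(mean_vector(run))
            run = [cur]              # start a new run
    representatives.append(mean_vector(run))
    return representatives


def cluster(representatives, threshold):
    """Greedy clustering: a vector joins the first cluster whose mean is
    within the threshold distance, otherwise it starts a new cluster."""
    clusters = []
    for vec in representatives:
        for members in clusters:
            if euclidean(mean_vector(members), vec) <= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters
```

In this sketch, a face that leaves the video and later reappears produces two separate runs in aggregate_consecutive, and the cluster step is what groups those runs together across non-consecutive frames.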

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Togashi as applied to claim(s) 1 above, and in further view of Liu et al., US 9,373,024 B2 (hereinafter referred to as “Liu”).

Regarding claim 14, Li does not explicitly disclose further comprising: identifying in the computer system, from one or more frames of the video, frames in which an unidentified individual was present with a second individual; identifying in the computer system, for each combination of unidentified individuals and second individuals, the number of frames in which both individuals are present; and assigning in the computer system a label to each combination based on the identified number of frames, the label describing a strength of the relationship between the individuals of the combination.
However, Liu discloses further comprising: identifying in the computer system, from one or more frames of the video, frames in which an unidentified individual was present with a second individual (see Liu col. 8, ll. 20-25, where “[i]n another embodiment, information from event data could be stored in database 20 and used to estimate social relationships between people from many images” and “[a]s different people are identified using the content of images and event data about those people and images are stored, the system can begin to estimate social relationships between individuals”); identifying in the computer system, for each combination of unidentified individuals and second individuals, the number of frames in which both individuals are present (see Liu col. 8, ll. 25-33, where “[t]hese relationships would have stronger or weaker computed links based on the co-occurrence of people (420 of FIG. 4) and event data from the corresponding images that they were identified in” and “[f]or example, as Zoe Ellen and Nathanial Jackson appeared in more images together in different locations, the strength of a computed a social link could be incremented and the system might propose that Zoe and Nathanial are friends or are related”); and assigning in the computer system a label to each combination based on the identified number of frames, the label describing a strength of the relationship between the individuals of the combination (see Liu Fig. 4, and col. 8, ll. 25-33, where a table with the individuals and their events together in image frames are disclosed).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify the combination of Li and Togashi to use the relationships between individuals as taught by Liu.  The motivation to combine is that this would predictably improve the accuracy of person detection, because the relationships with other individuals provide additional features for use with facial recognition.
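For illustration only, the claim 14 steps of counting the frames in which two individuals appear together and assigning a relationship-strength label from that count can be sketched as follows. The sketch is not Liu's disclosed method and is not taken from the application. The per-frame identifier lists, the "strong" and "weak" labels, and the strong_at cutoff are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(frames):
    """Count, for each pair of individuals, the frames containing both.

    frames is a list in which each entry lists the identifiers of the
    individuals detected in that frame.
    """
    counts = Counter()
    for ids in frames:
        # sorted(set(...)) gives one canonical key per pair per frame
        for pair in combinations(sorted(set(ids)), 2):
            counts[pair] += 1
    return counts


def label_strength(count, strong_at=3):
    """Map a co-occurrence count to a label; the cutoff is hypothetical."""
    return "strong" if count >= strong_at else "weak"


def label_pairs(frames, strong_at=3):
    """Return {pair: label} for every pair that co-occurs at least once."""
    return {
        pair: label_strength(n, strong_at)
        for pair, n in cooccurrence_counts(frames).items()
    }
```

Under these assumptions, a pair that appears together in more frames receives the stronger label, which parallels Liu's statement that the strength of a computed link could be incremented as two people appear in more images together.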

Conclusion
Pertinent prior art: Ren et al., US 2021/0133461 A1 (hereinafter referred to as “Ren”) discloses object tracking for consecutive frames of a video; and Kim et al., US 2006/0204058 A1 (hereinafter referred to as “Kim”) and Feng et al., US 2016/0299920 A1 (hereinafter referred to as “Feng”) disclose clustering of feature vectors (see Kim Fig. 4).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW M MOYER whose telephone number is (571)272-9523. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571)270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW M MOYER/             Primary Examiner, Art Unit 2663