Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Preliminary Amendment
The preliminary amendment filed 03/18/2021 has been received and acknowledged.

Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

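Claims 1, 3, 9, 15, 16, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.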
Regarding Independent Claim 1,
Step 1 Analysis: Claim 1 is directed to a method, which falls within one of the four statutory categories. 

Step 2A Prong 1 Analysis: Claim 1 recites, in part, “performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images; determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images; and determining companions among the plurality of persons according to the track information of the plurality of persons.” These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Specifically:
performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images can be considered to be an observation in the human mind. 
determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images can be considered to be an observation in the human mind. 
determining companions among the plurality of persons according to the track information of the plurality of persons can be considered to be an observation in the human mind.
Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
obtaining video images respectively captured by a plurality of image capture devices deployed in different areas during a preset time period.
This step merely constitutes pre-solution activity involving data gathering. Such extra-solution activity does not integrate the abstract idea into a practical application. Please see MPEP §2106.05(g). 
In view of the foregoing, the additional step does not integrate the abstract idea into a practical application.

Step 2B Analysis: The additional element does not amount to significantly more than the judicial exception. The pre-solution step of obtaining video images does not improve the functioning of a computer or any other technology or technical field, does not effect a transformation, and does not apply the abstract idea in a meaningful way beyond generally linking it to a particular technological environment. Furthermore, the recited “plurality of image capture devices” are generic devices; thus, the additional element does not apply the abstract idea by use of a particular machine. Please see MPEP §2106.05.

For all of the foregoing reasons, claim 1 does not comply with the requirements of 35 U.S.C. 101.

Regarding Independent Claim 16,
Step 1 Analysis: Claim 16 is directed to a device, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 16 recites, in part, “performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images; determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images; and determining companions among the plurality of persons according to the track information of the plurality of persons.” These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Specifically:
performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images can be considered to be an observation in the human mind. 
determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images can be considered to be an observation in the human mind. 
determining companions among the plurality of persons according to the track information of the plurality of persons can be considered to be an observation in the human mind.
Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements:
a processor; and 
a memory configured to store processor executable instructions, 
wherein the processor is configured to invoke the instructions stored in memory so as to: 
obtain video images respectively captured by a plurality of image capture devices deployed in different areas during a preset time period.
The additional elements, “a processor” and “a memory,” are recited at a high level of generality (i.e., as a generic processor performing the generic computer function of invoking instructions stored in the memory) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim as a whole is directed to an abstract idea. Please see MPEP §2106.04(a)(2)(III)(C).
The step of obtaining the video images merely constitutes pre-solution activity involving data gathering. Such extra-solution activity does not integrate the abstract idea into a practical application. Please see MPEP §2106.05(g).
In view of the foregoing, the additional elements do not integrate the abstract idea into a practical application.


Step 2B Analysis: The additional elements are not sufficient to amount to significantly more than the judicial exception. The additional elements of utilizing a processor and a memory to perform the steps of the claimed process amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Please see MPEP §2106.05(f). The claim is not patent eligible.
Furthermore, the pre-solution step of obtaining video images does not improve the functioning of a computer or any other technology or technical field, does not effect a transformation, and does not apply the abstract idea in a meaningful way beyond generally linking it to a particular technological environment. In addition, the recited “plurality of image capture devices” are generic devices; thus, the additional element does not apply the abstract idea by use of a particular machine. Please see MPEP §2106.05.
For all of the foregoing reasons, claim 16 does not comply with the requirements of 35 U.S.C. 101.

Regarding Independent Claim 20,
Step 1 Analysis: Claim 20 is directed to a non-transitory computer-readable medium, i.e., an article of manufacture, which falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 20 recites, in part, “performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images; determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images; and determining companions among the plurality of persons according to the track information of the plurality of persons.” These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Specifically:
performing a person detection on the video images, to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons, the image set including person images can be considered to be an observation in the human mind. 
determining track information of the at least one person according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images can be considered to be an observation in the human mind. 
determining companions among the plurality of persons according to the track information of the plurality of persons can be considered to be an observation in the human mind.
Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements:
A non-transitory computer-readable medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, is caused to perform the operations of: 
obtaining video images respectively captured by a plurality of image capture devices deployed in different areas during a preset time period.
The additional elements, “a processor” and “a non-transitory computer-readable medium,” are recited at a high level of generality (i.e., as a generic processor performing the generic computer function of executing instructions stored on the medium) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim as a whole is directed to an abstract idea. Please see MPEP §2106.04(a)(2)(III)(C).
The step of obtaining the video images merely constitutes pre-solution activity involving data gathering. Such extra-solution activity does not integrate the abstract idea into a practical application. Please see MPEP §2106.05(g).
In view of the foregoing, the additional elements do not integrate the abstract idea into a practical application.

Step 2B Analysis: The additional elements are not sufficient to amount to significantly more than the judicial exception. The additional elements of utilizing a processor and a non-transitory computer-readable medium to perform the steps of the claimed process amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Please see MPEP §2106.05(f). The claim is not patent eligible.
Furthermore, the pre-solution step of obtaining video images does not improve the functioning of a computer or any other technology or technical field, does not effect a transformation, and does not apply the abstract idea in a meaningful way beyond generally linking it to a particular technological environment. In addition, the recited “plurality of image capture devices” are generic devices; thus, the additional element does not apply the abstract idea by use of a particular machine. Please see MPEP §2106.05.
For all of the foregoing reasons, claim 20 does not comply with the requirements of 35 U.S.C. 101.

Regarding Claim 2 (and its dependent claims 4-8) and corresponding claim 17 (and its dependent claims 18-19),
Claim 2 is dependent on independent claim 1 and therefore includes all the limitations of claim 1. Thus, claim 2 recites a mental process. Claim 2 further recites:
determining, for at least one person image in the image set corresponding to the at least one person, first position information of a target person in the person image in a video image corresponding to the person image, which is part of the mental process because a person can visually identify and track persons in video.

Claim 2 further recites additional elements:
determining a spatial position coordinate of the target person in a spatial coordinate system according to the first position information and second position information, the second position information being position information of an image capture device for capturing the video image corresponding to the person image; 
obtaining a spatio-temporal position coordinate of the target person in a spatio-temporal coordinate system according to the spatial position coordinate and time for capturing the video image corresponding to the person image; and 
obtaining the track information of the at least one person in the spatio-temporal coordinate system according to spatio-temporal position coordinates of the plurality of persons. 
These additional elements integrate the abstract idea into a practical application. Specifically, as discussed in the first full paragraph on page 11 of the specification of the subject application, “Since the track information can better reflect the dynamic state of the at least one person, determining the companions based on the track information can improve the accuracy of detection on the companions”. The additional elements of claim 2 describe the underlying technological features of reflecting the dynamic state of the persons in the images, and thereby constitute an improvement to the technical field of companion detection in video. As such, the additional elements of claim 2 integrate the mental process into a practical application. Therefore, claim 2 recites eligible subject matter. 
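For illustration only, the track-construction steps recited in claim 2 (first position information, spatial position coordinate, spatio-temporal position coordinate, and track information) may be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not asserted to be Applicant's implementation or Moon's: it assumes that the second position information of each image capture device takes the form of a 3x3 planar homography onto a shared ground plane, and the names Camera, Detection, to_ground_plane, and build_tracks are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Camera:
    # Assumption: the "second position information" of an image capture device
    # is modeled as a 3x3 homography mapping pixels (u, v) in that device's
    # frames onto a ground plane (x, y) shared by all devices.
    homography: tuple


@dataclass
class Detection:
    person_id: str  # identity of the image set the person image belongs to
    camera_id: str  # device that captured the corresponding video image
    u: float        # "first position information": pixel column of the person
    v: float        # pixel row of the person in the video image
    t: float        # time at which the video image was captured


def to_ground_plane(camera, u, v):
    """Map a pixel position to a spatial position coordinate (x, y)."""
    h = camera.homography
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


def build_tracks(detections, cameras):
    """Return, per person, spatio-temporal coordinates (x, y, t) sorted by time."""
    tracks = defaultdict(list)
    for d in detections:
        x, y = to_ground_plane(cameras[d.camera_id], d.u, d.v)
        tracks[d.person_id].append((x, y, d.t))
    for points in tracks.values():
        points.sort(key=lambda point: point[2])
    return dict(tracks)
```

For example, with camera "A" assigned the identity homography and camera "B" a translation of 100 units along x, detections of person "p1" at pixel (10, 5) in camera "A" at t = 1.0 and at pixel (5, 5) in camera "B" at t = 2.0 yield the track [(10.0, 5.0, 1.0), (105.0, 5.0, 2.0)].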

Claims 4-8 are dependent on claim 2 and therefore also recite eligible subject matter by virtue of their dependency. 

Claim 17 recites features nearly identical to those recited in claim 2. Accordingly, claim 17 recites eligible subject matter for reasons analogous to those discussed above in conjunction with claim 2. 
Claims 18-19 are dependent on claim 17 and therefore also recite eligible subject matter by virtue of their dependency.

Regarding Claim 3,
Claim 3 is dependent on independent claim 1 and therefore includes all the limitations of claim 1. Thus, claim 3 recites a mental process. Claim 3 further recites:
clustering the track information of the plurality of persons to obtain at least one cluster set; and 
determining persons respectively corresponding to a plurality of pieces of track information belonging to the same cluster set as a group of companions. 
These steps can be performed mentally with pen and paper by drawing a diagram of the trajectories of visually observed persons and identifying the drawn trajectories that correspond to one another as the trajectories of companions. Thus, claim 3 merely recites further elements that are part of the mental process of claim 1. Claim 3 does not recite additional elements that integrate the abstract idea into a practical application or recite significantly more. Therefore, claim 3 does not recite eligible subject matter. 
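For illustration only, the clustering steps of claim 3 may be sketched as follows. The claim does not specify a distance measure or clustering algorithm; this minimal Python sketch assumes the mean spatial distance at shared timestamps and single-linkage clustering with a hypothetical max_distance parameter, and it does not reproduce Moon's segmentation framework. The function names are hypothetical.

```python
import math


def track_distance(a, b):
    """Assumed distance measure: mean Euclidean distance at shared timestamps.

    Each track maps timestamp -> (x, y). Returns math.inf when the tracks
    share no timestamps, i.e., the persons were never observed together.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf
    return sum(math.dist(a[t], b[t]) for t in shared) / len(shared)


def cluster_tracks(tracks, max_distance):
    """Single-linkage clustering of track information using union-find."""
    ids = list(tracks)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge the clusters of any two tracks that are close enough.
    for index, i in enumerate(ids):
        for j in ids[index + 1:]:
            if track_distance(tracks[i], tracks[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    # A cluster set with two or more persons is treated as a group of companions.
    return [members for members in clusters.values() if len(members) > 1]
```

Single linkage is used here only for brevity; other clustering methods could be substituted.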

Regarding Claim 9,
Claim 9 is dependent on independent claim 1 and therefore includes all the limitations of claim 1. Thus, claim 9 recites a mental process. Claim 9 further recites:
performing the person detection on the video images to obtain person images including detection information, the person detection including at least one of face detection and body detection, wherein in a case where the person detection includes the face detection, the detection information includes face information; and in a case where the person detection includes the body detection, the detection information includes body information; and determining, according to the person images, the image set corresponding to the at least one person among the plurality of persons. 
These steps can be performed mentally since a human can observe video and visually identify faces and corresponding bodies of persons in the images. Thus, claim 9 merely recites further elements that are part of the mental process of claim 1. Claim 9 does not recite additional elements that integrate the abstract idea into a practical application or recite significantly more. Therefore, claim 9 does not recite eligible subject matter.

Regarding Claim 10 (and its dependent claims 11-14),
Claim 10 is dependent on claim 9 and therefore includes all the limitations of claim 9. Thus, claim 10 recites a mental process. Claim 10 further recites:
clustering the person images including the face information to obtain a face clustering result, the face clustering result including at least one face identity for the person images including the face information; 
clustering the person images including the body information to obtain a body clustering result, the body clustering result including at least one body identity for the person images including the body information; and 
determining, according to the face clustering result and the body clustering result, the image set corresponding to the at least one person among the plurality of persons. 
These steps cannot be performed mentally and are therefore considered additional elements. Further, these additional elements integrate the abstract idea into a practical application. Specifically, as discussed in the last full paragraph on page 27 of the specification of the subject application, “through mutual correction between the face clustering result and the body clustering result, the clustering accuracy may be improved, and thus the accuracy of the image set corresponding to the person and obtained according to the body clustering result and the face clustering result is improved; and through the more accurate image set, the more accurate track information may be determined”. The additional elements of claim 10 describe the underlying technological features of aggregating face and body clustering results to increase accuracy of determining track information, and thereby constitute an improvement to the technical field of companion detection in video. As such, the additional elements of claim 10 integrate the mental process into a practical application. Therefore, claim 10 recites eligible subject matter.
Claims 11-14 are dependent on claim 10 and therefore also recite eligible subject matter by virtue of their dependency. 

Regarding Claim 15,
Claim 15 is dependent on independent claim 1 and therefore includes all the limitations of claim 1. Thus, claim 15 recites a mental process. Claim 15 further recites:
determining a marketing plan for the companions according to the companions among the plurality of persons; and 
determining an abnormal person among the companions. 
These additional elements constitute extra-solution activity that neither integrates the abstract idea into a practical application nor provides significantly more. Thus, claim 15 does not recite eligible subject matter.





Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5 and 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Moon et al. (US 11,004,093 B1, hereinafter “Moon”).
Regarding claim 1, Moon discloses a method for detecting companions (col. 4, lines 33-35, discloses that the present invention is a method to detect shopping groups; moreover, col. 1, lines 41-42, discloses that the groups can be family or friends; therefore, shopping groups can be understood as companions as claimed), comprising: obtaining video images respectively captured by a plurality of image capture devices deployed in different areas (col. 6, lines 30-33, discloses that, as shown in FIG. 2, different means for capturing images track the shoppers within each field of view and also across different fields of view; therefore, the image capturing means can be understood as a plurality of image capture devices) during a preset time period (FIG. 4 shows that the y-axis indicates time and that all shoppers are detected within the same time period [a copy of FIG. 4 is shown below to better assist with understanding]; moreover, col. 6, lines 35-38, discloses that both the spatial coordinates and the timestamps of instances of shoppers are recorded from the image capturing means; therefore, the time period of detection of instances of shoppers shown on the y-axis of FIG. 4 can be understood as a preset time period as claimed); performing a person detection on the video images (col. 7, lines 21-23, discloses a person detection step that searches for shoppers individually through frames for tracking), to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons (col. 7, lines 23-27, discloses that, based on the result of person detection, shopper trajectories can be obtained through frames; therefore, the frames used to obtain the shopper trajectories are interpreted as the claimed image sets corresponding to the respective shoppers, each of which has one trajectory based on FIG. 5; moreover, col. 7, lines 21-22, indicates that the person detection step searches for shoppers individually, which indicates that a plurality of shoppers can be detected through frames), the image set including person images (col. 6, lines 31-34, discloses that the shopper trajectory estimation is done based on images of each shopper); determining track information of the at least one person (col. 7, lines 21-25, discloses that the shopper trajectory includes tracking information from person detection through frames; moreover, FIG. 5 shows that step 438 “person tracking” follows step 434 “person detection” and that the resulting shopper trajectories are obtained in step 422; therefore, the shopper trajectory can be understood as track information as claimed) according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images (col. 6, lines 35-39, discloses that shopper trajectories include spatial coordinates and timestamps of instances of the shoppers; furthermore, lines 40-43 indicate that camera coordinates are used to indicate world coordinates so that the trajectories can be consistently estimated across multiple views, which means that camera coordinates are used to represent world coordinates indicating a space; moreover, the x-axis of FIG. 4 indicates the space in which movements are captured; therefore, camera coordinates can be understood as position information of the plurality of image capture devices; moreover, col. 7, lines 35-54, and FIG. 6, which shows an embodiment of trajectories, disclose that a point within a trajectory has a coordinate represented as (x, y), e.g., the origin coordinate (0, 0), with a timestamp in the spatiotemporal coordinate system; furthermore, as discussed previously, the shopper trajectory includes the image set, and the timestamps, which constitute the time axis in FIG. 4, can be understood as a time for capturing the person images); and determining companions among the plurality of persons according to the track information of the plurality of persons (FIG. 17 shows steps including step 602 of recognizing shopping groups based on the shopper trajectories of step 615).

[Image: Moon’s Figure 4]

Regarding claim 2, Moon discloses the method according to claim 1, wherein determining the track information of the at least one person according to the position information of the plurality of image capture devices, the image set corresponding to the at least one person, and the time for capturing the person images includes (col. 6, lines 35-39, and col. 7, lines 21-25 and 35-54, disclose obtaining a person trajectory including tracking information and position information, as discussed above regarding claim 1): determining, for at least one person image in the image set corresponding to the at least one person, first position information of a target person in the person image in a video image corresponding to the person image (col. 9, lines 22-37, discloses that FIG. 11 shows a camera view capturing images from above, in which two trajectories of two persons are created, and a rectangle representing a graph of the position information of the persons in space, in which the vertical y-axis is the time axis; therefore, the graph indicates position information according to captured person images: the graph shows two lines of position information corresponding to the two trajectories, and each dot within each line represents one position point in space of the person within his or her trajectory according to a person image captured at a certain timestamp; the position information of the person in space within an image view can be understood as the first position information as claimed); determining a spatial position coordinate of the target person in a spatial coordinate system according to the first position information and second position information, the second position information being position information of an image capture device for capturing the video image corresponding to the person image (col. 6, lines 40-43, discloses that camera coordinates are used to indicate world coordinates, as discussed above regarding the limitation of claim 1 involving position information of the plurality of image capture devices; therefore, the position information of each capturing means is taken into account in determining a spatial position coordinate of each instance of the shopper according to col. 6, lines 27-43; the camera coordinates can be understood as the second position information); obtaining a spatio-temporal position coordinate of the target person in a spatio-temporal coordinate system according to the spatial position coordinate and a time for capturing the video image corresponding to the person image (col. 6, lines 35-44, discloses that each instance of the shopper is represented by a spatial coordinate and a timestamp in a spatiotemporal dimension); and obtaining the track information of the at least one person in the spatio-temporal coordinate system according to spatio-temporal position coordinates of the plurality of persons (col. 6, lines 35-44, discloses that person trajectories are obtained based on the spatiotemporal coordinates of the instances of the shoppers).
Regarding claim 3, Moon discloses the method according to claim 1, wherein determining companions among the plurality of persons according to track information of the plurality of persons includes (col. 4, lines 33-35, discloses that the present invention is a method to detect shopping groups, as discussed above regarding claim 1): clustering the track information of the plurality of persons to obtain at least one cluster set (col. 5, lines 4-10, discloses that the grouping step utilizes a segmentation framework to find corresponding clusters of trajectories); and determining persons respectively corresponding to a plurality of pieces of track information belonging to the same cluster set as a group of companions (col. 5, lines 4-10, discloses that trajectories having high group scores are grouped together using a segmentation method to identify shopping groups, i.e., groups of companions).
Regarding claim 4, Moon discloses the method according to claim 2, wherein the track information of the at least one person includes a point group in the spatio-temporal coordinate system; and determining companions among the plurality of persons according to track information of the plurality of persons includes (col. 6, lines 35-44, discloses determining shopping groups, i.e., companions, according to the tracking information of person trajectories, as discussed above regarding claim 2): determining similarity for point groups corresponding to every two persons in the spatio-temporal coordinate system in the track information of the plurality of persons (col. 6, lines 10-14, discloses a shopper trajectory pairing step that combines two candidates as a pair and then determines whether or not the pair belongs to the same shopping group; moreover, col. 4, lines 64-67, discloses that group scores are used to determine how likely it is that the pair of shoppers belongs to the same group; moreover, col. 8, lines 24-28, discloses that the group score is the likelihood, computed by step 470 based on position [col. 11, lines 1-6], that two trajectories belong to the same shopping group; col. 7, lines 13-15, discloses that the pairs of trajectories are compared to identify trajectories that are close in the spatiotemporal dimension in order to determine whether they show group behavior; the group scores can be understood as similarity as claimed); determining a plurality of person pairs based on a relationship between the similarity and a first similarity threshold (col. 10, lines 38-53, discloses shopping group merging, shown in FIG. 18 [a copy of FIG. 18 is shown below to better assist with understanding], in which the merging takes into account the proximity between two persons and an additional person can be merged when in close proximity; FIG. 18 shows three persons detected, with pairing done based on proximity; this covers instances in which shoppers 425 and 426 are paired based on close proximity and shoppers 425 and 427 are also paired based on proximity; since 425 appears in both pairs, 425, 426, and 427 can be grouped into the same shopping group), each person pair including two persons, and the similarity for each person pair having a value greater than the first similarity threshold (FIG. 20 shows multiple pairs determined; moreover, col. 8, lines 20-33, discloses that a group is determined when any pair of trajectories has a high group score, which indicates a threshold value for determining a high group score); and determining at least one group of companions according to the plurality of person pairs (col. 10, lines 38-53, discloses shopping group merging, as discussed previously, in which the same group has pairs of shoppers sharing high group scores, each group score being determined between a pair of shoppers).

[Image: Moon’s Figure 18]
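For illustration only, the pairing limitation of claim 4 (a similarity for the point groups of every two persons, with a pair kept when its similarity exceeds a first similarity threshold) can be sketched as follows. The similarity measure, 1 / (1 + mean distance) over shared timestamps, and all names are assumptions made for this sketch; Moon’s group score is computed differently and is only analogized to the claimed similarity above.

```python
import math
from itertools import combinations


def similarity(points_a, points_b):
    """Similarity of two point groups, each a dict timestamp -> (x, y).

    Defined here as 1 / (1 + mean distance over shared timestamps), so identical
    point groups score 1.0 and far-apart ones approach 0.0.
    """
    shared = set(points_a) & set(points_b)
    if not shared:
        return 0.0
    mean_d = sum(math.dist(points_a[t], points_b[t]) for t in shared) / len(shared)
    return 1.0 / (1.0 + mean_d)


def person_pairs(tracks, first_threshold):
    """Return (person_a, person_b, similarity) for every pair above the threshold."""
    pairs = []
    for a, b in combinations(sorted(tracks), 2):
        s = similarity(tracks[a], tracks[b])
        if s > first_threshold:
            pairs.append((a, b, s))
    return pairs


tracks = {
    "P1": {0: (0.0, 0.0), 1: (1.0, 0.0)},
    "P2": {0: (0.0, 1.0), 1: (1.0, 1.0)},
    "P3": {0: (50.0, 50.0), 1: (50.0, 50.0)},
}
print(person_pairs(tracks, 0.4))  # [('P1', 'P2', 0.5)]
```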
Regarding claim 5, Moon discloses the method according to claim 4, wherein determining at least one group of companions according to the plurality of person pairs includes (col. 10, lines 38-53, discloses a shopping group merging step, as discussed above in claim 4): establishing a companion set according to a first person pair in the plurality of person pairs (col. 10, lines 38-53, discloses shopping group merging in which a group has pairs of shoppers sharing high group scores, as discussed above in claim 4; a group can be understood as a companion set); determining an associated person pair from at least one second person pair, other than the person pair included in the companion set, in the plurality of person pairs, the associated person pair including at least one person in the companion set (col. 10, lines 38-53, discloses shopping group merging, as discussed above in claim 4 and shown in FIG. 18; the merging takes into account the proximity between two people, and an additional person can be merged when in close proximity; FIG. 18 shows three persons detected, with pairing done based on proximity, e.g., shoppers 425 and 426 can be paired based on close proximity, and shoppers 425 and 427 can also be paired based on close proximity; the pair of 425 and 427 can be understood as an associated person pair from at least one second person pair having at least one person in the companion set, as claimed); adding the associated person pair to the companion set; and determining persons in the companion set as a group of companions (col. 10, lines 38-53, discloses shopping group merging, as discussed above in claim 4 and shown in FIG. 18; since the pair of 425 and 426 can be grouped, the pair of 425 and 427 can also be grouped, and both pairs share the same person 425, shoppers 425, 426 and 427 can be grouped together, which can be understood as a group of companions as claimed).
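As a further illustration, the companion-set steps of claim 5 (establish a set from a first pair, add any remaining pair that shares a person with the set, and repeat) can be sketched as below. The function name and the tuple-based pair format are hypothetical; the example mirrors the FIG. 18 reasoning above, in which two pairs sharing one person end up in a single group.

```python
def group_companions(pairs):
    """Grow companion sets from (person_a, person_b) pairs.

    A companion set is started from the first remaining pair; every remaining pair
    that shares at least one person with the set (an associated person pair) is
    added, and the scan repeats until no associated pair is left.
    """
    remaining = list(pairs)
    groups = []
    while remaining:
        companion_set = set(remaining.pop(0))
        grew = True
        while grew:
            grew = False
            for pair in list(remaining):  # iterate over a copy while removing
                if companion_set & set(pair):
                    companion_set |= set(pair)
                    remaining.remove(pair)
                    grew = True
        groups.append(companion_set)
    return groups


print(group_companions([("A", "B"), ("A", "C"), ("D", "E")]))
```

This returns two groups of companions, {A, B, C} and {D, E}: the pair (A, C) is an associated pair of the set started from (A, B) because it shares person A.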
Regarding claim 15, Moon discloses the method according to claim 1, wherein after determining companions among the plurality of persons according to the track information of the plurality of persons (FIG. 17 shows steps including step 602 of recognizing shopping groups based on the shopper trajectories of step 615, as discussed above in claim 1), Moon further discloses that the method further comprises at least one of: determining a marketing plan for the companions according to the companions among the plurality of persons; and determining an abnormal person among the companions (col. 2, lines 15-21, discloses that, based on the trajectories of groups of people and group members, data can be obtained for marketing purposes).
Regarding claim 16, Moon discloses an electronic device (col. 4, lines 8-13, discloses that the present invention is a system, and FIG. 2 shows an operational environment of the system according to col. 6, lines 27-28; under BRI, a system includes at least one electronic device), comprising: a processor (col. 4, lines 8-14, discloses that Moon’s system processes video frames to detect and track shoppers and to generate trajectories; such processing requires a computer that necessarily includes a processor); and a memory configured to store processor executable instructions (col. 4, lines 8-14, discloses that Moon’s system processes video frames to detect and track shoppers and to generate trajectories; such processing requires computer-executable software instructions that are necessarily stored in memory), wherein the processor is configured to invoke the instructions stored in the memory so as to: obtain video images respectively captured by a plurality of image capture devices deployed in different areas (col. 6, lines 30-33, discloses that FIG. 2 shows different image capturing means that track the shoppers within each field of view and also across different fields of view; therefore, the image capturing means can be understood as a plurality of image capture devices) during a preset time period (FIG. 4 shows that the y-axis indicates time, and all shoppers are detected within the same time period; moreover, col. 6, lines 35-38, discloses that both the spatial coordinates and the timestamps of instances of shoppers are recorded from the image capturing means; therefore, the time period of detection of instances of shoppers shown on the y-axis of FIG. 4 can be understood as a preset time period as claimed); perform person detection on the video images (col. 7, lines 21-23, discloses a person detection step that searches for shoppers individually through frames for tracking) to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons (col. 7, lines 23-27, discloses that, based on the result of person detection, shopper trajectories can be obtained through frames; therefore, the shopper trajectories include the images/image sets corresponding to the shoppers, and each shopper has one trajectory based on FIG. 5; moreover, col. 7, lines 21-22, indicates that the person detection step searches for shoppers individually, which indicates that a plurality of shoppers can be detected through frames), the image set including person images (col. 6, lines 31-34, discloses that shopper trajectory estimation is done based on images of each shopper); determine track information of the at least one person (col. 7, lines 21-25, discloses that the shopper trajectory includes tracking information from person detection through frames; moreover, FIG. 5 shows that step 438 “person tracking” follows step 434 “person detection,” and the resulting shopper trajectories are obtained in step 422; therefore, a shopper trajectory can be understood as track information as claimed) according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and time for capturing the person images (col. 6, lines 35-39, discloses that shopper trajectories include spatial coordinates and timestamps of instances of the shoppers; furthermore, lines 40-43 indicate that camera coordinates are used to indicate world coordinates so that the trajectories can be consistently estimated across multiple views, which means camera coordinates are used to represent world coordinates indicating a space; moreover, the x-axis of FIG. 4 indicates the space in which movements are captured; therefore, camera coordinates can be understood as position information of the plurality of image capture devices; moreover, col. 7, lines 35-54, and FIG. 6, showing an embodiment of trajectories, disclose that a point within a trajectory has a coordinate represented as (x, y), e.g., the origin coordinate (0, 0), with a timestamp in the spatiotemporal coordinate system; furthermore, as discussed previously, the shopper trajectory includes the image set, and the timestamps can be understood as a time for capturing the person images, the timestamps constituting the time axis in FIG. 4); and determine companions among the plurality of persons according to track information of the plurality of persons (FIG. 17 shows steps including step 602 of recognizing shopping groups based on the shopper trajectories of step 615).
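For illustration only, the track-information limitation of claim 16 can be sketched as follows, under the simplifying assumption that a person’s position is approximated by the known position of the image capture device that captured the person image. The PersonImage record and the build_tracks function are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class PersonImage:
    person_id: str    # identity assigned from the person detection result
    camera_id: str    # image capture device that produced the frame
    timestamp: float  # capture time of the frame, in seconds


def build_tracks(images, camera_positions):
    """Return person_id -> time-ordered list of (t, x, y) track points.

    camera_positions maps camera_id -> (x, y); each person image is placed at the
    position of the camera that captured it (a simplifying assumption).
    """
    tracks = {}
    for img in images:
        x, y = camera_positions[img.camera_id]
        tracks.setdefault(img.person_id, []).append((img.timestamp, x, y))
    for points in tracks.values():
        points.sort()  # tuples sort by timestamp first
    return tracks


cameras = {"cam1": (0.0, 0.0), "cam2": (10.0, 0.0)}
images = [
    PersonImage("p1", "cam2", 5.0),
    PersonImage("p1", "cam1", 1.0),
    PersonImage("p2", "cam1", 2.0),
]
print(build_tracks(images, cameras))
# {'p1': [(1.0, 0.0, 0.0), (5.0, 10.0, 0.0)], 'p2': [(2.0, 0.0, 0.0)]}
```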
Regarding claim 17, Moon discloses the electronic device according to claim 16, wherein determining the track information of the at least one person according to the position information of the plurality of image capture devices, the image set corresponding to the at least one person, and the time for capturing the person images includes (col. 6, lines 35-39, and col. 7, lines 21-25 and 35-54, disclose obtaining a person trajectory including tracking information and position information, as discussed above in claim 16): determine, for at least one person image in the image set corresponding to the at least one person, first position information of a target person in the person image in a video image corresponding to the person image (page 31, lines 22-37, discloses that FIG. 16 shows a camera view capturing images from above, from which two trajectories of two persons are created, and a rectangle representing a graph of the positions of the persons in space, in which the vertical (y) axis is the time axis; the graph therefore indicates position information according to the captured person images: it shows two lines of position information corresponding to the two trajectories, and each dot within each line represents one position point in space of the person along his or her trajectory, according to a person image captured at a certain timestamp; the position information of the person in space within an image view can be understood as the first position information as claimed); determine a spatial position coordinate of the target person in a spatial coordinate system according to the first position information and second position information, the second position information being position information of an image capture device for capturing the video image corresponding to the person image (col. 6, lines 40-43, discloses that camera coordinates are used to indicate world coordinates, as discussed above in claim 1 with respect to the limitation including position information of the plurality of cameras; therefore, the position information of each image capturing means is taken into account in determining a spatial position coordinate of each instance of the shopper according to col. 6, lines 27-43; the camera coordinates can be understood as the second position information); obtain a spatio-temporal position coordinate of the target person in a spatio-temporal coordinate system according to the spatial position coordinate and a time for capturing the video image corresponding to the person image (col. 6, lines 35-44, discloses that each instance of the shopper is represented by a spatial coordinate and a timestamp in a spatiotemporal dimension); and obtain the track information of the at least one person in the spatio-temporal coordinate system according to spatio-temporal position coordinates of the plurality of persons (col. 6, lines 35-44, discloses that person trajectories are obtained based on the spatiotemporal coordinates of the instances of the shoppers).
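A minimal sketch of the claim 17 steps follows, assuming that the second position information (the pose of the capturing device) is available as a 3x3 ground-plane homography per camera; that representation, and every name in the code, is an assumption for illustration rather than something Moon is said to disclose. The first position information is taken to be the pixel location (u, v) of the target person in the video image.

```python
def pixel_to_world(pixel, homography):
    """Map an image point (u, v) to ground-plane world coordinates (x, y).

    homography is a row-major 3x3 nested list standing in for the position
    information of the camera that captured the frame.
    """
    u, v = pixel
    h = homography
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return (x / w, y / w)  # divide out the homogeneous scale


def spatiotemporal_point(pixel, homography, timestamp):
    """Spatial position coordinate plus capture time -> (x, y, t)."""
    x, y = pixel_to_world(pixel, homography)
    return (x, y, timestamp)


def track_in_spacetime(detections, homographies):
    """detections: list of (camera_id, (u, v), timestamp) for one person.

    Returns the person's point group in the spatio-temporal coordinate system,
    ordered by time.
    """
    points = [spatiotemporal_point(p, homographies[cam], t) for cam, p, t in detections]
    return sorted(points, key=lambda pt: pt[2])


homographies = {
    "c1": [[0.5, 0, 5], [0, 0.5, 0], [0, 0, 1]],  # scale by 0.5, shift x by 5
    "c2": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],      # identity mapping
}
print(track_in_spacetime([("c2", (1, 2), 4.0), ("c1", (100, 200), 3.0)], homographies))
# [(55.0, 100.0, 3.0), (1.0, 2.0, 4.0)]
```

A real deployment would calibrate each homography from known ground-plane points; the fixed matrices here only demonstrate the data flow.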
Regarding claim 18, Moon discloses a system for detecting companions, comprising the plurality of image capture devices disposed in different areas and the electronic device according to claim 17, wherein the plurality of image capture devices are configured to capture the video images, and send the video images to the electronic device (col. 6, lines 27-35, discloses image capturing means and a shopper trajectory module that processes the images captured by the image capturing means; this indicates that the images are sent to the device as claimed; FIG. 2 shows that the image capturing means 100 are located in different areas).
Regarding claim 19, Moon discloses the system according to claim 18, wherein the electronic device is integrated in the image capture devices (col. 4, lines 10-13, discloses that, in one exemplary embodiment, the video cameras can track shoppers and generate trajectories without using any costly and cumbersome devices; this covers the instance in which the cameras themselves perform the processing, which means the system is integrated in the cameras as claimed).
Regarding claim 20, Moon discloses a non-transitory computer-readable storage medium having computer program instructions stored thereon (col. 4, lines 8-14, discloses that Moon’s system processes video frames to detect and track shoppers and to generate trajectories; such processing requires computer-executable software instructions that are necessarily stored on a computer-readable storage medium), wherein the computer program instructions, when executed by a processor (col. 4, lines 8-14, discloses that Moon’s system processes video frames to detect and track shoppers and to generate trajectories; such processing requires a computer that necessarily includes a processor), is caused to perform the operations of: obtaining video images respectively captured by a plurality of image capture devices deployed in different areas (col. 6, lines 30-33, discloses that FIG. 2 shows different image capturing means that track the shoppers within each field of view and also across different fields of view; therefore, the image capturing means can be understood as a plurality of image capture devices) during a preset time period (FIG. 4 shows that the y-axis indicates time, and all shoppers are detected within the same time period; moreover, col. 6, lines 35-38, discloses that both the spatial coordinates and the timestamps of instances of shoppers are recorded from the image capturing means; therefore, the time period of detection of instances of shoppers shown on the y-axis of FIG. 4 can be understood as a preset time period as claimed); performing a person detection on the video images (col. 7, lines 21-23, discloses a person detection step that searches for shoppers individually through frames for tracking), to determine, according to an obtained person detection result, an image set corresponding to at least one person among a plurality of persons (col. 7, lines 23-27, discloses that, based on the result of person detection, shopper trajectories can be obtained through frames; therefore, the shopper trajectories include the images/image sets corresponding to the shoppers, and each shopper has one trajectory based on FIG. 5; moreover, col. 7, lines 21-22, indicates that the person detection step searches for shoppers individually, which indicates that a plurality of shoppers can be detected through frames), the image set including person images (col. 6, lines 31-34, discloses that shopper trajectory estimation is done based on images of each shopper); determining track information of the at least one person (col. 7, lines 21-25, discloses that the shopper trajectory includes tracking information from person detection through frames; moreover, FIG. 5 shows that step 438 “person tracking” follows step 434 “person detection,” and the resulting shopper trajectories are obtained in step 422; therefore, a shopper trajectory can be understood as track information as claimed) according to position information of the plurality of image capture devices, the image set corresponding to the at least one person, and a time for capturing the person images (col. 6, lines 35-39, discloses that shopper trajectories include spatial coordinates and timestamps of instances of the shoppers; furthermore, lines 40-43 indicate that camera coordinates are used to indicate world coordinates so that the trajectories can be consistently estimated across multiple views, which means camera coordinates are used to represent world coordinates indicating a space; moreover, the x-axis of FIG. 4 indicates the space in which movements are captured; therefore, camera coordinates can be understood as position information of the plurality of image capture devices; moreover, col. 7, lines 35-54, and FIG. 6, showing an embodiment of trajectories, disclose that a point within a trajectory has a coordinate represented as (x, y), e.g., the origin coordinate (0, 0), with a timestamp in the spatiotemporal coordinate system; furthermore, as discussed previously, the shopper trajectory includes the image set, and the timestamps can be understood as a time for capturing the person images, the timestamps constituting the time axis in FIG. 4); and determining companions among the plurality of persons according to the track information of the plurality of persons (FIG. 17 shows steps including step 602 of recognizing shopping groups based on the shopper trajectories of step 615).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (“US 11,004,093 B1” hereinafter as “Moon”) in view of Nishikawa et al. (“US 2019/0026560 A1” hereinafter as “Nishikawa”).
Regarding claim 6, Moon discloses the method according to claim 5 (as mapped above in conjunction with claim 5). Moon does not explicitly disclose that adding the associated person pair to the companion set includes: determining a number of person pairs including a first person in the associated person pairs; and adding the associated person pair to the companion set in a case where the number of person pairs including the first person is less than a number-of-person-pairs threshold.
In the same field of detecting human flow, i.e., people moving in association with each other in a space (Nishikawa’s page 1, “Abstract”, lines 1-8), Nishikawa discloses that adding the associated person pair to the companion set includes: determining a number of person pairs including a first person in the associated person pairs ([0148], lines 1-7, discloses that, in the case of a congested area, the association between two person nodes is determined in order to extract associated-person nodes; moreover, lines 4-15 disclose extracting mutually associated person nodes in a predetermined space based on a certain population density of that space, the associated-person node then being determined between two person nodes included in the multiple persons; determining the population density of a space is analogous to determining a number of person pairs including a first person, because in a predetermined space any person can be paired with a selected person); and adding the associated person pair to the companion set in a case where the number of person pairs including the first person is less than a number-of-person-pairs threshold ([0150], lines 6-9, discloses that, when the population density is less than a threshold value, two-person node (person pair) extraction is performed; this covers the instance of processing a particular two-person node [understood as a companion set of two people] within the multiple persons and finding mutually associated two-person nodes to be associated with the current two-person node to extract an associated-person node, which is analogous to adding the associated person pair to the companion set as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to add an associated person pair, i.e., a pair including at least one person already in the association/companion set, to the companion set when the number of associated person pairs (population density) is less than a threshold, as taught by Nishikawa, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to more accurately identify groups of companions when certain areas are congested (Nishikawa’s [0148]).
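For illustration only, the pair-count gate of claim 6 can be sketched as below. The sketch follows the claim language (a count of person pairs containing the first person) rather than Nishikawa’s population-density measure, which is only analogized to it above; all names are hypothetical.

```python
from collections import Counter


def add_associated_pairs(companion_set, candidate_pairs, all_pairs, max_pairs):
    """Add associated person pairs to a companion set, gated by a pair count.

    For each candidate pair sharing a person (the "first person") with the set,
    count the pairs in all_pairs that include that person; the candidate is added
    only if that count is below max_pairs (the number-of-person-pairs threshold).
    """
    pair_count = Counter(person for pair in all_pairs for person in pair)
    result = set(companion_set)
    for pair in candidate_pairs:
        shared = result & set(pair)
        if not shared:
            continue  # not an associated person pair
        first_person = min(shared)  # deterministic choice if both persons are shared
        if pair_count[first_person] < max_pairs:
            result |= set(pair)
    return result


all_pairs = [("A", "B"), ("A", "C"), ("C", "D"), ("C", "E"), ("C", "F")]
print(sorted(add_associated_pairs({"A", "B"}, [("A", "C"), ("C", "D")], all_pairs, 3)))
# ['A', 'B', 'C']
```

Person C appears in four pairs, as a person in a crowded area might, so with a threshold of 3 the pair (C, D) is not pulled into the companion set.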
Regarding claim 7, Moon discloses the method according to claim 4, including the step of determining the at least one group of companions according to the plurality of person pairs. Moon does not explicitly disclose determining, in a case where the number of persons included in the group of companions is greater than a first number threshold, at least one person pair having a value of the similarity greater than a second similarity threshold in the plurality of person pairs as a group of companions, such that the number of persons included in the group of companions is less than the first number threshold, the second similarity threshold being greater than the first similarity threshold.
In the same field of detecting human flow, i.e., people moving in association with each other in a space (Nishikawa’s page 1, “Abstract”, lines 1-8), Nishikawa discloses determining the at least one group of companions according to the plurality of person pairs, further comprising ([0149], lines 9-13, discloses that association is determined between two person nodes based on the distances between them [association between persons based on the distances between them indicates a group of companions, or a group of associated persons]; this covers the instance in which at least one group of companions is determined according to the plurality of person pairs as claimed): determining, in a case where the number of persons included in the group of companions is greater than a first number threshold ([0145], lines 7-11, discloses extracting multiple-person nodes rather than only two-person nodes, which is done in an ambient environment where multiple persons are present within a space; furthermore, [0149] discloses that, for multiple-person nodes in a predetermined space, two-person nodes are extracted from the multiple-person nodes; this covers the instance in which the number threshold is 3 persons and further processing is performed when a node has more than two persons [multiple persons], which is analogous to the number of persons in a group of companions being greater than a threshold as claimed), at least one person pair having a value of the similarity greater than a second similarity threshold in the plurality of person pairs as a group of companions ([0149] further discloses, for multiple-person nodes in a predetermined space, dividing the space into subdivided regions and determining the population density of each subdivided region; moreover, [0150], lines 9-13, discloses that, when the population density is greater than a threshold, the distance threshold between two person nodes is set lower to account for a crowded space, so that only people in very close proximity are considered to be associated with each other; lowering the distance threshold between people raises the group score required for an association determination, which is analogous to a higher similarity threshold as claimed), such that the number of persons included in the group of companions is less than the first number threshold (when the distance threshold for determining association is lowered for a particular space, the number of persons included in each association group decreases; this covers the instance in which the number of persons included in the group of companions is less than a threshold as claimed; [0150] discloses extracting two-person nodes from multiple-person nodes, and a two-person node has fewer people than the threshold of 3 discussed previously), the second similarity threshold being greater than the first similarity threshold (the increase in the similarity threshold discussed previously, which occurs when the distance threshold between two person nodes is made smaller, indicates that the second similarity threshold is greater than the first similarity threshold as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to determine, in a case where the number of persons included in the group of companions is greater than a first number threshold, at least one person pair having a value of the similarity greater than a second similarity threshold in the plurality of person pairs as a group of companions, such that the number of persons included in the group of companions is less than the first number threshold, the second similarity threshold being greater than the first similarity threshold, as taught by Nishikawa, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to more accurately identify groups of companions when certain areas are congested (Nishikawa’s [0148]).
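For illustration only, the claim 7 fallback (re-deriving companions from pairs above a stricter second similarity threshold when a group has more persons than the first number threshold) can be sketched as follows. The names, the (a, b, similarity) tuple format, and the assumption that the first number threshold is greater than 2, so that each returned pair satisfies the size condition, are all hypothetical.

```python
def split_large_group(group, scored_pairs, first_number_threshold, second_threshold):
    """Re-derive companions for a group that is too large.

    group: set of persons. scored_pairs: list of (a, b, similarity) tuples.
    If the group has more than first_number_threshold persons, each pair inside the
    group whose similarity exceeds the stricter second_threshold becomes its own
    group of companions; otherwise the group is returned unchanged.
    """
    if len(group) <= first_number_threshold:
        return [set(group)]
    return [
        {a, b}
        for a, b, s in scored_pairs
        if a in group and b in group and s > second_threshold
    ]


pairs = [("A", "B", 0.9), ("B", "C", 0.6), ("C", "D", 0.85), ("A", "D", 0.55)]
print(split_large_group({"A", "B", "C", "D"}, pairs, 3, 0.8))
```

With a first number threshold of 3 and a second similarity threshold of 0.8, the four-person group is replaced by the two-person groups {A, B} and {C, D}; the weaker links (similarities 0.6 and 0.55) no longer hold the group together.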
Claims 9-14 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (“US 11,004,093 B1” hereinafter as “Moon”) in view of Hu et al. (“US 2020/0380299 A1” hereinafter as “Hu”).
Regarding claim 9, Moon discloses the method according to claim 1, wherein performing person detection on the video images, to determine, according to the obtained person detection result, the image set corresponding to at least one person among a plurality of persons includes: performing the person detection on the video images to obtain person images including detection information. Moon does not explicitly disclose the person detection including at least one of face detection and body detection, wherein in a case where the person detection includes the face detection, the detection information includes face information; and in a case where the person detection includes the body detection, the detection information includes body information; and determining, according to the person images, the image set corresponding to the at least one person among the plurality of persons.
In the same field of person detection on video images (Hu’s page 1, “Abstract”), Hu discloses performing the person detection on the video images to obtain person images including detection information, the person detection including at least one of face detection and body detection ([0010] discloses methods for recognizing people across images using face and body characteristics; recognizing people based on face and body characteristics indicates face and body detection; furthermore, [0048], lines 1-3, and FIG. 5 disclose the system of the invention, whose capturing lens can capture video images for processing), wherein in a case where the person detection includes the face detection, the detection information includes face information ([0010] discloses recognizing people based on face and body characteristics, as discussed previously; for face detection, the face characteristic can include face information; moreover, [0044] discloses that the face characteristic is based on face data); and in a case where the person detection includes the body detection, the detection information includes body information ([0010] discloses recognizing people based on face and body characteristics, as discussed previously; for body detection, the body characteristic includes body information; moreover, [0044] discloses that the body characteristic is based on body data); and determining, according to the person images, the image set corresponding to the at least one person among the plurality of persons ([0011] discloses that a set of images is used to detect people in the images and that a clustering method is used to cluster the images relating to a particular individual; the clustered images can be understood as the image set as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection on video images based on detection information including face and body information, as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve person identification based on face and body characteristics in image data across images captured within a common time and/or location (Hu’s [0023], lines 1-10).
Regarding claim 10, Moon as modified by Hu discloses the method according to claim 9; Hu discloses wherein determining, according to the person images, the image set corresponding to the at least one person among the plurality of persons includes ([0011] discloses that a set of images is used to detect people in the images and that a clustering method is used to cluster the images relating to a particular individual, as discussed above in claim 9): clustering the person images including the face information to obtain a face clustering result, the face clustering result including at least one face identity for the person images including the face information (FIG. 3 shows the steps of obtaining a representative face vector, which is used for clustering image sets into clusters based on face characteristics; moreover, [0011] discloses that the clusters are used to identify people in images according to the obtained face vector; furthermore, [0033], lines 1-5, describes in more detail the process using the representative vector, in which each cluster for the sets of images is found based on face characteristics determined according to the face vector; therefore, each cluster based on face characteristics can be understood as a face clustering result, and a face characteristic, based on face data (i.e., face information), can be understood as a face identity as claimed); clustering the person images including the body information to obtain a body clustering result, the body clustering result including at least one body identity for the person images including the body information ([0011] discloses that a body vector can be used for recognizing people even in images where their faces are occluded; the body vector is used for clustering in the same manner as the face vector discussed previously, so the clusters found based on the body vector can be understood as a body clustering result, a body characteristic can be understood as a body identity, and body data can be understood as body information as claimed; moreover, FIG. 2 shows the steps of obtaining a representative vector based on both face and body, and the representative vector is then used to cluster the set of images based on face and body characteristics to identify people according to FIG. 4; therefore, in this approach, the clustering is based on both face and body characteristics); and determining, according to the face clustering result and the body clustering result, the image set corresponding to the at least one person among the plurality of persons (FIG. 4 shows the steps of obtaining a vector representation 406 based on both face and body characteristics [hereinafter called the “face and body representative vector”], and FIG. 2 shows the clustering steps based on the face and body representative vector; this indicates that people can be identified based on the face alone, the body alone, or both the face and the body).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection based on detection information including face and body information, specifically by clustering images based on face information to obtain a face identity and clustering images based on body information to obtain a body identity, on video images as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve person identification based on face and body characteristics in image data across captured images within a common time and/or location (Hu’s [0023], lines 1-10).
Regarding claim 11, Moon as modified by Hu discloses the method according to claim 10. Hu further discloses wherein determining, according to the face clustering result and the body clustering result, the image set corresponding to the at least one person among the plurality of persons includes ([0044] discloses that the model library may include an identity associated with face clusters and/or body clusters): determining corresponding relationships between face identities and body identities in at least one person image including the face information and the body information ([0044] discloses finding an identity based on face clusters and body clusters; an identity can be understood as a relationship between a face identity and a body identity); and obtaining, according to a first corresponding relationship in the corresponding relationships, person images including the face information and/or the body information in the first corresponding relationship from the person images to form an image set corresponding to one person ([0044] discloses that the identity/identifying information is for a person, and that the person is identified based on face and body characteristics through clusters of image data corresponding to the face vector and the body vector; the clustered image data for the particular person can be understood as the person images forming an image set corresponding to one person as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection based on detection information including face and body information, specifically by clustering images based on face information to obtain a face identity, clustering images based on body information to obtain a body identity, and determining a relationship between the face identity and the body identity to obtain an image set corresponding to one person, on video images as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve person identification based on face and body characteristics in image data across captured images within a common time and/or location (Hu’s [0023], lines 1-10).
Regarding claim 12, Moon as modified by Hu discloses the method according to claim 11. Hu further discloses wherein determining corresponding relationships between face identities and body identities in at least one person image including the face information and the body information includes ([0044] discloses obtaining an identity based on face clusters and body clusters, as discussed above in claim 11): obtaining face identities corresponding to the face information and body identities corresponding to the body information in the person images including the face information and the body information ([0019] discloses that, in one or more embodiments, the model library may include a body embedding space in which images of people are embedded based on characteristics of a body as well as a face [paragraphs 0023, lines 12-14, and 0026, lines 12-14, disclose that face characteristics can be excluded from the body embedding space to create a body vector that uses body characteristics only], and a face-specific embedding space in which images of people are embedded based on face characteristics, and that the identities of people are identified using the models; therefore, identities based on the face embedding space can be understood as face identities, and identities based on the body embedding space can be understood as body identities as claimed); grouping the person images including the face information and the body information according to body identities to which the person images correspond ([0019] discloses the media management model categorizing images as being associated with identities of people identified using the models based on the face embedding space, where face characteristics alone are used, and/or the body embedding space, where characteristics of both the body and the face are used; categorizing images can be understood as grouping images as claimed), to obtain at least one body image group, person images in the same body image group having the same body identity ([0019] discloses that categorizing images based on the face and body embedding spaces can result in a body network to be utilized in later steps; for example, [0022] discloses that a face network and a body network are used to track people in images based on face characteristics and/or body characteristics, depending on which vector, a face vector and/or a body vector obtained using the networks, is used for the tracking; when the body vector is used, people are tracked based on the body, and the image group is obtained based on the particular body vector; moreover, in a different embodiment, [0023] discloses that the media management module may generate a representative vector based on the body network and the face network [if face information is not already considered in the body network], and the image group is obtained by a clustering algorithm using the face and body representative vector; therefore, the image group obtained using the face and body representative vector can also be understood as an image group obtained using body information and therefore having the same body identity as claimed); and determining, for a first body image group in the body image groups, face identities respectively corresponding to at least one person image in the first body image group ([0023] further discloses that a cross-referencing step using only the face data can be performed on the image groups obtained using the face and body representative vector; that is, clusters are first obtained using the face and body representative vector, and various face vectors are then used to identify common clusters across them; therefore, each identified face cluster can be understood to indicate a face identity, obtained based on the various face vectors, within a body image group cluster obtained based on the face and body representative vector [as discussed previously]), and determining, according to the number of person images corresponding to at least one face identity in the first body image group ([0031] discloses that, when the clusters are identified using the face and body representative vector, face characteristics are then used to identify common individuals across the clusters, as also shown in FIG. 3; the individuals identified based on face characteristics in the clusters obtained using the face and body representative vector can be understood as face identities in the obtained body image group as claimed; since a cluster indicates at least two images, the determination is based on the number of images being two or more), corresponding relationships between face identities and body identities in the person images in the first body image group ([0031] discloses identifying common individuals across distinct clusters using face characteristics, the distinct clusters being obtained using the face and body representative vector, with image groups then obtained based on face characteristics as discussed previously; therefore, each individual/person identified is a person identity as discussed above in claim 11, and the obtained identities represent corresponding relationships between face identities and body identities as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection based on detection information including face and body information, specifically by grouping images based on body identity and, from these groups, obtaining images based on face identity to result in an image set based on the corresponding relationship between face and body identities, on video images as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve person identification based on face and body characteristics in image data across captured images within a common time and/or location (Hu’s [0023], lines 1-10).

Regarding claim 13, Moon as modified by Hu discloses the method according to claim 11. Hu further discloses wherein determining corresponding relationships between face identities and body identities in at least one person image including the face information and the body information includes ([0044] discloses obtaining an identity based on face clusters and body clusters, as discussed above in claim 11): obtaining face identities corresponding to the face information and body identities corresponding to the body information in the person images including the face information and the body information ([0019] discloses that, in one or more embodiments, the model library may include a body embedding space in which images of people are embedded based on characteristics of a body as well as a face [paragraphs 0023, lines 12-14, and 0026, lines 12-14, disclose that face characteristics can be excluded from the body embedding space to create a body vector that uses body characteristics only], and a face-specific embedding space in which images of people are embedded based on face characteristics, and that the identities of people are identified using the models; therefore, identities based on the face embedding space can be understood as face identities, and identities based on the body embedding space can be understood as body identities as claimed); grouping the person images including the face information and the body information according to face identities to which the person images correspond ([0019] discloses the media management model categorizing images as being associated with identities of people identified using the models based on the face embedding space, where face characteristics alone are used, and/or the body embedding space, where characteristics of both the body and the face are used; categorizing images can be understood as grouping images as claimed), to obtain at least one face image group, person images in the same face image group having the same face identity ([0019] discloses that categorizing images based on the face and body embedding spaces can result in a body network to be utilized in later steps; for example, [0022] discloses that a face network and a body network are used to track people in images based on face characteristics and/or body characteristics, depending on which vector, a face vector and/or a body vector obtained using the networks, is used for the tracking; when the body vector is used, people are tracked based on the body, and the image group is obtained based on the particular body vector; moreover, in another embodiment, [0043] discloses that the media management module obtains a vector representative of the detected person based on the face and body characteristics [hereinafter called “face and body representative vector”] in the image frames, as also shown in FIG. 4; the images obtained based on the face and body representative vector can be understood as the face image group as claimed); and determining, for a first face image group in the face image groups, body identities respectively corresponding to at least one person image in the first face image group ([0044] discloses that, at block 408 of FIG. 4, the media management module identifies the person based on the vector representation in the body characteristic embedding space, following block 406, where the image frame group has already been obtained using the face and body representative vector; this indicates that the person is identified based on the body vector within the image frame group obtained from the face and body representative vector according to blocks 406 and 408; the person identified using the body vector can be understood as the body identity as claimed), and determining, according to the number of person images corresponding to at least one body identity in the first face image group, corresponding relationships between face identities and body identities in the person images in the first face image group ([0044] and FIG. 4, as discussed previously, show that the identified person can be understood as a relationship between a face identity and a body identity, consistent with the discussion above in claim 11 where an identity is understood as a relationship, and the cluster identified based on the body vector can be understood as the image group; since a cluster indicates at least two images, the determination is based on the number of images being two or more).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection based on detection information including face and body information, specifically by grouping images based on face identity and, from these groups, obtaining images based on body identity to result in an image set based on the corresponding relationship between face and body identities, on video images as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve person identification based on face and body characteristics in image data across captured images within a common time and/or location (Hu’s [0023], lines 1-10).
Regarding claim 14, Moon as modified by Hu discloses the method according to claim 11. Hu further discloses wherein determining, according to the face clustering result and the body clustering result, the image set corresponding to the at least one person among the plurality of persons includes ([0044] discloses clustering images based on face and body characteristics, as discussed above in claim 11; moreover, [0012] discloses that a second set of images can be captured for the same space at a second moment [at a different time interval], and that a clustering algorithm based on face and body characteristics identifies individuals in the second image set; however, individuals identified in this manner may not be identified across moments [if the same individuals also appear in the images captured during the first moment] because the images from the two distinct moments have not been considered together): determining, for person images including the face information and not belonging to the image set, an image set corresponding to at least one person according to face identities of the person images ([0013] discloses identifying people in images across moments, where face data is determined based on a centroid of the first and second embedding spaces [the embedding spaces store face and body characteristics and their corresponding images] to form a third embedding space based on the face data only, and people are then identified based on the clusters formed within the third embedding space; this covers instances in which face and body characteristics are used together to create clusters identifying persons within each capturing moment, but some individuals are not identified across the capturing moments, as discussed previously [paragraph 0012, lines 12-16]; by combining the capturing moments and using solely face data, individuals sharing the same face data can be identified across moments; the clusters of individuals identified across moments based on face data can be understood as an image set corresponding to at least one person according to face identities and not belonging to the image set obtained based on the face and body clustering results, as claimed).
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Moon’s method to perform person detection based on detection information including face and body information, by determining, for person images including the face information and not belonging to the image set, an image set corresponding to at least one person according to face identities of the person images, on video images as taught by Hu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to identify people across different distinct moments based on face identity (Hu’s [0013], lines 1-4).
Allowable Subject Matter
Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US-20200327353-A1: cropping image based on region of detected objects
US-8295597-B1: segmenting people in a physical space based on automatic behavior analysis
US-9740977-B1: determining shopper intention in a retail store based on trajectories
Henri Bouma et al., WPSS: watching people security services
Ismail Haritaoglu and Myron Flickner, Detection and Tracking of Shopping Groups in Stores
Nurul Japar and Chee Seng Chan, Coherent Crowd Analysis in Still Image, 2019, Centre of Image & Signal Processing: Hausdorff clustering distance method and agglomerative clustering method
Weina Ge, Robert T. Collins, and R. Barry Ruback, Automatically Detecting the Small Group Structure of a Crowd
Weina Ge, Robert T. Collins, and R. Barry Ruback, Vision-Based Analysis of Small Groups in Pedestrian Crowds

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUONG HAU CAI whose telephone number is (571)272-9424. The examiner can normally be reached M-F, 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire X. Wang can be reached on (571) 270-1051 or the examiner’s primary examiner reviewer Sean Conner at (571) 272-1486. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/PHUONG HAU CAI/Examiner, Art Unit 2663
/SEAN M CONNER/Primary Examiner, Art Unit 2663