DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are pending.


Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Examiner’s notes: the corresponding text descriptions of any figure(s) and table(s) cited from the prior art are incorporated herein for further details associated with the examiner’s review comments on the corresponding claims below.

Claim(s) 1, 5, 13, 17 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al (Multiple People Tracking, 2017) in view of Meier et al (US2020/0065653).

Regarding claims 1, 13 and 20, Tang teaches 
receiving a source graph and a target graph, 
(Tang, Fig. 3(a), two input images, one of which may be called the source image and the other the target image)
	Tang does not expressly disclose but Meier teaches:
	the source graph being representative of a source map and the target graph being representative of a target map, and comprising nodes and edges that connect the nodes;
(Meier, Fig. 2; human skeleton input 105 as shown; input 105 is made of nodes and the edges connecting the nodes; the skeletonized input image is a body map representing a human; the skeletonized input may be applied to the two input images of Tang (Fig. 3(a)) for identifying the topological or pose similarity between them with better efficiency and accuracy)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Meier into the system or method of Tang in order to identify topological or pose similarity between two input images with better efficiency and accuracy using skeletonized inputs. The combination of Tang and Meier also teaches other enhanced capabilities.
	The combination of Tang and Meier further teaches:
processing each of the source graph and the target graph in a convolutional layer to provide convolutional layer outputs related to the source graph and the target graph;
processing each of the convolutional layer outputs for the source graph and the target graph in a linear rectifying layer to output node feature maps related to the source graph and the target graph, the node feature maps comprising data representative of characteristic features of each node;
(Tang, Fig. 3, “Red rectangles indicate the convolutional, ReLU and pooling layers of VGG16”; “the features FC6(xi) and FC6(xj) from a pair of images are extracted from the first fully-connected layer of the VGG-based Siamese network that shares the weights”, p3704/c2; “a ReLU non-linearity”, p3705/c1; note that a typical ReLU is represented by a first section y = 0 for x < x1 and a second section y = a*(x-x1) for x >= x1; the ReLU (Rectified Linear Unit) is so named because of the second, linear section, even though the overall ReLU function is a two-section nonlinear function)
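Examiner’s note (illustrative only): the two-section function described in the note above can be sketched as follows. The names `relu_general`, `relu`, `a` and `x1` are assumptions introduced for this sketch and do not appear in Tang or Meier; the conventional ReLU corresponds to a = 1 and x1 = 0.

```python
def relu_general(x, a=1.0, x1=0.0):
    """Two-section rectifier: 0 for x < x1, a * (x - x1) for x >= x1."""
    if x >= x1:
        return a * (x - x1)  # second, linear section
    return 0.0               # first, flat section


def relu(x):
    """Conventional ReLU, i.e. the special case a = 1 and x1 = 0."""
    return relu_general(x, a=1.0, x1=0.0)


values = [-2.0, -0.5, 0.0, 1.5]
print([relu(v) for v in values])  # [0.0, 0.0, 0.0, 1.5]
```

Each section is linear on its own, but the kink at x1 makes the function as a whole nonlinear.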
selecting pairs of node representations from the node feature maps related to the source graph and the target graph, and
(Tang, Fig. 3(a), SiameseNet; each branch of the SiameseNet extracts body feature nodes from each input image (e.g., human skeleton map of Meier); one of the input images, xi, may be called the source input and the other, xj, may be called the target input; Fig. 3(d), the feature nodes extracted from the input images represent the human body parts, e.g., head; one head node is from the input source image and the other from the input target image, “More specifically, the features FC6(xi) and FC6(xj) from a pair of images are extracted from the first fully-connected layer of the VGG-based Siamese network that shares the weights”, p3704, c2; SiameseNet (Fig. 3(a)) and StackNet (Fig. 3(b)) are similar except that “The StackNet allows a pair of images to communicate at the early stage of the network”, p3705, c1; so the extracted feature nodes shown below from the StackNet should be the same for the SiameseNet.

[Figure: Tang, Fig. 3(d), portion of the figure]
)
aggregating the selected pairs to output selected and aggregated pairs of node representations;
(Tang, Fig. 3; while SiameseNet (Fig. 3(a)) and StackNet (Fig. 3(b)) can extract body nodes such as head nodes from input images xi and xj (Fig. 3(d)), they cannot identify whether these two head nodes are of the same person, “The StackNet allows a pair of images to communicate at the early stage of the network, but it is still limited by the lack of ability to incorporate body part correspondence between the images”, p3705, c1; Tang introduces another layer at the input of StackNet, called body part score maps, to form a StackNetPose (Fig. 3(c)); this layer can identify whether the two head nodes are of the same person based on the movement track, Fig. 2, “At the same time, this decision has to be certified a posteriori by a track connecting the two”, p3703, c1; with this body part score map layer, the StackNet can take advantage of its internal communications between the parallel source and target neural networks to identify whether the two head nodes belong to the same person, i.e., label the two head nodes as “same person” or “not the same person”; indeed, labeling the two heads as the same person is a process of node aggregation which outputs the two head nodes as “same person”; “A desirable property of the network is to localize the corresponding regions of the body parts, and to reason about the similarity of a pair of pedestrian images based on the localized regions and the full images”, p3705, c1; “Note that augmenting the network with body layout information can be interpreted as an attention mechanism that allows us to focus on the relevant part on the input image. It can also be seen as a mechanism to highlight the foreground and to enable the network to establish corresponding regions between input images”, p3705, c2)
processing the selected and aggregated pairs of node representations in a fully connected layer to provide a fully connected layer output;
(Tang, Fig. 3; “Then the features are concatenated and transformed by two fully-connected layers (FC7,FC8)”, p3704/c2 - p3705/c1)
softmax processing the fully connected layer output to output a probability of matching of nodes in the node feature maps related to the source graph and the target graph; and
(Tang, Fig. 3, “FC8 uses a softmax function to produce a probability estimation over a binary decision, namely the same identity or different identities”, p3705/c1)
determining, based on the probability of matching of nodes, whether to fuse nodes in the source map with a corresponding node in the target graph.
(Tang, Fig. 3; with the body part score map layer which establishes corresponding local features between the input images, the StackNet can take advantage of its internal communications between the parallel source and target neural networks to identify if the two head nodes belong to the same person, i.e., label the two head nodes as “same person” or “not the same person”; p3705; section 3.2, “Fusing Body Part Information”)
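Examiner’s note (illustrative only): the last four limitations mapped above (aggregating a selected pair, a fully connected layer, softmax, and a decision based on the matching probability) can be sketched as below. This is a minimal sketch under stated assumptions, not the network of Tang or the applicant’s implementation. The function names, the use of concatenation as the aggregation step, the toy feature vectors, the hand-picked weights, the convention that class index 1 means “same”, and the 0.5 threshold are all hypothetical.

```python
import math


def concatenate(f_src, f_tgt):
    """Aggregate a selected pair of node feature vectors by concatenation (assumed)."""
    return list(f_src) + list(f_tgt)


def fully_connected(x, weights, biases):
    """One fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probability(f_src, f_tgt, weights, biases):
    """Probability that the pair matches; index 1 is the assumed 'same' class."""
    logits = fully_connected(concatenate(f_src, f_tgt), weights, biases)
    return softmax(logits)[1]


def should_fuse(p_match, threshold=0.5):
    """Decide whether to fuse the two nodes (threshold is an assumption)."""
    return p_match >= threshold


# Hypothetical toy inputs: untrained, hand-picked values for illustration.
f_src = [1.0, 0.0]
f_tgt = [1.0, 0.0]
weights = [[0.0, 0.0, 0.0, 0.0],   # logit for "different"
           [1.0, 0.0, 1.0, 0.0]]   # logit for "same"
biases = [0.0, 0.0]

p = match_probability(f_src, f_tgt, weights, biases)
print(round(p, 4), should_fuse(p))
```

With these toy values the two logits are 0 and 2, so the softmax assigns roughly 0.88 to the “same” class and the pair is flagged for fusing.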

Regarding claims 5 and 17, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the convolutional layers comprise weights and the weights are self-learning.
(Meier, Figs. 1-2; “backpropagate the error through the encoder-decoder neural networks and perform weight updates”, [0027])
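Examiner’s note (illustrative only): a weight update driven by a backpropagated error, as referred to in the Meier citation above, can be sketched for a single linear unit as follows. The function name `sgd_step`, the squared-error loss, the learning rate and the toy data are assumptions for this sketch; Meier’s encoder-decoder networks are not reproduced here.

```python
def predict(w, x):
    """Output of a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w, x, y_true, lr=0.1):
    """One gradient-descent update for the loss 0.5 * (prediction - y_true)**2."""
    error = predict(w, x) - y_true
    # The gradient of the loss with respect to w_i is error * x_i.
    return [wi - lr * error * xi for wi, xi in zip(w, x)]


# Hypothetical toy example: the weights start at zero and are learned from data.
x, y_true = [1.0, 2.0], 1.0
w = [0.0, 0.0]
for _ in range(50):
    w = sgd_step(w, x, y_true)

print(predict(w, x))  # approaches 1.0 as the error shrinks
```

With these values each step halves the remaining error, which is the sense in which the weights are learned rather than set by hand.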

Claim(s) 2-4, 6-7, 14-16 and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al (Multiple People Tracking, 2017) in view of Meier et al (US2020/0065653) and further in view of Thakur (Step by step VGG16 implementation, Aug 6, 2019).

Regarding claims 2 and 14, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination does not expressly disclose but Thakur teaches the method of claim 1, wherein each node of the node feature maps is represented by a node feature vector, the node feature vector comprising the data representative of characteristic features of each node.
(Thakur, figure, p1, 1x4096 fully connected layer; the fully-connected layer of VGG16 outputs a flattened feature vector with a size of 4096 (i.e., 4096 characteristic features), p5; Meier, Fig. 2; human skeleton input 105 as shown; input 105 is made of nodes and the edges connecting the nodes; both nodes and the corresponding edges may represent features extractable by a convolutional neural network; also, Tang, Fig. 3, “Red rectangles indicate the convolutional, ReLU and pooling layers of VGG16”; “Our basic CNN architecture is VGG-16 Net”, p3704/c2; each VGG16 branch of a SiameseNet (Tang, Fig. 3(a)) can process nodes and/or edges in each of the two input images; each node or edge represents a feature corresponding to an output feature vector)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Thakur into the modified system or method of Tang and Meier in order to use a large-scale output feature vector to represent rich information of each node or edge in the input images. The combination of Tang, Meier and Thakur also teaches other enhanced capabilities.

Regarding claims 3 and 15, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination of Tang, Meier and Thakur teaches the method of claim 1 further comprising at least one additional convolutional layer and at least one additional linear rectifying layer subsequent to the convolutional layer and the linear rectifying layer for processing each of the source graph and the target graph to output node feature maps related to the source graph and the target graph.
(Thakur, figure, p1; multiple convolution + ReLU layers; Tang, Fig. 3(a); two branches of VGG16 in the SiameseNet)

Regarding claims 4 and 16, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination of Tang, Meier and Thakur teaches the method of claim 1 further comprising at least one additional fully connected layer subsequent to the fully connected layer for processing the selected and aggregated pairs of node representations in a fully connected layer to provide a fully connected layer output.
(Thakur, figure, p1; multiple convolution + ReLU layers; Tang, Fig. 3(a); two branches of VGG16 in the SiameseNet)

Regarding claims 6 and 18, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination of Tang, Meier and Thakur teaches the method of claim 5, wherein the weights are trained end-to-end using labeled data as training data.
(Meier, Figs. 1-2; “To train the duel network of the first and second encoder-decoder neural networks the device adjusts parameters, e.g., node weights, of the first and second encoder-decoder neural networks based on several comparisons”, [0006]; Thakur, figure, p3; “object of ImageDataGenerator for both training and testing data”, p2; “The ImageDataGenerator will automatically label all the data inside cat folder as cat and vis-à-vis for dog folder”, p3)

Regarding claims 7 and 19, the combination of Tang, Meier and Thakur teaches its/their respective base claim(s).
The combination further teaches the method of claim 6, wherein, in response to matchings being underrepresented in the training data, the matchings are oversampled.
(Note the 112(b) rejection to claims 7 and 19. It is commonly understood that when the shape of a region formed by nodes is not sufficiently representative of a target shape, the matching result of these two shapes would be poor, especially at low image resolutions; a common practice to make these two shapes potentially identifiable would be to increase the image resolution of the shape for matching; e.g., in Fig. 3(d) of Tang (see the annotated figure in the claim 1 review comments above): the 3rd and 4th columns are two input images (upper) represented by two node maps (lower) at high image resolutions; it is seen from the red annotation that the two green nodes on the right are successfully classified, together with the green node on the left, as the same body part (left leg, classification color = green) due to high image resolution (or “oversampled” matching))
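Examiner’s note (illustrative only): for reference, the claim language “in response to matchings being underrepresented in the training data, the matchings are oversampled” can also be read in a training-data sense, in which labeled matching pairs are repeated until they are as numerous as non-matching pairs. The sketch below illustrates only that reading. It is an assumption for illustration, is not drawn from Tang, Meier or Thakur, and the function name, label convention (1 = match, 0 = non-match) and toy data are hypothetical.

```python
import random


def oversample_matches(samples, seed=0):
    """Repeat matching samples at random until both classes have equal counts.

    samples: list of (pair_id, label) tuples, label 1 = match, 0 = non-match.
    """
    rng = random.Random(seed)
    matches = [s for s in samples if s[1] == 1]
    non_matches = [s for s in samples if s[1] == 0]
    # Nothing to do if there are no matches or matches are not underrepresented.
    if not matches or len(matches) >= len(non_matches):
        return list(samples)
    deficit = len(non_matches) - len(matches)
    extra = [rng.choice(matches) for _ in range(deficit)]
    return list(samples) + extra


# Hypothetical toy training set: one matching pair, four non-matching pairs.
training = [("p0", 1), ("p1", 0), ("p2", 0), ("p3", 0), ("p4", 0)]
balanced = oversample_matches(training)
print(len(balanced), sum(label for _, label in balanced))  # 8 4
```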

Claim(s) 8 and 10-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al (Multiple People Tracking, 2017) in view of Meier et al (US2020/0065653) and further in view of Perone et al (US2021/0089571).

Regarding claim 8, the combination of Tang and Meier teaches its/their respective base claim(s).
The combination does not expressly disclose but Perone teaches the method of claim 1 further comprising long short-term memory layers for processing each of the source graph and the target graph to output edge feature maps related to the source graph and the target graph, the edge feature maps comprising data representative of characteristic features of each edge.
(Perone, “One or more layers of the CNN-LSTM encoder may work to build a feature space, and encode k-dimensional feature vectors 132. An initial layer may learn first order features, e.g. color, edges etc”, [0021]; the VGG16 CNN of Tang (Fig. 3(a)) may use CNN-LSTM hardware for high speed object (nodes or edges) classification and comparison)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Perone into the modified system or method of Tang and Meier in order to use an LSTM-based CNN for high speed object (nodes or edges) classification and comparison. The combination of Tang, Meier and Perone also teaches other enhanced capabilities.

Regarding claim 10, the combination of Tang, Meier and Perone teaches its/their respective base claim(s).
The combination further teaches the method of claim 8, wherein edge features are learned in the long short-term memory layers from an underlying geometry in the source graph and the target graph.
(Perone, “One or more layers of the CNN-LSTM encoder may work to build a feature space, and encode k-dimensional feature vectors 132. An initial layer may learn first order features, e.g. color, edges etc”, [0021]; the VGG16 CNN of Tang (Fig. 3(a)) may use CNN-LSTM hardware for high speed object (nodes or edges) classification and comparison)

Regarding claim 11, the combination of Tang, Meier and Perone teaches its/their respective base claim(s).
The combination of Tang, Meier and Perone teaches the method of claim 10, wherein learning the edge features is based on one or more sequences of support points from one node to another node.
(Meier, Fig. 2; human skeleton input 105 as shown; input 105 is made of nodes and the edges connecting between nodes; both nodes and the corresponding edges may represent features extractable by a convolutional neural network; also, Tang, Fig. 3, “Red rectangles indicate the convolutional, ReLU and pooling layers of VGG16”; “Our basic CNN architecture is VGG-16 Net”, p3704/c2; each branch VGG16 of a SiameseNet (Tang, Fig. 3(a)) can process nodes and/or edges in each of the two input images; each node or edge represents a feature corresponding to an output feature vector; learning just one edge of two nodes is the simplest example in one sequence; obviously, edges are learned one by one in a multi-edge image)

Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al (Multiple People Tracking, 2017) in view of Meier et al (US2020/0065653) and further in view of Perone et al (US2021/0089571) and Thakur (Step by step VGG16 implementation, Aug 6, 2019).

Regarding claim 9, the combination of Tang, Meier and Perone teaches its/their respective base claim(s).
The combination of Tang, Meier, Perone and Thakur teaches the method of claim 8, wherein each edge of the edge feature maps is represented by an edge feature vector, the edge feature vector comprising the data representative of characteristic features of each edge.
(Thakur, figure, p1, 1x4096 fully connected layer; the fully-connected layer of VGG16 outputs a flattened feature vector with a size of 4096 (i.e., 4096 characteristic features), p5; Meier, Fig. 2; human skeleton input 105 as shown; input 105 is made of nodes and the edges connecting the nodes; both nodes and the corresponding edges may represent features extractable by a convolutional neural network; also, Tang, Fig. 3, “Red rectangles indicate the convolutional, ReLU and pooling layers of VGG16”; “Our basic CNN architecture is VGG-16 Net”, p3704/c2; each VGG16 branch of a SiameseNet (Tang, Fig. 3(a)) can process nodes and/or edges in each of the two input images; each node or edge represents a feature corresponding to an output feature vector)


Allowable Subject Matter
Claim(s) 12 is/are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claim(s).
The following is a statement of reasons for the indication of allowable subject matter:

Claim(s) 12 recite(s) limitation(s) related to a particular hierarchical image processing technique. No explicit teaching of the above limitation(s) is found in the prior art cited in the rejection of its/their base claim(s).


Response to Arguments
Applicant's arguments filed on 7/5/2022 with respect to one or more of the pending claims have been fully considered but they are not persuasive.

Regarding claim(s) 1, Applicant, in pages 7-10 of the remarks, argues that the combination of the cited reference(s) fails to teach “selecting pairs of node representations from the node feature maps related to the source graph and the target graph, and aggregating the selected pairs to output selected and aggregated pairs of node representations; processing the selected and aggregated pairs of node representations in a fully connected layer to provide a fully connected layer output” as recited in claim 1. 
The Examiner respectfully disagrees. The Office action has been updated to address Applicant’s argument. See the updated review comments for details.


Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN (JAMES) YANG whose telephone number is (571)272-9874. The examiner can normally be reached on MON-FRI: 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung can be reached on (571)272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/JIANXUN YANG/
Primary Examiner, Art Unit 2664
8/4/2022