DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
	The information disclosure statements (IDSs) filed 10 March 2020 and 6 February 2021 comply with the provisions of 37 CFR 1.97 and 1.98. They have been placed in the application file, and the information referred to therein has been considered as to the merits1.  Initialed and dated copies of Applicant’s IDS forms 1449 (Paper Nos. 20200310 and 20210206) are attached to the instant Office action.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 12-13, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tuzel et al. (US 2016/0055237 A1).
a.	Regarding claim 1, Tuzel discloses a semantic segmentation model training method, comprising:
performing, by a semantic segmentation model, image semantic segmentation on at least one unlabeled image to obtain a preliminary semantic segmentation result as a category of the at least one unlabeled image (Tuzel discloses that “A local feature 411 is obtained by extracting 410 pixel features from pixels in local regions in the image. A local semantic feature 416 is obtained by semantically mapping 415 local features to a semantic space” at Fig. 4, elements 411, 415, and 416, and ¶0055); 
obtaining, by a convolutional neural network based on the category of the at least one unlabeled image and a category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images, wherein the at least two images comprise the at least one unlabeled image and the at least one labeled image, and the at least two sub-images carry the categories of the corresponding images (Tuzel discloses “The combiner network F_com recursively combines the semantic feature of two child nodes (superpixels) to obtain the semantic feature for a parent node … the local semantic features are combined 420 recursively to form intermediate segments until a semantic feature for the entire image 421 is obtained” at Fig. 3B; Fig. 4, elements 420 and 421; and ¶¶0047-0048 and 0055); and 
training the semantic segmentation model on the basis of the categories of the at least two sub-images and feature distances between the at least two sub-images (Tuzel discloses that “[t]he rCPN 350 and F_CNN 310 could be jointly trained using training images. However, the recursions makes the depth of the networks too deep to perform joint training efficient. Therefore, we first learn parameters θ_CNN for F_CNN 310 using the input image and ground truth segmentation labels. After the F_CNN is trained, we obtain local features and train the parameters …” at ¶¶0060-0065).
b.	Regarding claim 2, Tuzel discloses wherein the training the semantic segmentation model on the basis of the categories of the at least two sub-images and feature distances between the at least two sub-images comprises:
establishing a patch graph according to category relations between the sub-images, wherein the patch graph comprises nodes and an edge, wherein the nodes comprise the sub-images, and the edge comprises a feature distance between any two sub-images (Tuzel discloses that “[a] parse tree 200 of the acquired image 100 is used for recursive context propagation. Nodes in the parse tree represent semantic features of the segments of the acquired image. Hence, combining 250 and decombining 260 segments corresponds to combining and decombining information about semantic features of the segments. Local semantic features 201, 202, 203 are recursively combined to form intermediate segments 210 until a semantic feature is obtained for the entire image 220. The semantic feature for the entire image 220 is decombined to form enhanced semantic features of the intermediate segments 230 until enhanced semantic features for all the local regions 241, 242, 243 are obtained. Then, the local segments can be labeled using their enhanced semantic features. The enhanced semantic features contain both local information and context information from the entire image” at Fig. 2 and ¶¶0019 and 0042-0044); and 
training the semantic segmentation model to enable the feature distance between two sub-images of a same category in the patch graph to be lower than a first preset value, and the feature distance between two sub-images of different categories to be greater than a second preset value (Tuzel discloses that “[t]he rCPN 350 and F_CNN 310 could be jointly trained using training images. However, the recursions makes the depth of the networks too deep to perform joint training efficient. Therefore, we first learn parameters θ_CNN for F_CNN 310 using the input image and ground truth segmentation labels. After the F_CNN is trained, we obtain local features and train the parameters …” at ¶¶0060-0065).
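For clarity of the record only, the claim 2 condition on feature distances can be expressed as a per-pair penalty that becomes zero once a same-category pair is closer than the first preset value and a different-category pair is farther apart than the second preset value. This is a minimal sketch of one possible reading of the claim language; the Euclidean metric, the hinge form of the penalty, and the names `feature_distance` and `pair_penalty` are assumptions made for illustration and are neither recited in claim 2 nor taken from Tuzel.

```python
import math


def feature_distance(a, b):
    # Assumed metric: Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_penalty(feat_a, cat_a, feat_b, cat_b, first_preset, second_preset):
    """Hypothetical hinge-style penalty; zero once the claimed condition holds."""
    d = feature_distance(feat_a, feat_b)
    if cat_a == cat_b:
        # Same category: penalize any distance above the first preset value.
        return max(0.0, d - first_preset)
    # Different categories: penalize any distance below the second preset value.
    return max(0.0, second_preset - d)
```

Training that drives the sum of such penalties over all edges of the patch graph toward zero would satisfy both recited conditions.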
c.	Regarding claims 12-13, claims 12-13 are analogous and correspond to claims 1-2, respectively. See rejection of claims 1-2 for further explanation.

d.	Regarding claim 20, claim 20 is analogous and corresponds to claim 1. See rejection of claim 1 for further explanation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-4, 6, 14-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tuzel et al. (US 2016/0055237 A1) in view of Philbin et al. (US 2016/0180151 A1).
a.	Regarding claim 3, Tuzel discloses wherein the establishing a patch graph according to category relations between the sub-images comprises: 
selecting at least one sub-image as a reference node, and for at least one reference node (Tuzel discloses local semantic features that serve as nodes connected in a parse tree at Fig. 2 and ¶¶0019-0020); and 
forming a sparse connectivity patch graph according to the at least one reference node, the positively correlated node of the reference node (Tuzel discloses that “[a] parse tree 200 of the acquired image 100 is used for recursive context propagation. Nodes in the parse tree represent semantic features of the segments of the acquired image. Hence, combining 250 and decombining 260 segments corresponds to combining and decombining information about semantic features of the segments. Local semantic features 201, 202, 203 are recursively combined to form intermediate segments 210 until a semantic feature is obtained for the entire image 220. The semantic feature for the entire image 220 is decombined to form enhanced semantic features of the intermediate segments 230 until enhanced semantic features for all the local regions 241, 242, 243 are obtained. Then, the local segments can be labeled using their enhanced semantic features. The enhanced semantic features contain both local information and context information from the entire image” at Fig. 2 and ¶¶0019 and 0042-0044).
However, Tuzel does not disclose using a sub-image of a same category as the reference node as a positively correlated node, using a sub-image of a different category from the reference node as a negatively correlated node, and separately establishing a positive correlation connection between the reference node and at least one positively correlated node, and a negative correlation connection between the reference node and at least one negatively correlated node.
Philbin discloses using a sub-image of a same category as the reference node as a positively correlated node, using a sub-image of a different category from the reference node as a negatively correlated node, and separately establishing a positive correlation connection between the reference node and at least one positively correlated node, and a negative correlation connection between the reference node and at least one negatively correlated node (Philbin discloses that “[t]he system processes the positive image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the positive image … [and] the negative image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the negative image” at Fig. 3, steps 304 and 306, and ¶¶0039-0040).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the positive and negative images of Philbin to the parse tree of Tuzel.
The suggestion/motivation would have been to “decrease training time and improve the accuracy of the embeddings generated by the trained neural network” (Philbin; ¶0018).
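For clarity of the record only, the patch-graph construction recited in claim 3 (reference node, positively and negatively correlated nodes, separate positive and negative connections, sparse connectivity) may be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation and not a construction found in Tuzel or Philbin: "sparse connectivity" is assumed here to mean capping the number of edges of each kind per reference node, and `PatchGraph`, `build_sparse_patch_graph`, and `max_per_kind` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PatchGraph:
    # Hypothetical structure: node id -> category label of that sub-image.
    categories: dict
    # Edges stored as (reference node, other node, "positive" | "negative").
    edges: list = field(default_factory=list)


def build_sparse_patch_graph(categories, reference_ids, max_per_kind=1):
    graph = PatchGraph(categories=dict(categories))
    for ref in reference_ids:
        ref_cat = categories[ref]
        # Same category as the reference node -> positively correlated nodes.
        positives = [n for n, c in categories.items() if n != ref and c == ref_cat]
        # Different category -> negatively correlated nodes.
        negatives = [n for n, c in categories.items() if c != ref_cat]
        # Assumed sparsity rule: keep at most max_per_kind edges of each kind.
        for n in positives[:max_per_kind]:
            graph.edges.append((ref, n, "positive"))
        for n in negatives[:max_per_kind]:
            graph.edges.append((ref, n, "negative"))
    return graph
```

Each (reference, positive, negative) group formed this way has the same shape as the anchor/positive/negative triplet described in Philbin.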
b.	Regarding claim 4, Tuzel discloses all the limitations of the previous claims, including wherein the training the semantic segmentation model (Tuzel discloses that “A local feature 411 is obtained by extracting 410 pixel features from pixels in local regions in the image. A local semantic feature 416 is obtained by semantically mapping 415 local features to a semantic space” at Fig. 4, elements 411, 415, and 416, and ¶0055). 
However, Tuzel does not explicitly disclose training the semantic segmentation model by a gradient back propagation algorithm, so as to minimize an error of the convolutional neural network, wherein the error is a triplet loss of the features of the corresponding sub-images obtained based on the convolutional neural network.
Philbin discloses training the semantic segmentation model by a gradient back propagation algorithm, so as to minimize an error of the convolutional neural network, wherein the error is a triplet loss of the features of the corresponding sub-images obtained based on the convolutional neural network (Philbin discloses that “The system adjusts the current values of the parameters of the neural network using the triplet loss (step 310). That is, the system adjusts the current values of the parameters of the neural network to minimize the triplet loss. The system can adjust the current values of the parameters of the neural network using conventional neural network training techniques, e.g., stochastic gradient descent with backpropagation” at Fig. 3, step 310, and ¶0043).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the neural network parameter adjustment process of Philbin to the local feature extraction of Tuzel for semantic map generation.
The suggestion/motivation would have been to “decrease training time and improve the accuracy of the embeddings generated by the trained neural network” (Philbin; ¶0018).
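For clarity of the record only, the triplet-loss minimization relied upon for claim 4 can be sketched as below. The sketch assumes a commonly used formulation, max(0, ||a - p||² - ||a - n||² + margin), which is not quoted from Philbin; the margin and learning-rate values and the function names are hypothetical. As a simplification, the gradient step updates the anchor embedding directly, whereas back propagation in an actual network would carry this gradient on to the network parameters.

```python
def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    # Assumed form: zero once the negative is farther than the positive by the margin.
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def sgd_step_on_anchor(anchor, positive, negative, margin=0.2, lr=0.1):
    """One gradient-descent step on the anchor embedding only (simplification)."""
    if triplet_loss(anchor, positive, negative, margin) == 0.0:
        return list(anchor)  # inactive triplet: zero gradient
    # d/d(anchor) of |a - p|^2 - |a - n|^2 equals 2(a - p) - 2(a - n) = 2(n - p).
    grad = [2.0 * (n - p) for p, n in zip(positive, negative)]
    return [a - lr * g for a, g in zip(anchor, grad)]
```

Repeating such steps lowers the loss for an active triplet, which is the sense in which the error is "minimized" by gradient descent.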
c.	Regarding claim 6, the combination applied in claim 4 discloses wherein the training the semantic segmentation model comprises: 
obtaining parameters of the convolutional neural network based on a training result of the convolutional neural network (Philbin discloses that “The system adjusts the current values of the parameters of the neural network using the triplet loss (step 310). That is, the system adjusts the current values of the parameters of the neural network to minimize the triplet loss. The system can adjust the current values of the parameters of the neural network using conventional neural network training techniques, e.g., stochastic gradient descent with backpropagation” at Fig. 3, step 310, and ¶0043); and 
initializing parameters in the semantic segmentation model based on the obtained parameters of the convolutional neural network (Philbin discloses that “The system adjusts the current values of the parameters of the neural network using the triplet loss (step 310). That is, the system adjusts the current values of the parameters of the neural network to minimize the triplet loss. The system can adjust the current values of the parameters of the neural network using conventional neural network training techniques, e.g., stochastic gradient descent with backpropagation” at Fig. 3, step 310, and ¶0043). 
d.	Regarding claims 14-15 and 17, claims 14-15 and 17 are analogous and correspond to claims 3-4 and 6, respectively. See rejection of claims 3-4 and 6 for further explanation.

Claims 7-11 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Tuzel et al. (US 2016/0055237 A1) in view of Seow et al. (US 2014/0003713 A1).
a.	Regarding claim 7, Tuzel discloses wherein the obtaining, by a convolutional neural network based on the category of the at least one unlabeled image and a category of at least one labeled image, sub-images respectively corresponding to at least two images and features corresponding to the sub-images (Tuzel discloses “The combiner network F_com recursively combines the semantic feature of two child nodes (superpixels) to obtain the semantic feature for a parent node … the local semantic features are combined 420 recursively to form intermediate segments until a semantic feature for the entire image 421 is obtained” at Fig. 3B; Fig. 4, elements 420 and 421; and ¶¶0047-0048 and 0055); 
obtaining features corresponding to the sub-images by the convolutional neural network (Tuzel discloses that “[a] parse tree 200 of the acquired image 100 is used for recursive context propagation. Nodes in the parse tree represent semantic features of the segments of the acquired image. Hence, combining 250 and decombining 260 segments corresponds to combining and decombining information about semantic features of the segments. Local semantic features 201, 202, 203 are recursively combined to form intermediate segments 210 until a semantic feature is obtained for the entire image 220. The semantic feature for the entire image 220 is decombined to form enhanced semantic features of the intermediate segments 230 until enhanced semantic features for all the local regions 241, 242, 243 are obtained. Then, the local segments can be labeled using their enhanced semantic features. The enhanced semantic features contain both local information and context information from the entire image” at Fig. 2 and ¶¶0019 and 0042-0044). 
However, Tuzel does not disclose in response to movement of a select box with a preset size on the at least two images, performing determination on pixels in the select box, and when a proportion of pixels of a same category among the pixels in the select box is greater than or equal to a preset value, outputting the image in the select box as a sub-image, and labeling the sub-image as said category.
Seow discloses in response to movement of a select box with a preset size on the at least two images, performing determination on pixels in the select box, and when a proportion of pixels of a same category among the pixels in the select box is greater than or equal to a preset value, outputting the image in the select box as a sub-image, and labeling the sub-image as said category (Seow discloses that “the autogain filter module resizes the bounding box based on the correlation results of step 450, if necessary. As discussed, the correlation function may be configured to return a score indicating a degree of match between the texture around a pixel in the background model gradient image and the video frame gradient image. If the correlation score is high (or low, depending on the implementation), it may indicate that a foreground patch pixel is actually part of the background (i.e., that the pixel is a false-positive foreground pixel). As a result, the autogain filter module may be configured to, for example, remove pixels from the foreground patch where the correlation score for those pixels exceeds (or is less than) a threshold. The bounding box for the foreground patch may then be adjusted accordingly to have width and height equal to the maximum width and maximum height, respectively, of the modified foreground patch” at Fig. 4, step 460, and ¶0058).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the bounding box resizing of Seow to the local feature extraction of Tuzel for semantic map generation.
The suggestion/motivation would have been to “correct for autogain by, for example, maintaining color constancy or modeling a specific camera's response during autogain and compensating for the response” (Seow; ¶0007).
b.	Regarding claim 8, the combination applied in claim 7 discloses further comprising: when the proportion of the pixels of the same category among the pixels in the select box is less than the preset value, discarding the select box (Seow discloses that “the autogain filter module resizes the bounding box based on the correlation results of step 450, if necessary. As discussed, the correlation function may be configured to return a score indicating a degree of match between the texture around a pixel in the background model gradient image and the video frame gradient image. If the correlation score is high (or low, depending on the implementation), it may indicate that a foreground patch pixel is actually part of the background (i.e., that the pixel is a false-positive foreground pixel). As a result, the autogain filter module may be configured to, for example, remove pixels from the foreground patch where the correlation score for those pixels exceeds (or is less than) a threshold. The bounding box for the foreground patch may then be adjusted accordingly to have width and height equal to the maximum width and maximum height, respectively, of the modified foreground patch” at Fig. 4, step 460, and ¶0058).
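For clarity of the record only, the select-box limitations of claims 7 and 8 (moving a box of preset size, keeping the box as a labeled sub-image when the majority category reaches the preset proportion, and otherwise discarding it) may be sketched as follows. This is an illustrative reading of the claim language under stated assumptions, not Seow's bounding-box resizing: the per-pixel `label_map` (ground truth for a labeled image, or the preliminary segmentation result for an unlabeled image), the `stride` parameter, and the function name are hypothetical.

```python
from collections import Counter


def extract_sub_images(label_map, box_size, preset_ratio, stride=None):
    """Return (top, left, category) for each select box that is kept."""
    stride = stride or box_size  # assumption: non-overlapping boxes by default
    height, width = len(label_map), len(label_map[0])
    kept = []
    for top in range(0, height - box_size + 1, stride):
        for left in range(0, width - box_size + 1, stride):
            pixels = [label_map[r][c]
                      for r in range(top, top + box_size)
                      for c in range(left, left + box_size)]
            category, count = Counter(pixels).most_common(1)[0]
            if count / len(pixels) >= preset_ratio:
                # Claim 7: output the box as a sub-image labeled with that category.
                kept.append((top, left, category))
            # Claim 8: otherwise the select box is simply discarded.
    return kept
```

With the 4x4 label map `[[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 1, 2], [3, 1, 1, 2]]`, a box size of 2, and a preset proportion of 0.75, the lower-right box (two pixels of each category) is discarded and the other three boxes are kept.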
c.	Regarding claim 9, the combination applied in claim 7 discloses wherein the obtaining features corresponding to the sub-images by the convolutional neural network comprises: 
performing feature extraction respectively on the unlabeled image and the labeled image by the convolutional neural network, so as to obtain features maps respectively corresponding to the unlabeled image and the labeled image (Tuzel discloses that “A local feature 411 is obtained by extracting 410 pixel features from pixels in local regions in the image. A local semantic feature 416 is obtained by semantically mapping 415 local features to a semantic space” at Fig. 4, elements 411, 415, and 416, and ¶0055); and 
obtaining, based on a position and size of the select box corresponding to the sub-image, the corresponding feature in the select box from the corresponding feature map, so as to determine the features corresponding to the sub-image (Seow discloses that “the autogain filter module resizes the bounding box based on the correlation results of step 450, if necessary. As discussed, the correlation function may be configured to return a score indicating a degree of match between the texture around a pixel in the background model gradient image and the video frame gradient image. If the correlation score is high (or low, depending on the implementation), it may indicate that a foreground patch pixel is actually part of the background (i.e., that the pixel is a false-positive foreground pixel). As a result, the autogain filter module may be configured to, for example, remove pixels from the foreground patch where the correlation score for those pixels exceeds (or is less than) a threshold. The bounding box for the foreground patch may then be adjusted accordingly to have width and height equal to the maximum width and maximum height, respectively, of the modified foreground patch” at Fig. 4, step 460, and ¶0058).
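For clarity of the record only, the claim 9 step of reading the sub-image feature out of the feature map at the select box's position and size may be sketched as below. The sketch assumes the feature map is downsampled from the image by an integer factor and that the cropped cells are average-pooled into a single vector; neither assumption is recited in claim 9 or taken from Tuzel or Seow, and `crop_box_features` and `downsample` are hypothetical names.

```python
def crop_box_features(feature_map, top, left, box_size, downsample=1):
    """Average-pool the feature-map cells covered by an image-space select box.

    feature_map: nested list indexed [row][col], each entry a feature vector.
    downsample:  assumed integer ratio of image size to feature-map size.
    """
    # Map the box from image coordinates to feature-map coordinates.
    r0, c0 = top // downsample, left // downsample
    size = max(1, box_size // downsample)
    cells = [feature_map[r][c]
             for r in range(r0, r0 + size)
             for c in range(c0, c0 + size)]
    dim = len(cells[0])
    # Assumed pooling choice: element-wise mean of the covered cells.
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(dim)]
```

For example, a 4x4 box at the image origin over a 2x2 feature map with downsample factor 2 covers all four cells and returns their mean.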
d.	Regarding claim 10, Tuzel discloses further comprising: performing, by the semantic segmentation model, image semantic segmentation on the at least one unlabeled image (Tuzel discloses that “A local feature 411 is obtained by extracting 410 pixel features from pixels in local regions in the image. A local semantic feature 416 is obtained by semantically mapping 415 local features to a semantic space” at Fig. 4, elements 411, 415, and 416, and ¶0055). 
However, Tuzel does not disclose using stochastic gradient descent to train the semantic segmentation model until a preset convergence condition is satisfied. 
Seow discloses using stochastic gradient descent to train the semantic segmentation model until a preset convergence condition is satisfied (Seow discloses that “At step 430, the autogain filter module loops over each bounding box. For each bounding box, the autogain filter module determines gradient values from the video frame and the background model image at step 440. Note, gradient is one of a number of techniques for computing texture (i.e., local variability of intensity values of pixels) in an image, and other techniques for computing texture may be used in lieu of gradient” at Fig. 4, step 440, and ¶0054). 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the gradient value calculation of Seow's autogain filter module to the local feature extraction of Tuzel for semantic map generation.
The suggestion/motivation would have been to “correct for autogain by, for example, maintaining color constancy or modeling a specific camera's response during autogain and compensating for the response” (Seow; ¶0007).
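For clarity of the record only, the claim 10 limitation of training by stochastic gradient descent "until a preset convergence condition is satisfied" may be illustrated on a toy problem. This is a minimal sketch, not drawn from Seow or Tuzel: a single-parameter least-squares model stands in for the segmentation model, and the convergence condition is assumed to be that the mean loss changes by less than `tol` between epochs (capped by `max_epochs`). All names and hyperparameter values are hypothetical.

```python
import random


def sgd_until_converged(samples, lr=0.05, tol=1e-8, max_epochs=1000, seed=0):
    """Fit y ~ w * x by per-sample SGD; stop on an assumed convergence test."""
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        rng.shuffle(data)  # stochastic: visit samples in random order
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
            w -= lr * grad
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        # Assumed preset convergence condition: loss improvement below tol.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w, epoch + 1
```

On the samples (1, 2), (2, 4), and (3, 6), the parameter converges to approximately 2 within a few epochs.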
e.	Regarding claim 11, Tuzel discloses further comprising: before obtaining, by the convolutional neural network based on the category of the at least one unlabeled image and a category of the at least one labeled image, the sub-images respectively corresponding to the at least two images and the features corresponding to the sub-images (Tuzel discloses “The combiner network F_com recursively combines the semantic feature of two child nodes (superpixels) to obtain the semantic feature for a parent node … the local semantic features are combined 420 recursively to form intermediate segments until a semantic feature for the entire image 421 is obtained” at Fig. 3B; Fig. 4, elements 420 and 421; and ¶¶0047-0048 and 0055).
However, Tuzel does not disclose using stochastic gradient descent to train the semantic segmentation model until a preset convergence condition is satisfied. 
Seow discloses using stochastic gradient descent to train the semantic segmentation model until a preset convergence condition is satisfied (Seow discloses that “At step 430, the autogain filter module loops over each bounding box. For each bounding box, the autogain filter module determines gradient values from the video frame and the background model image at step 440. Note, gradient is one of a number of techniques for computing texture (i.e., local variability of intensity values of pixels) in an image, and other techniques for computing texture may be used in lieu of gradient” at Fig. 4, step 440, and ¶0054). 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply the gradient value calculation of Seow's autogain filter module to the local feature extraction of Tuzel for semantic map generation.
The suggestion/motivation would have been to “correct for autogain by, for example, maintaining color constancy or modeling a specific camera's response during autogain and compensating for the response” (Seow; ¶0007).
f.	Regarding claims 18-19, claims 18-19 are analogous and correspond to claims 7-8, respectively. See rejection of claims 7-8 for further explanation.

Allowable Subject Matter
Claims 5 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN W LEE whose telephone number is (571)272-9554.  The examiner can normally be reached on Mon-Fri 8:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NAY MAUNG can be reached on 571-272-7882.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOHN W LEE/Primary Examiner, Art Unit 2664

1 See MPEP § 609.