DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.


Claim Rejections - 35 USC § 102 and/or 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Examiner’s notes: the corresponding text descriptions of any figure(s)  

Claim(s) 1-4, 6-7, 9-13, 15-16 and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Long et al (FCN for Semantic Segmentation, 2015) in view of Hu et al (Segmentation from natural language, 2016).

Regarding claims 1 and 10, Long teaches a method for training an artificial neural network, performed by at least one computer processor executing computer program instructions stored on at least one non-transitory computer-readable medium, the method comprising:
(Long, Figure 1, FCN)
(A) receiving a plurality of images and a plurality of sets of corresponding ground truth instance labels;
(Long, Figure 6, input images with corresponding ground truth images; use “20 images” to train an FCN, p6:c2; ground truth, p3:c2, p5:c1, p9:c1)
(B) receiving a training objective, wherein the training objective specifies that a \\divergence of a probability\\ that a first sample and a second sample correspond to the same instance label as each other in the plurality of sets of corresponding ground truth instance labels is to be minimized, wherein:
	the first sample corresponds to a first pixel in an image;
	a first pixel label distribution comprises the first sample;

	a second pixel label distribution comprises the second sample; and
(Long, Figure 6; images are segmented into different classes of objects in class segmentation; each class of objects is labeled using a same color; Figure 6, top row, vehicles are segmented into a same class labeled with gray color because of their similar spatial characteristics; “The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation. With ground truth available at every output cell, both the forward and backward passes are straightforward, and both take advantage of the inherent computational efficiency (and aggressive optimization) of convolution”, p3:c2)
While Long implicitly indicates that semantic classification is based on common characteristics of different objects (e.g., human, animals, vehicles, etc.) that belong to the same category, Long does not expressly disclose but Hu teaches:
… divergence of a probability … is to be minimized;
(Hu, Fig. 1(d), “people in blue coat”; semantic segmentation can be based on colors of segmented objects; two people in blue coat have minimum divergence of a probability in coat colors as compared with other segmented people objects and therefore they belong to the same class of detected people in terms of color of coat)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Hu into the system or method of Long in order to perform semantic segmentation in some simple categories using color discrimination. The combination of Long and Hu also teaches other enhanced capabilities.
The combination further teaches:
(C) training the artificial neural network based on the plurality of images, the plurality of sets of corresponding ground truth instance labels, and the training objective, comprising:
	applying a minimum-loss mapping from neural network labels to ground truth instance labels; and
	training the artificial neural network to minimize a loss function after applying the minimum loss mapping.
(Long, Fig. 4, ground truth instances (biker and bike); “using image classification as supervised pre-training, and fine-tune fully convolutionally to learn simply and efficiently from whole image inputs and whole image ground truths”, p2:c2; “SIFT Flow is a dataset of 2,688 images with pixel labels for 33 semantic categories (“bridge”, “mountain”, “sun”), as well as three geometric categories (“horizontal”, “vertical”, and “sky”)”, p8:c1; these are labels for various instance categories; Hu, Fig. 1(d); “Classification over segmentation proposals. In this baseline method, we first extract a set of candidate segmentation proposals using MCG [31], and then train a binary classifier to determine whether or not a candidate segmentation proposal matches the expression”, p9; “output a score to indicate whether a spatial location belong to the target image region or not”, p6; eq. (2), at a ground truth point (Mij = 1), a higher match score vij leads to a lower loss L(vij, Mij); Fig. 3, “a recurrent LSTM network” is a neural network that uses back-propagation to minimize classification loss; “instance segmentation (e.g. [8]), which additionally distinguishes different instances of an object class (Figure 1, c). It also differs from language-independent foreground segmentation”, p2; Fig. 1, a first object instance for segmentation can be "people in blue coat" (Fig. 1(d)); the method here can certainly segment a second object instance of "people in red coat" even though it is not shown in Fig. 1(d))
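For illustration only, the two sub-steps recited in (C) (applying a minimum-loss mapping, then minimizing the loss under that mapping) can be sketched as below. This is a minimal sketch of the claim language and is not taken from Long, Hu, or the specification. The function names, the brute-force search over relabelings, the cross-entropy loss, and the sample values are all assumptions made for the example.

```python
import itertools
import math

def cross_entropy(pred, target_labels):
    """Mean negative log-likelihood of integer labels under per-pixel distributions."""
    return -sum(math.log(p[t]) for p, t in zip(pred, target_labels)) / len(target_labels)

def min_loss_mapping(pred, gt_labels, num_labels):
    """Try every relabeling of the ground truth labels and keep the lowest-loss one.

    perm[g] is the network label assigned to ground truth label g (hypothetical
    convention; the inverse permutation maps network labels back to ground truth).
    """
    best_perm, best_loss = None, float("inf")
    for perm in itertools.permutations(range(num_labels)):
        remapped = [perm[g] for g in gt_labels]
        loss = cross_entropy(pred, remapped)
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_perm, best_loss

# Four pixels, two instance labels (made-up values). The network has grouped
# the pixels correctly but uses label 1 where the ground truth uses label 0.
pred = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.9, 0.1]]
gt = [0, 0, 1, 1]

perm, loss = min_loss_mapping(pred, gt, 2)
print(perm, loss)  # the training step would then minimize `loss`, not the unmapped loss
```

In this sketch the loss computed after the mapping is lower than the loss against the unmapped labels, because the arbitrary numbering of instances no longer penalizes an otherwise correct grouping.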

Regarding claims 2 and 11, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, further comprising:
(D) applying the artificial neural network, after the training,
	to a particular image comprising a plurality of pixels,
	to produce a categorical probability distribution of the plurality of pixels over a plurality of instance labels.
(Hu, Fig. 1(d), “people in blue coat”; semantic segmentation)

Regarding claims 3 and 12, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 2, wherein the categorical probability distribution maps a first one of the plurality of pixels to a first instance of a first class and maps a second one of the plurality of pixels to a second instance of the first class.
(Hu, Fig. 1, for object class “human” (Fig. 1(a)), a first object instance for segmentation 

Regarding claims 4 and 13, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 2, wherein the categorical probability distribution maps a first one of the plurality of pixels to a first instance of a first class and maps a second one of the plurality of pixels to a first instance of a second class.
(Long, Figure 1; different class labelings for a cat and a dog at the output of an FCN)

Regarding claims 6 and 15, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein (C) comprises training the artificial neural network by minimizing a divergence of permutation-invariant auxiliary distributions derived from the ground truth instance labels and network distributions.
(Note: the term “permutation-invariant auxiliary distributions” is neither defined in the specification nor commonly known in the art; however, the specification indicates “minimizing the divergence of permutation-invariant auxiliary distributions derived from the ground truth and network distributions (e.g., that two pixels share the same label)”, [0005]; this guides the rejection of this claim below)
(Hu, Fig. 1(d), two people with blue coat are classified into a same class/label color; eq. 
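For illustration only, the example given in [0005] of the specification (“that two pixels share the same label”) can be read as a pairwise quantity that does not change when the instance labels are renumbered. The sketch below assumes the two pixel label distributions are independent, so that the probability of a shared label is the sum over labels of the product of the two probabilities. The function names and sample values are hypothetical and are not taken from the specification or the cited references.

```python
def same_label_probability(p, q):
    """Probability that two independent categorical samples receive the same label."""
    return sum(a * b for a, b in zip(p, q))

def permute(dist, perm):
    """Renumber labels: the probability mass of label k moves to label perm[k]."""
    out = [0.0] * len(dist)
    for k, mass in enumerate(dist):
        out[perm[k]] = mass
    return out

# Label distributions for two pixels over three instance labels (made-up values).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
perm = [2, 0, 1]  # an arbitrary renumbering of the instance labels

before = same_label_probability(p, q)
after = same_label_probability(permute(p, perm), permute(q, perm))
print(before, after)  # the two values agree up to floating-point rounding
```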

Regarding claims 7 and 16, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the artificial neural network comprises a Fully Convolutional Neural Network.
(Long, Figure 1)

Regarding claims 9 and 18, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the training in (C) enforces that L, the probability distribution over N instance labels, factorizes over H x W independent categorical distributions.
(See 112(b) rejection to this claim; Hu, Fig. 1; the FCN in this example can classify an image with W x H pixels (eq. (1)) into 4 people labels (Fig. 1(c), N = 4) each with pi pixels; the FCN can further select, per objective, one semantic segmentation output (Fig. 1(d), 2 “people with blue coat”) out of 4 identified people with 3 different colors of 
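For illustration only, one possible reading of the factorization recited in this claim is given below. The symbols are assumptions introduced for this note: L_{ij} denotes the instance label of the pixel at row i and column j, and each P_{ij} is a categorical distribution over the N instance labels. This is a sketch of the claim language and is not a construction of the claim.

```latex
P(L) \;=\; \prod_{i=1}^{H} \prod_{j=1}^{W} P_{ij}(L_{ij}),
\qquad L_{ij} \in \{1, \dots, N\},
\qquad \sum_{k=1}^{N} P_{ij}(k) = 1 .
```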

Regarding claims 19 and 20, the combination of Long and Hu teaches a method for training an artificial neural network, performed by at least one computer processor executing computer program instructions stored on at least one non-transitory computer-readable medium, the method comprising:
(A) receiving a plurality of images and a plurality of sets of corresponding ground truth instance labels;
(B) receiving a training objective, wherein the training objective specifies to minimize, for each pixel in an image, a divergence between
	\\(1) a distribution over instance labels produced by the artificial neural network and
	(2) a distribution resulting from applying an injective mapping of ground truth instance labels to neural network output labels to the distribution over instance labels determined from the corresponding ground truth instance labels,
	wherein the injective mapping is chosen for each image at every update step to minimize the divergence between (1) and (2); and\\
(C) training the artificial neural network based on the plurality of images, the plurality of sets of corresponding ground truth instance labels, and the training objective.
(Long, Hu, see comments on claim 1)
	The combination further teaches:
		(1) a distribution over instance labels produced by the artificial neural network and
(Hu, Fig. 1(c), object instance segmentation of class people)
		(2) a distribution resulting from applying an injective mapping of ground truth instance labels to neural network output labels to the distribution over instance labels determined from the corresponding ground truth instance labels,
(Hu, “Mij is the binary ground-truth label at pixel (i, j)”, eqs (1) and (2), p7; a ground truth image is incorporated in the FCN training)
		wherein the injective mapping is chosen for each image at every update step to minimize the divergence between (1) and (2); and
(Hu, “The whole network is trained with standard back-propagation using SGD with momentum”, p7; the ground truth image contribution in the loss function is fed back to the neural network through backpropagation as part of neural network training, which results in semantic segmentation results in, e.g., Fig. 1(d) with minimum color difference of two segmented people: the “people in blue coat”)
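For illustration only, the training objective recited in step (B) of claims 19 and 20 can be written as below. The symbols are assumptions introduced for this note and do not appear in the claims or the cited references: q_p is the distribution over instance labels produced by the network at pixel p, g_p is the distribution determined from the ground truth instance labels at pixel p, \Pi is the set of injective mappings from ground truth instance labels to network output labels, and D is an unspecified divergence.

```latex
\pi^{*} \;=\; \arg\min_{\pi \in \Pi} \; \sum_{p} D\bigl(q_{p},\, \pi(g_{p})\bigr),
\qquad
\mathcal{L} \;=\; \sum_{p} D\bigl(q_{p},\, \pi^{*}(g_{p})\bigr).
```

Under this reading, \pi^{*} is recomputed for each image at every update step, and the network parameters are then updated to reduce \mathcal{L} with \pi^{*} held fixed.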

Claim(s) 8 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Long et al (FCN for Semantic Segmentation, 2015) in view of Hu et al (Segmentation from natural language, 2016) and further in view of Liu et al (US2018/0211157).

Regarding claims 8 and 17, the combination of Long and Hu further teaches its/their respective base claim(s).
The combination does not expressly disclose but Liu teaches the method of 
(Liu, “The outputs of the neural network may be provided in various forms. For example, according to the practical need, an activation function for the output layer may be selected from the group consisting of a softmax function, a sigmoid function and a tan h function. Through the functions such as the softmax function, each label may be endowed with a certain probability, and the label having the largest probability may be selected as a label or type of the image”, [0043])
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Liu into the modified system or method of Long and Hu in order to obtain a statistically accurate classification result from an FCN using a softmax function at the output activation layer. The combination of Long, Hu and Liu also teaches other enhanced capabilities.
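For illustration only, the behavior described in Liu at [0043] (each label is given a probability through a softmax function, and the label with the largest probability is selected) can be sketched as below. This is a generic softmax written for this note and is not code from Liu; the function name and the sample scores are assumptions.

```python
import math

def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores for three candidate labels at one output location (made-up values).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# Select the label with the largest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```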


Response to Arguments
Applicant's arguments filed on 3/21/2022 with respect to one or more of the pending claims have been fully considered but they are not persuasive.

Regarding claim(s) 1, Applicant, in pages 11-15 of the remarks, appears to argue that the combination of the cited reference(s) fails to teach “mapping from neural network labels to ground truth instance labels” as recited in claim 1. 


Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN (JAMES) YANG whose telephone number is (571)272-9874. The examiner can normally be reached on MON-FRI: 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached on (571)272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/JIANXUN YANG/
Primary Examiner, Art Unit 2664				3/28/2022