DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


a.	Claims 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to nonstatutory subject matter. The claims recite a non-volatile computer readable storage medium, which, under the broadest reasonable interpretation of the claims, typically covers both non-transitory tangible media and transitory propagating signals per se. The specification of the application does not preclude the claimed non-volatile computer readable storage medium from being construed as encompassing transitory propagating signals per se. Claims drawn to a computer readable storage medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments, and thereby avoid a rejection under 35 U.S.C. § 101, by adding the limitation "non-transitory" to the claim preamble (i.e., reciting a "non-transitory computer readable storage medium").

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 7-12, and 14-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by El-Khamy et al. (US 2019/0057507 A1; hereinafter, “El”).
a.	Regarding claim 1, El discloses a method for training a semantic segmentation model, comprising the following steps:
constructing a training sample set, wherein the training sample set comprises a plurality of first-category objects and a plurality of second-category objects (El discloses that “the system 10 for instance semantic segmentation includes a fully convolutional instance semantic segmentation (FCIS) core network 100, which, at 2100, processes the initial image to extract core neural network features 102 from an input image 20 (e.g., a bitmap image of a scene containing one or more objects, such as a photograph of a street)” at Fig. 1A, Fig. 1B-2100 and ¶0050), wherein the first-category objects are marked with bounding boxes and segmentation masks, and the second-category objects are marked with bounding boxes (El discloses that “the RPN 300 generates a plurality of detection boxes or bounding boxes (RPN BBoxes) 302 corresponding to the locations of individual features. Each of the detection boxes is defined by a plurality of box coordinates that identify a region of interest that corresponds to one of the objects in the image (e.g., the RPN generates a detection box for each object it detects in the image). The quality of instance semantic segmentation is governed by having a fairly accurate RPN that does not miss any detection. However, having a high recall (do not miss detections) usually results in multiple false detections as well” at Fig. 1B-2300 and ¶0055; El discloses that “the bounding boxes 302 are supplied to a segmentation mask prediction network 400 or segmentation mask head, which, at 2400, generates predicts a segmentation mask for each 
inputting the training sample set into a deep network model for training to obtain first bounding box parameters and first mask parameters of the first-category objects and second bounding box parameters of the second-category objects (El discloses that “The segmentation mask head is a fully convolutional deep neural network that is trained to predict a segmentation mask for each box proposal 302 from the RPN 300, and for each object class. The segmentation mask prediction network 400 (or segmentation mask head) is configured to predict a segmentation mask from a cropped feature map corresponding to an RPN bounding box (e.g., a portion of a feature map, as cropped by an RPN bounding box), either by a one shot prediction or for each grid cell after pooling the feature map crop corresponding to the RPN into a fixed-size grid of cells. The segmentation mask head 400 is further configured to provide a pixel-level classification score for each class (e.g., each class of object to be detected by the instance semantic segmentation system 10, where the classes may include, for example, humans, dogs, cats, cars, debris, furniture, and the like), and can also provide a pixel-level score of falling inside or outside a mask. The pixel level scores are aggregated over each mask to produce a confidence score for the mask. According to one embodiment of the present disclosure, the instance semantic segmentation system 10 provides segmentation mask prediction by aggregating all intermediate feature maps of different scales (instead of using a single feature to provide to the segmentation mask predictor, or choosing only one single pyramid scale for each region of interest). In the embodiment shown in FIG. 1A, features at three scales are shown as 410, 430, and 450, which represent feature maps at the different feature scales that are from different feature maps 210, 230, and 250 computed by the FPN 200” at Fig. 1B-2400 and ¶0056); and
inputting the first bounding box parameters and the first mask parameters into a weight transfer function for training to obtain a bounding box prediction mask parameter (El discloses that “convolutional neural network (CNN) is translation invariant, 

b.	Regarding claim 2, El discloses wherein after the step of inputting the first bounding box parameters, the first mask parameters, the second bounding box parameters, and the bounding box prediction mask parameter into the deep network model and the weight transfer function to construct a semantic segmentation model, the method comprises:
inputting an image to be segmented into the semantic segmentation model to output a semantic segmentation result of the image to be segmented (El discloses that “The segmentation masks 402 are then supplied to a pyramid segmentation network 500 to generate, at 2500, a segmentation mask 502 for a particular object, as generated from the separate masks generated at different resolutions (e.g., at the different resolutions of the multi-resolution feature maps 410, 430, and 450) by the segmentation mask prediction network 400. According to one embodiment, the present system learns a combination of the segmentation masks predicted at multiple scales of the FPN. For each class, each mask and RPN box is defined by a pixel-wise score” at Fig. 1B-2500 and ¶0058).
c.	Regarding claim 3, El discloses wherein the step of inputting an image to be segmented into the semantic segmentation model to output a semantic segmentation result of the image to be segmented comprises:

predicting mask parameters of the first-category objects in the image to be segmented by using the bounding boxes of the first-category objects and the bounding box prediction mask parameter, and predicting mask parameters of the second-category objects in the image to be segmented by using the bounding boxes of the second-category objects and the bounding box prediction mask parameter (El discloses that “a pyramid segmentation network to prevent false positives based on detections that are totally overlapped with other detections of same class. The present system includes a metric referred to as an Intersection over Self (IoS) that eliminates detections almost totally contained in other detections: IoS=(Intersection area with other detection)/Self Area; If the IoS=1 means the detection is totally contained in another and 
performing semantic segmentation on the first-category objects and the second-category objects in the image to be segmented by using the mask parameters of the first-category objects and the mask parameters of the second-category objects in the image to be segmented (El discloses that “a pyramid segmentation network to prevent false positives based on detections that are totally overlapped with other detections of same class. The present system includes a metric referred to as an Intersection over Self (IoS) that eliminates detections almost totally contained in other detections: IoS=(Intersection area with other detection)/Self Area; If the IoS=1 means the detection is totally contained in another and can be safely discarded. Hence the present system may discard detections where IoS exceeds some threshold, where the threshold is a value less than, but close to, 1 (thereby indicating a large degree of containment in the other region … The higher resolution feature maps, such as the third feature map 250, produced by the FCN 200 are also provided to a belonging-bounding box (BBBox) prediction network 600 and a density prediction network 700. The BBBox prediction network 600 and the density prediction network 700 may be referred to herein as "auxiliary networks"” at ¶¶0060-0061).
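For illustration only, the IoS filtering quoted above from El (IoS = intersection area with other detection / self area, discarding detections whose IoS exceeds a threshold close to 1) can be sketched as follows. This is a minimal sketch under stated assumptions: the (x1, y1, x2, y2) box format, the 0.95 threshold, and the rule of visiting larger boxes first (so that one of two identical detections survives) are assumptions, not details taken from El or from the claims.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def ios(self_box: Box, other: Box) -> float:
    """Intersection over Self: (intersection area with other detection) / self area."""
    self_area = area(self_box)
    return intersection_area(self_box, other) / self_area if self_area > 0 else 0.0


def filter_contained(boxes: List[Box], threshold: float = 0.95) -> List[int]:
    """Return indices of same-class detections kept after IoS filtering.

    A detection is discarded when its IoS with an already-kept detection exceeds
    the threshold (0.95 is an assumed value "less than, but close to, 1").
    Visiting larger boxes first is an assumption, not from El; it keeps exactly
    one of two identical detections instead of discarding both.
    """
    order = sorted(range(len(boxes)), key=lambda i: area(boxes[i]), reverse=True)
    kept: List[int] = []
    for i in order:
        if all(ios(boxes[i], boxes[j]) <= threshold for j in kept):
            kept.append(i)
    return sorted(kept)
```

Because El describes eliminating detections contained in other detections "of same class," the filter would be run separately on each class's detections.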
d.	Regarding claim 4, El discloses wherein the deep network model is a Mask-RCNN network model (¶0086).

f.	Regarding claim 7, El discloses wherein the number of the second-category objects is greater than that of the first-category objects (El discloses that “The FPN 200 generates, at 2200, higher resolution feature maps 210, 230, and 250 of the input image and may apply upsampling to generate feature maps at multiple resolutions or scales. Generally, feature maps are downsampled in the core fully convolutional kernels (e.g., FCIS 100) by max or average pooling in order to improve the quality of the trained representations and to manage or constrain growth in the computational complexity of the deep neural networks (e.g., in neural networks containing many hidden layers). According to some aspects of embodiment of the present disclosure, the FPN 200 includes a final representation which is of higher resolution than the output of the FCIS 100 and which contains information from high-level representations as well. Accordingly, in one embodiment, the FPN upsamples a first feature map 210 using a nearest neighbor approach to generate a second feature map 220 of higher resolution, applies a convolution kernel transformation to the first feature map 210 of the same upsampled resolutions, then combines both representations into a third feature map 230 at the resolution of the upsampled feature map. The third feature map 230 can then be further upsampled to generate a fourth feature map 240 and combined with another kernel of similar resolution after a convolutional representation to generate a fifth feature map 250. This may be repeated until the desired resolution of the final feature map is achieved, with the limit being the resolution of the input image. This network is referred to being a feature "pyramid" because the size of the feature map increases at each level (e.g., feature maps of levels 210, 230, and 250). It is appreciated that there may be any number or level of feature maps without deviating from the scope of the present disclosure” at Fig. 1B-2200 and ¶0052).

h.	Regarding claims 15-19, claims 15-19 are analogous and correspond to claims 1-5, respectively. See rejection of claims 1-5 for further explanation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over El-Khamy et al. (US 2019/0057507 A1; hereinafter, “El”).
a.	Regarding claim 6, El discloses all of the limitations of the preceding claims.
Moreover, El discloses wherein the weight transfer function is a two-layer fully connected neural network, wherein the two fully connected layers have 5120 neurons and 256 neurons, respectively, and an activation function used is LeakyReLU (El discloses that “deep neural network includes a plurality of neurons arranged into layers. Input data (e.g., in this case, an input image) is supplied to an input layer of neurons and an output layer in generated at a layer of output neurons. In the case of a deep neural network, more than one "hidden layer" of neurons exists between the input layer and the output layer, where, generally speaking, neurons in one layer receive inputs from a previous layer and provide their outputs to a following layer, where each neuron generates an output that is a mathematical function of the sum of the inputs it receives, such as a logistic function” at ¶0098). 

Here, the selection of 5120 and 256 neurons is a matter of design choice; the applicant does not disclose that using 5120 and 256 neurons provides an advantage, is used for a particular purpose, or solves a stated problem.
Before the effective filing date of the claimed invention, it would have been an obvious matter of design choice to a person of ordinary skill in the art to use 5120 and 256 neurons in the fully connected layers of the neural network of El.
One of ordinary skill in the art would have expected El's network and the applicant's invention to perform equally well with 5120 and 256 neurons or with a different number of neurons, because either configuration would perform the same function as part of the invention. See MPEP 2144.04.
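For reference, the structure recited in claim 6 (two fully connected layers of 5120 and 256 neurons with a LeakyReLU activation) can be sketched as below. This is a hypothetical illustration only: the class name WeightTransfer, the input dimension, the LeakyReLU slope of 0.01, applying LeakyReLU only after the first layer, and the Gaussian initialization are all assumptions not found in the claims or in El.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # each row: n_in weights followed by one bias


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Assumed negative slope of 0.01; the claim names LeakyReLU but no slope.
    return x if x >= 0 else slope * x


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    # Hypothetical initialization: Gaussian weights scaled by 1/sqrt(n_in), zero bias.
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] + [0.0] for _ in range(n_out)]


def dense(weights: Matrix, x: Vector) -> Vector:
    # Fully connected layer: one output per row, weighted sum of inputs plus bias.
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in weights]


class WeightTransfer:
    """Two fully connected layers of `hidden` and `out` neurons (5120 and 256 by default)."""

    def __init__(self, n_in: int, hidden: int = 5120, out: int = 256, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in = n_in
        self.w1 = init_layer(n_in, hidden, rng)
        self.w2 = init_layer(hidden, out, rng)

    def __call__(self, box_params: Vector) -> Vector:
        if len(box_params) != self.n_in:
            raise ValueError("box_params length must equal n_in")
        # Assumption: LeakyReLU after the first (hidden) layer only; the output
        # layer is read directly as the predicted mask parameters.
        hidden = [leaky_relu(v) for v in dense(self.w1, box_params)]
        return dense(self.w2, hidden)
```

Changing `hidden` and `out` alters only the layer widths, not what the function computes, which is the point on which the design-choice reasoning rests; a practical implementation would use a tensor library rather than Python lists.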
b.	Regarding claims 13 and 20, claims 13 and 20 are analogous and correspond to claim 6. See rejection of claim 6 for further explanation.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN W LEE whose telephone number is (571)272-9554.  The examiner can normally be reached on Mon-Fri 8:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NAY MAUNG can be reached on 571-272-7882.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 

/JOHN W LEE/Primary Examiner, Art Unit 2664