DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it has more than 150 words.  Correction is required.  See MPEP § 608.01(b).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

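(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.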

Claims 1, 6, 11 and 13 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Qin et al. (U.S. PG-PUB No. 20210201499 A1; Provisional Application No. 62/955,045).
-Regarding claim 1, Qin discloses a computer-implemented method for segmenting an input image into a segmentation map (Abstract; FIGS. 1-3), the method comprising the step of running a convolutional neural network (CNN) (FIG. 1; Abstract, “encoders/decoders performs convolution neural network processing”) to generate the segmentation map (FIG. 1, $S_{fuse}$) from the input image (FIG. 1, input image) after the CNN is trained (FIGS. 4-5; [0052]-[0053]), wherein the CNN comprises (Abstract; FIGS. 1-3): an encoder arranged to encode the input image (FIG. 1) into an encoded final-stage feature map (FIG. 1, $S_{side}^{6}$) through plural encoding stages (FIG. 1, En_1, En_2, … En_6; [0043]), generating one or more encoded intermediate feature maps (FIG. 1, input feature maps of En_2, … En_6) before the encoded final-stage feature map is generated (FIG. 1, $S_{side}^{6}$); a multi-scale context aggregation module (FIGS. 2-3) arranged to sequentially aggregate multi-scale contexts (FIGS. 1-3) of the encoded final-stage feature map from a global scale to a local scale ([0028], “capture and fuse both richer local and global contextual information”; [0029]; [0031]-[0034]) for allowing semantic relationships of respective contexts of different scales to be strengthened to thereby improve segmentation accuracy ([0010], “semantic information obtained from deep low resolution feature maps”; [0028], “extract multi-scale multi-resolution contextual information … outperforms … segmentation methods in terms of accuracy, robustness and qualitative measures”; [0029], “outputs labels for every pixel in the image”), the multi-scale context aggregation module generating an aggregated-context feature map (FIG. 2, multi-scale feature $U(F_1(x))$); and a decoder (FIG. 1, De_1 … De_5; [0043]-[0045]) arranged to decode the aggregated-context feature map (FIGS. 1-3) according to, directly or indirectly, the encoded final-stage feature map and the one or more encoded intermediate feature maps (Abstract; FIG. 1; [0030], “the output of decoders and the last encoder stage generates six output probability maps S(1) through S(6) at different resolutions … fused … to generate”; FIGS. 2-3; [0044]-[0045]), whereby the segmentation map is generated (FIG. 1, $S_{fuse}$).
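As a purely illustrative aid to the claim 1 mapping above, the following Python sketch traces the recited data flow (encoding stages, aggregation of the final-stage map, decoding against the final-stage and intermediate maps) under simplifying assumptions that are not drawn from Qin or from the application: feature maps are 1-D single-channel lists, each encoding stage is 2x average pooling, the aggregation module is an identity placeholder, merging is elementwise summation, and the threshold is arbitrary. All function names are hypothetical.

```python
from typing import List

FeatureMap = List[float]  # hypothetical 1-D, single-channel feature map


def encode_stage(x: FeatureMap) -> FeatureMap:
    """Stand-in encoding stage: halve resolution by averaging adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: FeatureMap) -> FeatureMap:
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def aggregate_contexts(x: FeatureMap) -> FeatureMap:
    """Placeholder for the multi-scale context aggregation module (identity)."""
    return list(x)


def decode_stage(first: FeatureMap, second: FeatureMap) -> FeatureMap:
    """Stand-in decoding stage: merge two same-length maps by elementwise sum."""
    assert len(first) == len(second), "inputs must share a dimension"
    return [a + b for a, b in zip(first, second)]


def segment(image: FeatureMap, num_stages: int = 3, threshold: float = 1.0) -> List[int]:
    # Encoder: every stage output before the last one is an intermediate map.
    maps = []
    x = image
    for _ in range(num_stages):
        x = encode_stage(x)
        maps.append(x)
    final_stage, intermediates = maps[-1], maps[:-1]

    # Aggregate multi-scale contexts of the encoded final-stage map.
    y = aggregate_contexts(final_stage)

    # Initial decoding stage uses the final-stage map; subsequent stages use
    # the intermediate maps in coarse-to-fine order.
    y = decode_stage(y, final_stage)
    for skip in reversed(intermediates):
        y = decode_stage(upsample(y), skip)

    # Restore input resolution, then threshold into a per-pixel label map.
    while len(y) < len(image):
        y = upsample(y)
    return [1 if v > threshold else 0 for v in y]
```

For example, `segment([0.0] * 8 + [1.0] * 8)` returns `[0] * 8 + [1] * 8`, i.e., one label per input pixel.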
-Regarding claim 6, Qin discloses wherein: the decoder comprises a plurality of decoding stages (FIG. 1, De_1 … De_5; [0043]-[0045]), an individual decoding stage being arranged to receive first and second input feature maps to generate one output map, the first and second input feature maps each having a same dimension and a same number of channels (FIG. 1; [0044], “Each decoder stage takes the concatenation of the up-sampled feature maps from its previous stage”; [0045], “it up-samples these saliency maps to the input image size”), wherein the individual decoding stage comprises a merging module ([0043], “a fusion module attached to the decoder stages”) and a decoding block (FIG. 1, De_1 … De_5), the merging module being arranged to merge the first and second input feature maps to form a merged feature map (FIG. 1; [0030], “maps are then up-sampled to the input image size and are fused”), the decoding block being arranged to decode the merged feature map to give the output map (FIG. 1; [0044]; [0030]; Abstract).
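The claim 6 structure (a merging module followed by a decoding block, with both inputs sharing a dimension and channel count) can be sketched as below. This is a hypothetical illustration only: it assumes a `[channel][position]` list layout, channel-wise concatenation as the merge (cf. the "concatenation" language in Qin [0044]), and a 1x1 convolution plus ReLU as the decoding block; none of the names or weights come from the references.

```python
from typing import List

Channels = List[List[float]]  # hypothetical layout: [channel][position]


def merge(first: Channels, second: Channels) -> Channels:
    """Merging-module stand-in: channel-wise concatenation of two maps."""
    assert len(first) == len(second), "same number of channels required"
    assert all(len(a) == len(b) for a, b in zip(first, second)), "same dimension required"
    return first + second


def conv1x1_relu(x: Channels, weights: List[List[float]]) -> Channels:
    """Decoding-block stand-in: 1x1 convolution followed by ReLU.

    weights[o][c] is the weight from input channel c to output channel o.
    """
    width = len(x[0])
    out = []
    for w_row in weights:
        assert len(w_row) == len(x), "one weight per input channel"
        out.append([max(0.0, sum(w * x[c][p] for c, w in enumerate(w_row)))
                    for p in range(width)])
    return out


def decoding_stage(first: Channels, second: Channels, weights: List[List[float]]) -> Channels:
    """Merge the two input maps, then decode the merged map into one output map."""
    return conv1x1_relu(merge(first, second), weights)
```

With one-channel inputs `[[1.0, 2.0]]` and `[[3.0, -5.0]]` and weights `[[1.0, 1.0]]`, the stage outputs `[[4.0, 0.0]]` (the negative sum is clipped by the ReLU).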
FIG. 1, De_1 … De_5), the one or more subsequent decoding stages including a last decoding stage (FIG. 1, De_5); the first input feature map of the initial decoding stage is the aggregated-context feature map (FIG. 1, En_5, En_6, De_5), and the second input feature map thereof is, or is derived from, the encoded final-stage feature map; the first input feature map of an individual subsequent decoding stage is the output map of a decoding stage immediately preceding the individual subsequent decoding stage, and the second input feature map thereof is, or is derived from, a feature map selected from the one or more encoded intermediate feature maps (FIG. 1; Abstract, “each decoder in the sequence of decoders performs convolution neural network processing of an up-sampled version of the ultrasound image from a prior decoder, the sequence of decoders form a second parallel dimension to the first dimension … segmentation maps are produced from paired encoders and decoders in the sequence of encoders and the sequence of decoders”); the output map of the last decoding stage is the segmentation map (FIG. 1, $S_{fuse}$; Abstract, “segmentation maps are combined to form a final probability segmentation output”); the decoding block of the individual decoding stage includes one or more convolutional layers (FIG. 1); and the decoding block of the last decoding stage is realized as a 1 x 1 convolutional layer (FIG. 1; [0045], “output saliency probability maps … from stages En_6, De_5, De_4, De_3, De_2, De_1 by a 3×3 convolution layer …followed by a 1×1 convolution layer”).
FIGS. 1-3; [0029]-[0032]; [0045]). 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Qin et al. (U.S. PG-PUB No. 20210201499 A1; Provisional Application No. 62/955,045) in view of Meng (U.S. PG-PUB No. 20210256657 A1; Provisional Application No. 62/757,278).
-Regarding claim 2, Qin discloses wherein the multi-scale context aggregation module is further arranged to (FIGS. 2-3): compute $N$ atrous-convolution feature maps of the encoded final-stage feature map for $N$ different dilation rates ([0033]; [0040], “dilation rate of the convolution filter … dilation rate is set to 2”), respectively, for extracting the multi-scale contexts from the encoded final-stage feature map ([0028], “capture and fuse both richer local and global contextual information”; [0029]; [0031]-[0034]; FIGS. 1-3), where $N \ge 2$; and compute the aggregated-context feature map, $s_N$, by a recursive procedure of computing $s_n = f_n(r_n \oplus s_{n-1})$ for $n \in \{1, 2, \ldots, N\}$, where $r_n$ is an nth computed atrous-convolution feature map, $s_n$ is an nth intermediate result of the aggregated-context feature map, $s_0$ is a null feature map, $\oplus$ denotes elementwise summation, and $f_n$ is an nth nonlinear function (FIG. 3, connection 302; FIG. 2, $U(F_1(x))$), wherein $(r_1, r_2, \ldots, r_N)$ forms a sequence of atrous-convolution feature maps arranged in a descending order of dilation rate such that local-scale contexts of the encoded final-stage feature map are allowed to be aggregated under guidance of global-scale contexts thereof, and wherein $(f_1, f_2, \ldots, f_N)$ are independently configured.
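The recursive procedure recited in claim 2, $s_n = f_n(r_n \oplus s_{n-1})$ with $s_0$ a null map and the $r_n$ taken in descending order of dilation rate, can be illustrated with the hypothetical Python sketch below. It assumes 1-D feature maps, represents each $r_n$ as a `(dilation_rate, map)` pair, and accepts arbitrary caller-supplied callables as the independently configured $f_n$; it is not an implementation from Qin, Meng, or the application.

```python
from typing import Callable, List, Sequence, Tuple

FeatureMap = List[float]  # hypothetical 1-D feature map
NonLinear = Callable[[FeatureMap], FeatureMap]


def relu(x: FeatureMap) -> FeatureMap:
    """Example nonlinearity that may serve as one of the f_n."""
    return [max(0.0, v) for v in x]


def aggregate(atrous_maps: Sequence[Tuple[int, FeatureMap]],
              funcs: Sequence[NonLinear]) -> FeatureMap:
    """Compute s_N via s_n = f_n(r_n (+) s_{n-1}), starting from a null s_0.

    atrous_maps holds (dilation_rate, r_n) pairs. They are re-ordered by
    descending dilation rate, so funcs[0] (f_1) acts on the most global
    context and funcs[-1] (f_N) on the most local one.
    """
    assert len(atrous_maps) == len(funcs) >= 2, "N >= 2 and one f_n per r_n"
    ordered = sorted(atrous_maps, key=lambda pair: pair[0], reverse=True)
    s = [0.0] * len(ordered[0][1])                # s_0: null feature map
    for (_, r_n), f_n in zip(ordered, funcs):
        summed = [a + b for a, b in zip(r_n, s)]  # r_n (+) s_{n-1}, elementwise
        s = f_n(summed)                           # s_n
    return s
```

With `funcs = [relu, lambda x: [2.0 * max(0.0, v) for v in x]]`, calling `aggregate([(1, [1.0, -2.0]), (4, [0.5, 0.5])], funcs)` processes the rate-4 map first and returns `[3.0, 0.0]`; the later, more local step thus operates on a sum that already carries the global-scale result.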
Qin teaches a multi-scale context aggregation module that has a convolution filter with dilation rate $d = 2$ for the last encoder stage (FIG. 3, block 304; [0034]; [0030]). Qin does not explicitly teach a multi-scale context aggregation module that has $N$ atrous-convolution feature maps with $N \ge 2$.
In the same field of endeavor, Meng discloses a method (Meng: Abstract; FIGS. 1-9) that includes a multi-scale context aggregating block at a bottleneck between the downsampling stage and the upsampling stage (Meng: FIGS. 3, 5; [0082]; [0090]). Meng further teaches wherein the multi-scale context aggregation module is further arranged to: compute $N$ atrous-convolution feature maps of the encoded final-stage feature map for $N$ different dilation rates, respectively, for extracting the multi-scale contexts from the encoded final-stage feature map, where $N \ge 2$ (Meng: Abstract; FIGS. 3, 6, block 350; FIG. 5, dilation layers 354; [0090], “various dilation rates”; [0091]-[0094]).
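For readers less familiar with atrous (dilated) convolution, the following hypothetical Python sketch computes $N \ge 2$ feature maps from one input at different dilation rates, which is the operation relied upon in Meng above. It assumes a 1-D single-channel input, zero padding, and an odd-length kernel, and computes the filter as a cross-correlation, as common CNN frameworks do; it is not Meng's code.

```python
from typing import Dict, List, Sequence

FeatureMap = List[float]  # hypothetical 1-D feature map


def atrous_conv1d(x: FeatureMap, kernel: Sequence[float], rate: int) -> FeatureMap:
    """Same-size 1-D atrous convolution (cross-correlation) with zero padding.

    Kernel taps are spaced `rate` samples apart, so the receptive field grows
    to (len(kernel) - 1) * rate + 1 without adding weights.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < len(x):  # positions outside the map count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def multi_rate_features(x: FeatureMap, kernel: Sequence[float],
                        rates: Sequence[int]) -> Dict[int, FeatureMap]:
    """Compute N >= 2 atrous-convolution maps r_n, one per distinct dilation rate."""
    assert len(rates) >= 2 and len(set(rates)) == len(rates), "N >= 2 distinct rates"
    return {rate: atrous_conv1d(x, kernel, rate) for rate in rates}
```

Applied to an impulse `[0, 0, 0, 1, 0, 0, 0]` with kernel `[1, 1, 1]`, rate 1 spreads the response to adjacent positions 2-4, whereas rate 2 reaches positions 1, 3 and 5: the same three weights see a wider, coarser context.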
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Qin with the teaching of Meng by computing $N$ atrous-convolution feature maps of the encoded final-stage feature map for $N$ different dilation rates in order to improve the performance of extracting local context information at different scales.

-Regarding claim 3, Qin discloses wherein $f_n$ is given by $f_n(x) = x + g_n(x)$, where $x$ denotes an input feature map, $f_n(x)$ denotes an output of the nth nonlinear function with the input feature map, and $g_n(x)$ is a nonlinear component of $f_n(x)$ (FIG. 3, connection 302).
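The residual form $f_n(x) = x + g_n(x)$ discussed above can be sketched as a small Python helper that wraps any size-preserving nonlinear component $g_n$. This is a hypothetical illustration assuming 1-D feature maps; the helper name and the example $g_n$ are not taken from Qin.

```python
from typing import Callable, List

FeatureMap = List[float]  # hypothetical 1-D feature map


def residual(g_n: Callable[[FeatureMap], FeatureMap]) -> Callable[[FeatureMap], FeatureMap]:
    """Build f_n(x) = x + g_n(x) from a nonlinear component g_n."""
    def f_n(x: FeatureMap) -> FeatureMap:
        g = g_n(x)
        assert len(g) == len(x), "g_n must preserve the feature-map size"
        return [a + b for a, b in zip(x, g)]  # skip connection plus nonlinear part
    return f_n
```

For example, with `g_n = lambda x: [max(0.0, v) for v in x]`, `residual(g_n)([1.0, -2.0])` returns `[2.0, -2.0]`. If $g_n$ outputs zeros, $f_n$ reduces to the identity, so the skip connection always passes the input through unchanged.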
Qin does not explicitly teach a multi-scale context aggregation module that has $N$ atrous-convolution feature maps with $N \ge 2$.
In the same field of endeavor, Meng discloses a method (Meng: Abstract; FIGS. 1-9) that includes a multi-scale context aggregating block at a bottleneck between the downsampling stage and the upsampling stage (Meng: FIGS. 3, 5; [0082]; [0090]). Meng further teaches wherein the multi-scale context aggregation module is further arranged to: compute $N$ atrous-convolution feature maps of the encoded final-stage feature map for $N$ different dilation rates, respectively, for extracting the multi-scale contexts from the encoded final-stage feature map, where $N \ge 2$ (Meng: Abstract; FIGS. 3, 6, block 350; FIG. 5, dilation layers 354; [0090], “various dilation rates”; [0091]-[0094]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Qin with the teaching of Meng by computing $N$ atrous-convolution feature maps of the encoded final-stage feature map for $N$ different dilation rates in order to improve the performance of extracting local context information at different scales.
-Regarding claim 12, Qin discloses wherein the CNN further comprises: one or more 1 x 1 convolutional layers (FIG. 1; [0045], “output saliency probability maps … from stages En_6, De_5, De_4, De_3, De_2, De_1 by a 3×3 convolution layer …followed by a 1×1 convolution layer”). Qin does not explicitly teach an individual 1 x 1 convolutional layer being arranged to derive the second input feature map of a decoding stage selected from the plurality of decoding stages, wherein the second input feature map of the selected decoding stage is derived from a corresponding feature map generated by the encoder by resampling the corresponding feature map such that the first and second input feature maps of the selected decoding stage have a same dimension.
In the same field of endeavor, Meng teaches wherein the CNN further comprises: one or more 1 x 1 convolutional layers (Meng: [0024]; [0037]; [0047]), an individual 1 x 1 convolutional layer being arranged to derive the second input feature map of a decoding stage selected from the plurality of decoding stages (Meng: [0024], “convolutional layers at the upsampling stage include 1×1 convolutional layers”; [0065]; FIGS. 3, 6, 8), wherein the second input feature map of the selected decoding stage is derived from a corresponding feature map generated by the encoder by resampling the corresponding feature map such that the first and second input feature maps of the selected decoding stage have a same dimension (Meng: FIG. 9, block 924; [0112], “convolutional layers 302 of the downsampling stage and the convolutional layers 302 of the upsampling stage having a (substantially) same resolution (or at substantially same downsampling and upscaling level) with the convolutional layers 302 of the downsampling stage are concatenated. The concatenation means feature maps are combined by means of copy and crop operations as needed”; [0113]; FIGS. 3, 6, 8).
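The claim 12 arrangement, in which a 1 x 1 convolutional layer derives a decoding stage's second input from an encoder feature map that is resampled to the first input's dimension, can be sketched in Python as follows. This is a hypothetical illustration assuming a `[channel][position]` list layout and nearest-neighbour resampling; it is not Meng's copy-and-crop procedure, and all names are illustrative.

```python
from typing import List

Channels = List[List[float]]  # hypothetical layout: [channel][position]


def resample(x: Channels, width: int) -> Channels:
    """Nearest-neighbour resampling of every channel to `width` positions."""
    src = len(x[0])
    return [[ch[(p * src) // width] for p in range(width)] for ch in x]


def conv1x1(x: Channels, weights: List[List[float]]) -> Channels:
    """1x1 convolution: weights[o][c] maps input channel c to output channel o."""
    return [[sum(w * x[c][p] for c, w in enumerate(row)) for p in range(len(x[0]))]
            for row in weights]


def second_input(encoder_map: Channels, first_input: Channels,
                 weights: List[List[float]]) -> Channels:
    """Derive a decoding stage's second input from an encoder feature map."""
    # Resample spatially to match the first input, then mix channels with 1x1 conv.
    y = conv1x1(resample(encoder_map, len(first_input[0])), weights)
    assert len(y) == len(first_input), "channel counts must match"
    return y
```

Because a 1 x 1 convolution acts on each position independently, it commutes with nearest-neighbour resampling; the order chosen here only affects cost, not the result.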
Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Qin et al. (U.S. PG-PUB No. 20210201499 A1; Provisional Application No. 62/955,045) in view of Meng (U.S. PG-PUB No. 20210256657 A1; Provisional Application No. 62/757,278), and further in view of Liu et al. (IEEE Transactions on Image Processing, Vol. 29, pp. 1413-1425, date of publication September 16, 2019).
-Regarding claim 4, Qin in view of Meng discloses the method of claim 3. Qin in view of Meng teaches that the multi-scale context aggregation module includes a plurality of dilation layers at the bottleneck for computing $f_1, f_2, \ldots, f_N$. Qin in view of Meng does not explicitly teach that the multi-scale context aggregation module includes a plurality of bottleneck blocks.
However, Liu is analogous art pertinent to the problem to be solved in this application and further discloses wherein the multi-scale context aggregation module includes a plurality of bottleneck blocks for computing $f_1, f_2, \ldots, f_N$ (Liu: Abstract; page 1417, Fig. 3, caption; page 1415, formula (1); page 1414, col. 1, 2nd paragraph, “two bottleneck blocks”; 4th paragraph, “two more discriminative bottleneck blocks”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Qin in view of Meng with the teaching of Liu by using a multi-scale context aggregation module including a plurality of bottleneck blocks for computing $f_1, f_2, \ldots, f_N$ in order to improve segmentation accuracy and enhance discriminative power.
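To illustrate how a bottleneck block can compute one nonlinear function $f_n(x) = x + g_n(x)$, the Python sketch below uses the common ResNet-style pattern: a 1 x 1 convolution that narrows the channel dimension, a ReLU, and a 1 x 1 convolution that restores it, followed by the residual addition. This is an assumed, generic form for illustration only; it does not reproduce the specific blocks of Liu, and the `[channel][position]` layout and all names are hypothetical.

```python
from typing import List

Channels = List[List[float]]  # hypothetical layout: [channel][position]


def conv1x1(x: Channels, weights: List[List[float]]) -> Channels:
    """1x1 convolution: weights[o][c] maps input channel c to output channel o."""
    width = len(x[0])
    return [[sum(w * x[c][p] for c, w in enumerate(row)) for p in range(width)]
            for row in weights]


def relu(x: Channels) -> Channels:
    """Elementwise ReLU over every channel."""
    return [[max(0.0, v) for v in ch] for ch in x]


def bottleneck_block(x: Channels, reduce_w: List[List[float]],
                     expand_w: List[List[float]]) -> Channels:
    """f_n(x) = x + g_n(x), where g_n squeezes channels, applies ReLU, and expands back."""
    assert len(reduce_w) < len(x), "reduction must narrow the channel dimension"
    g = conv1x1(relu(conv1x1(x, reduce_w)), expand_w)
    assert len(g) == len(x), "expansion must restore the channel count"
    return [[a + b for a, b in zip(xc, gc)] for xc, gc in zip(x, g)]
```

Stacking $N$ such blocks with separately chosen weights yields independently configured $f_1, f_2, \ldots, f_N$, while the narrow middle layer keeps each block inexpensive.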
-Regarding claim 5, Qin in view of Meng, and further in view of Liu discloses the method of claim 4.
The modification further discloses wherein an individual bottleneck block includes one or more convolutional layers (Qin: FIG. 3).
Allowable Subject Matter
Claims 7-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 14-20 are allowed.
The following is a statement of reasons for the indication of allowable subject matter: Qin and Meng appear to be the closest prior art of record. Qin discloses a method of extracting multi-scale, multi-resolution contextual information regardless of the size of the input feature maps, comprising residual blocks at different stages that are able to capture and fuse both richer local and global contextual information in each stage without degrading feature map resolution. Meng discloses a method that includes processing a digital image with an encoder-decoder neural network comprising a plurality of convolutional layers classified into a downsampling stage and an upsampling stage, and a multi-scale context aggregating block configured to aggregate multi-scale context information of the digital image and employed between the downsampling stage and the upsampling stage. However, the cited references, taken either alone or in combination, do not teach or suggest the claimed invention, including, inter alia, wherein the individual decoding stage comprises a merging module and a decoding block, the merging module being arranged to merge the first and second input feature maps to form a merged feature map, the decoding block being arranged to decode the merged feature map to give the output map; and the merging module is a channel-wise feature selection (CFS) module arranged to: process the first and second input feature maps each with an individual cascade of a global pooling (GP) layer and an attention layer to yield first and second attention feature maps of dimension 1 x 1 x C, respectively, wherein: each of the first and second input feature maps has a dimension of W x H x C; the GP layer performs a pooling operation on W x H data in each of C channels of a respective input feature map to yield a GP-output feature map of dimension 1 x 1 x C; and the attention layer generates a respective attention feature map by determining an attention of each of the C channels according to the GP-output feature map such that a channel of higher activation among the C channels has a higher attention, the attention layer being either a fully connected layer or a 1 x 1 convolutional layer, a same set of weights being used in the attention layer of the individual cascade in processing both the first and second input feature maps; channel-wise multiply the first input feature map with the second attention feature map to yield a first post-processed input feature map; channel-wise multiply the second input feature map with the first attention feature map to yield a second post-processed input feature map; and perform elementwise addition of the first and second post-processed input feature maps to give the merged feature map such that channels with high activation in both the first and second input feature maps are preserved and enhanced.
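For illustration only, the CFS merging operation recited in the allowable subject matter can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the claim does not specify the pooling type or the attention activation, so global average pooling and a sigmoid are assumed here, and the function names (`global_pool`, `attention`, `cfs_merge`) and the weight shapes are hypothetical.

```python
import numpy as np


def global_pool(x):
    """GP layer (assumed to be average pooling): pool W x H data per channel.

    Input shape (W, H, C) -> output shape (1, 1, C).
    """
    return x.mean(axis=(0, 1), keepdims=True)


def attention(gp, weight, bias):
    """Fully connected attention layer with an assumed sigmoid activation.

    gp has shape (1, 1, C); weight has shape (C, C); bias has shape (C,).
    A 1 x 1 convolution applied to a 1 x 1 x C map is equivalent to this dense layer.
    """
    z = gp.reshape(-1) @ weight + bias
    return (1.0 / (1.0 + np.exp(-z))).reshape(1, 1, -1)


def cfs_merge(x1, x2, weight, bias):
    """Hypothetical CFS merge of two W x H x C input feature maps."""
    # The same set of weights is used in both cascades, as recited.
    a1 = attention(global_pool(x1), weight, bias)
    a2 = attention(global_pool(x2), weight, bias)
    # Cross multiplication: each input is scaled channel-wise by the OTHER input's attention.
    y1 = x1 * a2
    y2 = x2 * a1
    # Elementwise addition gives the merged feature map.
    return y1 + y2


if __name__ == "__main__":
    c = 3
    x1 = np.random.rand(4, 4, c)
    x2 = np.random.rand(4, 4, c)
    merged = cfs_merge(x1, x2, np.eye(c), np.zeros(c))
    print(merged.shape)
```

With an identity weight matrix and zero bias, the sigmoid is monotonic in the pooled value, so a channel with higher mean activation receives a higher attention, consistent with the recited behavior; arbitrary learned weights do not by themselves guarantee this ordering.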
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571)272-4539.  The examiner can normally be reached on Monday-Thursday and Alternate Fridays 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached on (571) 272-7882.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/XIAO LIU/Examiner, Art Unit 2664

/PING Y HSIEH/Primary Examiner, Art Unit 2664